# Is a Bayesian MCMC regression approach suitable/recommended for this complex model?

#### Idriel

##### New Member
Hi everyone. I'm trying to reanalyze the data from an experiment I did using a Bayesian approach, but the complexity of the problem has me stuck. I read Richard McElreath's "Statistical Rethinking" (2nd ed.) for insight and I think I'm conceptually 80% of the way there, but I could use some help.

Background:
Two molecules, A and B, interact. We can probe the stability of interaction by pulling them apart until the interaction breaks, and record the rupture force (F) and the lifetime of interaction [t(F)]. The simplest models of interaction assume a single energy barrier between the bound (A-B) and unbound (A B) states. These single-barrier models are summarized by the Dudko-Hummer-Szabo (DHS) equation (picture below).
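
For reference, the DHS lifetime expression as it is commonly written, using this post's symbols t0, x and G. Two notational assumptions: $\nu$ is the shape factor written as v in this post, and $\beta = 1/k_B T$ is the inverse thermal energy, which the post does not name.

$$
t(F) = t_0 \left(1 - \frac{\nu F x^{\ddagger}}{\Delta G^{\ddagger}}\right)^{1 - 1/\nu} \exp\left\{-\beta\, \Delta G^{\ddagger} \left[1 - \left(1 - \frac{\nu F x^{\ddagger}}{\Delta G^{\ddagger}}\right)^{1/\nu}\right]\right\}
$$

For $\nu = 1$ the prefactor becomes 1 and the expression reduces to $t(F) = t_0 \exp(-\beta F x^{\ddagger})$, which is linear in F on the log scale.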

Truly a behemoth. t0 is the lifetime at zero force. x is the distance to the peak ("dagger" symbol) of the energy barrier, and delta G (just G for convenience) is the energy difference between the low-energy A-B state and the peak of the barrier. All three must be positive. v is the shape factor (the bane of my existence). Different values of v describe different shapes (models) of the barrier. v can be 1/2, 2/3 or 1 (other values may be possible, but they are harder to interpret).
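
As a minimal sketch, the lifetime function can be written directly in Python. The function name, the units (pN, nm, pN·nm, seconds) and the room-temperature value of k_B·T are assumptions made for illustration, not details from the post.

```python
import math

# Assumed thermal energy k_B*T near 298 K, in pN*nm (not given in the post).
KT_PN_NM = 4.11


def dhs_lifetime(force, t0, x, G, v, kT=KT_PN_NM):
    """Lifetime t(F) from the DHS expression.

    Assumed units: force in pN, x in nm, G and kT in pN*nm, t0 in seconds.
    v is the shape factor (1/2, 2/3 or 1 in the post).
    """
    if t0 <= 0 or x <= 0 or G <= 0:
        raise ValueError("t0, x and G must all be positive")
    # Term shared by the prefactor and the exponent.
    base = 1.0 - v * force * x / G
    if base <= 0:
        # The expression is only defined below the critical force G / (v * x).
        raise ValueError("force is at or beyond G / (v * x)")
    prefactor = base ** (1.0 - 1.0 / v)
    exponent = -(G / kT) * (1.0 - base ** (1.0 / v))
    return t0 * prefactor * math.exp(exponent)
```

At zero force the function returns t0 for any v, and for v = 1 it matches t0·exp(−F·x/kT). For v < 1 the model only makes sense for forces below G/(v·x), a constraint that any prior on x and G has to respect.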

The problem:
Molecule A has a mutant, A', and we have reason to believe that the A'-B interaction is less stable than A-B, so we wish to compare the parameters (t0, x and G) between the two. I've tried to turn this problem into a GLM, which is easy for v=1 but not for the other values (I think). I have two main problems. First, I don't know whether to treat v as a regression parameter or to treat each value of v as a separate model to be compared (via WAIC, for example). Second, if v is a parameter, I don't know how to assign it a prior or how to write the likelihood for t(F) (a Dirichlet distribution, maybe?).
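
One point that bears on this choice: Stan's HMC sampler works only with continuous parameters, so a v restricted to three values cannot be sampled directly. The usual options are to fit one model per v and compare them, or to put a categorical prior on v and sum it out. The sketch below shows only the arithmetic of the second option, under strong simplifying assumptions: log-normal noise on t(F) with a known sigma, a uniform prior over the three v values, and t0, x and G held fixed (in a real fit they would be sampled). The data are synthetic, generated noise-free from the v = 2/3 model, and all names and numbers are hypothetical.

```python
import math

KT = 4.11  # assumed k_B*T near 298 K, in pN*nm
V_CANDIDATES = (0.5, 2.0 / 3.0, 1.0)


def dhs_log_lifetime(force, t0, x, G, v, kT=KT):
    """log t(F) under the DHS expression (requires force < G / (v * x))."""
    base = 1.0 - v * force * x / G
    if base <= 0:
        raise ValueError("force is at or beyond G / (v * x)")
    return (math.log(t0)
            + (1.0 - 1.0 / v) * math.log(base)
            - (G / kT) * (1.0 - base ** (1.0 / v)))


def log_likelihood(forces, lifetimes, t0, x, G, v, sigma):
    """Assumed noise model: log t_i ~ Normal(log t(F_i), sigma)."""
    total = 0.0
    for F, t in zip(forces, lifetimes):
        z = (math.log(t) - dhs_log_lifetime(F, t0, x, G, v)) / sigma
        total += -0.5 * z * z - math.log(sigma * t * math.sqrt(2.0 * math.pi))
    return total


def posterior_over_v(forces, lifetimes, t0, x, G, sigma):
    """Posterior weight of each candidate v under a uniform prior on v."""
    log_prior = math.log(1.0 / len(V_CANDIDATES))
    log_post = [log_prior + log_likelihood(forces, lifetimes, t0, x, G, v, sigma)
                for v in V_CANDIDATES]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]


# Synthetic, noise-free illustration data (NOT measurements).
forces = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
lifetimes = [math.exp(dhs_log_lifetime(F, 2.0, 0.5, 40.0, 2.0 / 3.0))
             for F in forces]

weights = posterior_over_v(forces, lifetimes, t0=2.0, x=0.5, G=40.0, sigma=0.3)
for v, w in zip(V_CANDIDATES, weights):
    print(f"v = {v:.3f}: posterior weight {w:.3f}")
```

With only 6 and 9 data points, the weights from a full fit may well stay spread across all three shapes, which would itself answer the "which v fits best" question with "the data cannot tell yet".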

Summary of variables:
*Data
-t(F)_i
-F_i
-categorical variable "mutant" (data from either A or A')

*Parameters (for A or A')
-t0_i > 0
-x_i > 0
-G_i > 0
-v_i? = 1/2, 2/3 or 1

*Relevant questions (goals of the analysis)
-Which value of v gives the best fit for A and A'? Is it the same or different for both?
-How do the physical parameters t0, x and G (if v is different from 1) compare between A and A'?

I'm thinking about using RStan to tackle the coding, like in the book (I think Python+PyMC3 is a good but harder-to-use alternative). My data so far are a bit limited (n=6 and 9 points of t(F) vs F for A and A' respectively), so I'm not expecting an excellent fit. I expect to return to the lab soon and gather more data points. Any help is greatly appreciated. Please note that I'm not an expert in mathematical notation, but I'll make the effort to understand any and all help posted here. Thanks in advance for your time!


#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, all of the quantum-like physics in here is likely to scare people off. Unfortunately, that means you end up with me. Bayesian modeling definitely seems like it could handle this, but the issue may be with your ability to identify the correct model. I don't think I can be much help there. Perhaps posting your dataframe and working code would help inspire some additional assistance.

As for priors, flat priors are always the cautious approach, but if you know anything about the target distribution you should try to incorporate that information (e.g., values that have to be positive, or bounded between 0 and 1).

I have only partially used Stan or PyMC3. Perhaps once you have better defined the model you could reach out directly to Richard; he seems like an open and nice person. Also, if you are looking to generalize results to another, slightly different sample, there is something called transportability or calibration that may be needed, but I could have misinterpreted that part of your post.

Also, welcome to the forum! P.S. Those look like double daggers.