Fitting a model with Bernoulli/Binomially distributed data

#1
I am used to fitting data to estimate the parameters of a model f(x,A), where x is a data point and A is the parameter (or parameters) to be determined. To do this I usually use chi-squared fitting, i.e. I minimize
[TEX]\chi^2 = \sum_i \frac{(y_i - f(x_i,A))^2}{\sigma_i^2}[/TEX].
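In code terms, what I usually do looks roughly like this (the straight-line f and the toy data are just placeholders for my real model and measurements):

[CODE]
import numpy as np
from scipy.optimize import minimize

def f(x, A):
    # placeholder model: a straight line; my real f(x, A) goes here
    return A[0] * x + A[1]

def chi2(A, x, y, sigma):
    # sum of squared residuals, weighted by the measurement variances
    return np.sum((y - f(x, A)) ** 2 / sigma ** 2)

# toy data just so the sketch runs
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + np.random.normal(0.0, 0.5, x.size)
sigma = np.full(x.size, 0.5)

result = minimize(chi2, x0=[1.0, 0.0], args=(x, y, sigma))
print(result.x)  # best-fit parameters A
[/CODE]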

I have recently been given an almost identical task, except that my data is now success/fail rather than a continuous value. So each data point is a 1 or a 0, and my model predicts the probability of success p(x,A). I know each observation is a Bernoulli trial, and I expected to be able to apply the maximum likelihood method without too much difficulty. I thought I had, but I have a question. I have written down the following expression for the Bernoulli likelihood, which I think is correct:
[TEX] L(p,y) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i}[/TEX].

The problem is that I don't have many trials at a single probability with which to estimate it; I have a single trial at each of a number of different data points. What I ideally want is an expression of the form
[TEX] L(p_1,\dots,p_n; x_1,\dots,x_n; y_1,\dots,y_n) = \prod_{i=1}^{n} p(x_i,A)^{y_i}\,(1-p(x_i,A))^{1-y_i}[/TEX]
so that the probability depends on the data point as well as the outcome. Is this also a valid likelihood? If so, is there an easy way to turn the product into a sum? It seems harder for this equation than for the first.
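For concreteness, here is a small sketch of evaluating that product directly (the logistic form of p(x, A) is made up; any model that returns a probability would do):

[CODE]
import numpy as np

def p(x, A):
    # made-up stand-in for my model of the success probability
    return 1.0 / (1.0 + np.exp(-(A[0] + A[1] * x)))

def likelihood(A, x, y):
    # one Bernoulli factor per data point: p^y * (1 - p)^(1 - y)
    pi = p(x, A)
    return np.prod(pi ** y * (1.0 - pi) ** (1.0 - y))

x = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([0, 1, 0, 1])
print(likelihood([0.0, 1.0], x, y))
[/CODE]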

Thanks.
 

Dason

Ambassador to the humans
#2
Have you looked into logistic regression? You could consider different link functions, but the idea is the same.
 
#3
I have looked at logistic regression and it may be the solution to my problem, although I have never used it before. I was wondering whether the approach above is still valid, or whether the Gaussian case is special in that the sum of n squared, standardized normal residuals follows a chi-squared distribution.
 

Dason

Ambassador to the humans
#4
I don't understand why you're talking about Gaussians.

Turning products into sums is relatively easy when we're talking about likelihoods. Since usually all we care about is maximizing the likelihood, we can apply any monotone increasing function to it and maximize that instead, and the parameter estimates are still the same. Consider what happens when you take the log of the likelihood.
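Explicitly, taking the log of your likelihood from post #1 turns the product into a sum:
[TEX]\log L = \sum_{i=1}^{n} \left[ y_i \log p(x_i,A) + (1-y_i)\log\left(1-p(x_i,A)\right) \right][/TEX]
and maximizing this gives the same estimate of A as maximizing L itself.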
 
#5
I think my question is simply: does the maximum likelihood method still work if my likelihood function is the product of N measurements with different probabilities, where the probabilities are predicted by my model? Or does it only work when I take repeated measurements at the same unknown probability? Since the likelihood function is the joint PDF of all my data, I've probably just answered my own question.
 

Dason

Ambassador to the humans
#6
It depends on how you expect those probabilities to be related. If you don't want to impose any sort of structure on how the probability of success is related to the covariate, then you can't do a good job of estimating the probability of success when you only have a single observation at a given level of the covariate. Logistic regression is just one way to impose some sort of structure between the covariate and the probability of success.
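If you want to see what that looks like in practice, here is a minimal sketch using statsmodels (the data are made up, and the logit link is just the default; other links are available through the GLM interface):

[CODE]
import numpy as np
import statsmodels.api as sm

# made-up data: a single 0/1 outcome at each value of the covariate
x = np.linspace(-2.0, 2.0, 40)
y = (np.random.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(int)

X = sm.add_constant(x)      # design matrix: intercept + covariate
fit = sm.Logit(y, X).fit()  # logistic regression via maximum likelihood
print(fit.params)
[/CODE]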
 
#7
I have a fairly detailed model of the probability of success as a function of my input variables and the parameters I am trying to determine. It's actually quite an interesting situation: there are a few input variables to which the probability is quite sensitive, and my data points are very close together in the input space, which is why I am not binning my data. I don't know a lot about logistic regression, but if the clue is in the name, it might be a bit tricky to incorporate my model, which is why I might prefer straight MLE. Also, if I use a separate model, it defeats the point of testing this one. Still, it's been an unexpectedly interesting weekend learning about fitting data from discrete distributions.
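For what it's worth, here is roughly the fit I have sketched out, minimizing the negative log-likelihood with a generic optimizer (the logistic p(x, A) is only a stand-in; any model that returns a probability plugs in the same way):

[CODE]
import numpy as np
from scipy.optimize import minimize

def p(x, A):
    # stand-in for my detailed model of the success probability
    return 1.0 / (1.0 + np.exp(-(A[0] + A[1] * x)))

def neg_log_likelihood(A, x, y):
    pi = np.clip(p(x, A), 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(pi) + (1.0 - y) * np.log(1.0 - pi))

# toy data: one Bernoulli outcome per point in the input space
x = np.linspace(-2.0, 2.0, 50)
y = (np.random.uniform(size=x.size) < p(x, [0.0, 2.0])).astype(int)

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(x, y))
print(result.x)  # maximum likelihood estimates of A
[/CODE]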