Working with a distribution of outcomes from discrete events

#1
Hello,

I have a data series that results from a long run (2,950) of discrete events. Roughly half of the events ended with a value of exactly -1; the other half are continuously distributed between 1 and positive infinity. There are no values between -1 and 1. A histogram would look something like this:

Code:
9
8  x
7  x
6  x
5  x
4  x   x
3  x   x
2  x   x x
1  x   x x x x
  -1 0 1 2 3 4 5 6 7
I have a fair amount of statistics background, but I am at a loss to analyze this distribution, specifically to come up with an expected value and confidence interval.

It was suggested to me that I partition the data set into groups of 50 random data points and sum the values within each partition. That would (should?) result in a normal distribution, so that I could apply typical descriptive statistics and come up with an expected value (the mean of the sums) and a confidence interval based on the distribution of the sums.

My issue is that I can't come up with a defensible reason why this will work. I don't remember anything in statistics about summing (or averaging, I suppose, since the sample sizes are equal) subsets of a non-normal distribution in order to create a normal distribution. I'd also need to know whether 50 is the right group size, whether to partition the series into disjoint groups of 50, or instead to randomly select 50 points from the series over and over (and, if so, how many times to do that).
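To make sure I'm describing the procedure right, here's a quick simulation of it. This is not my real data; the exponential success magnitudes are just a stand-in I made up for illustration:

Code:
import numpy as np

rng = np.random.default_rng(42)
n = 2950

# Each event: -1 (failure) or a positive magnitude (success), 50/50.
# The exponential shape is an assumption, not my actual distribution.
is_success = rng.random(n) < 0.5
values = np.where(is_success, rng.exponential(scale=2.0, size=n), -1.0)

# Partition into disjoint groups of 50 and sum within each group.
group_size = 50
n_groups = n // group_size  # 59 groups
sums = values[: n_groups * group_size].reshape(n_groups, group_size).sum(axis=1)

print("mean of group sums:", sums.mean())
print("std of group sums: ", sums.std(ddof=1))
# A histogram of `sums` comes out roughly bell-shaped, which I assume
# is the central limit theorem at work, but that's the part I can't defend.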

Any help would be greatly appreciated. Am I approaching this the wrong way? Am I using the wrong language to describe what's happening here?

Much obliged,
John
 
#2
Sorry, I misstated that a bit.

I stated above that the outcomes were either -1 or in a range from 1 to positive infinity, with no values between -1 and 1. Corrected: each event has two outcomes, -1 (failure) or a value between 0 and positive infinity (success). There are no values between -1 and 0.

Still looking for help in determining the expected value from a series of these events.

Thanks,
John
 

#3
Off the top of my head, there are two possible ways you can analyze this:

A) Treat the analysis as a binary outcome: either failure or success.

B) Look at only the successes and build a model to determine the magnitude of success.

Another thought I have is to change the "-1" to "0" and to analyze it that way. You could then do a Poisson regression.
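To make the combination of A) and B) concrete, here's a rough sketch in Python, assuming your outcomes sit in a NumPy array I'll call values (a placeholder name). The expectation of a single event decomposes as E[X] = P(fail)*(-1) + P(success)*E[X | success]:

Code:
import numpy as np

def two_part_expected_value(values):
    """Estimate E[X] by splitting the binary part from the magnitude part."""
    failures = values == -1
    p_fail = failures.mean()                 # A) the binary outcome
    mean_success = values[~failures].mean()  # B) the magnitude of success
    # E[X] = P(fail) * (-1) + P(success) * E[X | success]
    return p_fail * (-1) + (1 - p_fail) * mean_success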
 
#4
Link said:
Off the top of my head, there are two possible ways you can analyze this:

A) Treat the analysis as a binary outcome: either failure or success.

B) Look at only the successes and build a model to determine the magnitude of success.

Another thought I have is to change the "-1" to "0" and to analyze it that way. You could then do a Poisson regression.
Hi Link,

Thanks for the response. Can you expand a bit more on your second point?

I purposefully stayed away from treating the analysis as binary, because the magnitude of the success is important. Also, I was looking at the Poisson distribution at one point, but Wikipedia scared me away with this:

Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict. An example would be the distribution of cigarettes smoked in an hour by members of a group where some individuals are non-smokers. - http://en.wikipedia.org/wiki/Poisson_regression
I would have a lot of zeros. :)
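For what it's worth, here is the kind of answer I'm after, sketched as a percentile bootstrap over the raw events (whether resampling like this is even defensible here is really my original question):

Code:
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for the mean outcome of a single event."""
    n = len(values)
    boot_means = np.array([
        rng.choice(values, size=n, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), (lo, hi)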

Thanks for your help.
John