Cope with distributions that may take negative numbers

#1
Hi everyone,

I would like to know how to cope with distributions that can take negative values when negative values are physically impossible. For instance, the lifetime of a piece of equipment obviously cannot be negative, but the distribution used to model it can sometimes give negative numbers. What can I do to avoid that?

Thanks in advance.

David
 

Dason

Ambassador to the humans
#4
If you provide a concrete example and say what you think is causing the problem, it would be easier to address. But if you're modeling data that must be positive, then maybe it would be a good idea to just use a distribution that only has support on the positive numbers?
 
#5
Let's suppose that the lifetime of a piece of equipment is normally distributed with a mean of 5 months and a standard deviation of 1.5 months. Sometimes it can take negative values, which is impossible. Could I do something to avoid this situation? Or what should I do with this data?
 

Dason

Ambassador to the humans
#6
Then it's not actually normally distributed. But if the normal provides a suitable approximation for that case then I don't really see the issue. Sure it's possible for a N(5, 1.5^2) to go negative but it's a very low probability. Once again you haven't really said *why* this is an issue for you. Are you just bothered by the possibility that a normal random variable can be negative and your variable has to be positive?
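
To put a number on "very low probability", here is a minimal sketch using only the Python standard library. It assumes the N(5, 1.5^2) lifetime model from the post above; the variable names are just for illustration.

```python
from statistics import NormalDist

# Lifetime model from the thread: mean 5 months, standard deviation 1.5 months.
lifetime = NormalDist(mu=5.0, sigma=1.5)

# Probability mass that the normal model places below zero.
p_negative = lifetime.cdf(0.0)

print(f"P(lifetime < 0) = {p_negative:.6f}")
```

Zero sits about 3.3 standard deviations below the mean, so the printed probability comes out well under one in a thousand.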
 
#7
You could use a gamma distribution or a lognormal distribution. Then it cannot take any negative values. And when the mean is large relative to the standard deviation, it will look roughly like a normal distribution: close to symmetric, although strictly speaking it is not.
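
As a rough sketch of that suggestion, the parameters of a gamma and a lognormal can be chosen by moment matching so that both have mean 5 and standard deviation 1.5. Moment matching is an assumption made here for illustration, not something the thread prescribes, and the seed and sample size are arbitrary.

```python
import math
import random

mean, sd = 5.0, 1.5

# Gamma: mean = shape * scale, variance = shape * scale**2.
gamma_shape = (mean / sd) ** 2
gamma_scale = sd ** 2 / mean

# Lognormal: mu and sigma describe the underlying normal on the log scale.
log_sigma2 = math.log(1.0 + (sd / mean) ** 2)
log_mu = math.log(mean) - log_sigma2 / 2.0
log_sigma = math.sqrt(log_sigma2)

rng = random.Random(0)  # fixed seed, only for reproducibility
n = 20_000
gamma_draws = [rng.gammavariate(gamma_shape, gamma_scale) for _ in range(n)]
lognormal_draws = [rng.lognormvariate(log_mu, log_sigma) for _ in range(n)]

for name, draws in [("gamma", gamma_draws), ("lognormal", lognormal_draws)]:
    print(name, "min:", min(draws), "mean:", sum(draws) / n)
```

Both samples stay strictly positive while keeping roughly the same centre and spread as the normal model.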
 

hlsmith

Omega Contributor
#8
This seems more like a data quality issue. Are you getting impossible data values? If so, you should check your protocols and the functions related to data generation.


Otherwise refer to GG and Dason's posts if you are just theorizing.
 

Dason

Ambassador to the humans
#10
You still haven't told us what the actual concern is. Are you generating random numbers from the distribution, and if you get a negative then everything blows up? Or are you just worried because a normal can theoretically take a negative value, so any method/model that assumes a normal distribution for the (conditional) response allows for the possibility of a negative result, and you know that can't happen? (And if so, why does what the model theoretically allows matter so much to you if it only allows it with an almost non-existent probability?)

The usual concern would be if you were using a method that *requires* the response be positive but in your data every now and then you might have a negative (or zeros) either due to impure data or something else weird going on. That is a different issue entirely. But I think that isn't what you're worried about and you're jumping the gun on being concerned over the theory that the model theoretically allows for negative values even though you know they shouldn't/can't happen.
 
#11
The usual concern would be if you were using a method that *requires* the response be positive but in your data every now and then you might have a negative (or zeros) either due to impure data or something else weird going on.

That's my problem.
 

Dason

Ambassador to the humans
#12
Maybe if you provide a concrete example
So, are you using a Poisson regression and sometimes you have negative counts or something? Please provide some actual details on what you're doing. I've been trying to get actual details because, to be blunt, it's still not clear to me what your issue is; the way things have been worded makes it a little ambiguous. If you provide a concrete example, that will clear things up and then we can move on from there.
 
#13
the lifetime of a piece of equipment is normally distributed with a mean of 5 months and a standard deviation of 1.5 months.

Can I truncate the distribution just to avoid the negative numbers?
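
For reference, one way to read "truncate" is a normal distribution conditioned on being positive. Below is a minimal sketch using simple rejection sampling with the standard library; the function name `sample_truncated_normal` is hypothetical, and rejection sampling is just one convenient choice here.

```python
import random


def sample_truncated_normal(mu, sigma, lower, rng):
    """Draw one value from N(mu, sigma^2) conditioned on being above `lower`."""
    while True:
        x = rng.gauss(mu, sigma)
        if x > lower:  # discard draws at or below the bound and try again
            return x


rng = random.Random(0)  # fixed seed, only for reproducibility
draws = [sample_truncated_normal(5.0, 1.5, 0.0, rng) for _ in range(10_000)]

print("min:", min(draws), "mean:", sum(draws) / len(draws))
```

Truncation shifts the mean slightly upward in principle, but with so little mass below zero the effect in this example is negligible.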
 

hlsmith

Omega Contributor
#14
Well, I am back again. You need to figure out why you have negative numbers and prevent them from happening. You could almost treat them like missing data and see if they are MAR (missing (effed-up) at random) or MCAR, so you can drop them.


What type of analyses are you running? Negative values won't prevent you from using a standard normal, etc., at least for many distributions. You would just want to know why they are effed-up and, if they are left in the dataset, whether they are leverage points/outliers messing up your model.


P.S., I support Dason in that your example is too crude.
 

Dason

Ambassador to the humans
#15
Your example isn't an example that causes any sort of problem. You haven't actually said what the problem is. As far as I can see, you are misunderstanding almost everything I've said. The method/model here would be the normal distribution (which allows positive and negative values) and the data is positive only, so it is NOT the situation that I described here:
The usual concern would be if you were using a method that *requires* the response be positive but in your data every now and then you might have a negative (or zeros) either due to impure data or something else weird going on.
That kind of situation would be if you're using a regression where you're regressing log(lifetime) and for some reason you have negative lifetimes (which shouldn't happen) actually in your data.

So if that's literally what you're doing then you're doing a terrible job explaining it. I'm not trying to be insulting - I just need you to understand that you need to put a little bit more effort into explaining the situation and what the ACTUAL problem is.
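
For the log(lifetime) case mentioned above, a minimal sketch of the kind of check that would surface the problem before any model is fitted. The data values are made up purely for illustration, with the negative and zero entries standing in for recording errors.

```python
import math

# Hypothetical lifetimes in months; not real data.
lifetimes = [4.2, 6.1, -0.3, 5.0, 0.0, 3.7]

# Separate records that can be log-transformed from those that cannot.
valid = [(i, t) for i, t in enumerate(lifetimes) if t > 0]
invalid = [(i, t) for i, t in enumerate(lifetimes) if t <= 0]

# math.log raises ValueError for zero or negative input, so only
# the strictly positive lifetimes are transformed.
log_lifetimes = [math.log(t) for _, t in valid]

print("rows to investigate:", invalid)
print("usable log-lifetimes:", len(log_lifetimes))
```

The flagged rows are the ones to trace back to the data collection step rather than silently drop.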
 

hlsmith

Omega Contributor
#16
I was also thinking about transformations of data getting botched, but that is not really a parametric issue, like you are implying with your example/wording.