Hi all!
This is probably a simple question; however, my statistics skills have gotten a bit rusty and I cannot find an appropriate solution on the internet...
This is the problem: Let X_1, ..., X_n be a series of values drawn from a normal distribution. All I know about them is their mean u=sum X_i/n and their standard deviation s (n is unknown and can be assumed to be large). I want to compute the likelihood (i.e. a p-value) that X_1, ..., X_n come from the normal distribution N(U,S^2), with U and S known.
I need something like Student's or Welch's t-tests; however, those tests (1) require n to be known and (2) test the hypothesis that two populations have equal means (instead I want to test for both equal mean and equal standard deviation). Problem (1) could probably be solved with the assumption that n is large, so the t distribution tends to N(0,1)...
Can someone help me with this? thank you very much!
Without the sample size I think you're out of luck for what you want to do. At least if I'm understanding you correctly.
I don't have emotions and sometimes that makes me very sad.
I think there should be a solution (it's just that I cannot find it): consider the simpler version of this problem where I just want to take into account the mean u of my data (and just ignore its standard deviation). Then, this can be solved with a Student's t-test. Since I can assume large n, I simply can approximate the t-distribution of the test with N(0,1)...
The problem is: is there something like Student's t-test that takes into account both sample mean and standard deviation?
Look into normality tests, e.g. the Shapiro-Wilk test.
nicola (06-17-2015)
yes, you are totally right woa ok, I'll see if in some way I can derive n from the data I have.. thank you!
What data do you have?
Hi! in the end, I managed to obtain the sample size n! this makes everything much easier.
To summarize, this is an example instance of the problem I wish to solve: I know that mean(X_1,...,X_n)=21, stdev(X_1,...,X_n)=3, and n=250. How to compute the likelihood that X_1, ..., X_n have been generated from the distribution N(21.5,4)?
I could perform a Student's or Welch's t-test, but those tests only give me the likelihood that the means are equal, right? Is there a way to compute the likelihood that both mean and standard deviation are the same?
thanks!
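For reference, the mean-only test mentioned above can be sketched in C++ as follows. This is only a minimal sketch, not a full t-test: it assumes n is large enough that the t-distribution can be replaced by N(0,1), and the function names `tStatistic` and `twoSidedPValueLargeN` are made up for illustration. Note that n is still needed to compute the statistic itself.

```cpp
#include <cmath>

// One-sample t statistic for sample mean u, sample standard deviation s,
// sample size n, tested against a hypothesized mean U.
double tStatistic(double u, double s, double n, double U) {
    return (u - U) / (s / std::sqrt(n));
}

// Two-sided p-value under the large-n assumption that the statistic is
// approximately N(0,1): p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
double twoSidedPValueLargeN(double t) {
    return std::erfc(std::fabs(t) / std::sqrt(2.0));
}
```

With the numbers from the example (u=21, s=3, n=250, U=21.5) the statistic comes out at about -2.64, giving a two-sided p-value a little under 0.01. As noted in the post, this says nothing about whether the standard deviations agree.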
what is the purpose of this endeavor? Does it have to be compared to (21.5, 4) or can you just test whether your data is normally distributed?
In the prior posts, you may have been able to insert a range of n-values and say that your parameters would be normally distributed given n-value = ? - ?.
Currently you can also plot your data, if you actually have them, and overlay a normal distribution with mean 21.5 and SD 4, and visually examine the distributions.
Stop cowardice, ban guns!
It must be compared to N(21.5,4). I already know the data is normally distributed, so this is not of concern. I could use the test $Z = (u - U)/(S/\sqrt{n})$, but this would only include the sample mean $u$ in the computation, and not the standard deviation of the sample (I also want to use the standard deviation to make the estimate more accurate).
Unfortunately, I need an automatic method to perform this task (I cannot use graphical methods) because I am implementing this as a C++ routine to be called hundreds of times per second... this problem comes from the analysis of DNA sequencing data.
If you gave N(21.5,4) the same sample size as the comparison sample, and you confirmed the normality assumptions for the t-test, then you can put the other pieces together and do a t-test. Also, if you just gave N(21.5,4) the same sample size, you could run a K-S test to compare the distributions.
I solved the problem. I post the solution here in the hope it will be useful to others.
Again, the problem formulation is:
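compute the likelihood of observing sample mean $u$ and sample standard deviation $s$ (from $n$ observations) in samples drawn from the distribution $N(U, S^2)$.
The quantity we are interested in is $\log L(u, s)$ (I use log-likelihood since using log simplifies notation). Since sample mean and sample variance of a normally distributed population are two independent random variables, we have that
$$\log L(u, s) = \log L(u) + \log L(s)$$
The random variable $T = \frac{u - U}{s/\sqrt{n}}$ is t-distributed with $n-1$ degrees of freedom. For large $n$, Student's t-distribution tends to $N(0,1)$; we assume big n so we can approximate the distribution of $T$ with $N(0,1)$. Then (applying the definition of the standard normal distribution's density function),
$$\log L(u) \approx -\frac{1}{2}\log(2\pi) - \frac{n\,(u - U)^2}{2 s^2}$$
The random variable $Y = \sqrt{n-1}\,\frac{s}{S}$ is chi-distributed with $n-1$ degrees of freedom (taking $s$ to be the sample standard deviation with divisor $n-1$). Again, we assume $n$ to be large. Then, the distribution of the random variable $\sqrt{2}\,Y - \sqrt{2(n-1) - 1}$ tends to $N(0,1)$ and we have:
$$\log L(s) \approx -\frac{1}{2}\log(2\pi) - \frac{1}{2}\left(\sqrt{2(n-1)}\,\frac{s}{S} - \sqrt{2n-3}\right)^2$$
note that $\sqrt{2n-3} \approx \sqrt{2(n-1)}$ (n is large), so the above quantity simplifies to
$$\log L(s) \approx -\frac{1}{2}\log(2\pi) - (n-1)\left(\frac{s}{S} - 1\right)^2$$
Putting it all together, we finally obtain
$$\log L(u, s) \approx -\log(2\pi) - \frac{n\,(u - U)^2}{2 s^2} - (n-1)\left(\frac{s}{S} - 1\right)^2$$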
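Since the routine is meant to be called from C++, here is a minimal sketch of how the combined quantity might be computed. It rests on assumptions that should be checked against your own derivation: the mean term is the standard normal log-density evaluated at (u - U)/(s/sqrt(n)), and the standard deviation term uses the large-n normal approximation of the chi distribution with n-1 degrees of freedom, with s computed using divisor n-1. The function name `approxLogLikelihood` is hypothetical.

```cpp
#include <cmath>
#include <stdexcept>

// Approximate log-likelihood of observing sample mean u and sample standard
// deviation s (from n observations) under N(U, S^2). Intended for large n only.
double approxLogLikelihood(double u, double s, long n, double U, double S) {
    if (n < 2 || s <= 0.0 || S <= 0.0) {
        throw std::invalid_argument("need n >= 2, s > 0 and S > 0");
    }
    const double logTwoPi = std::log(2.0 * std::acos(-1.0));

    // Mean term: t statistic treated as N(0,1) (large-n assumption).
    const double t = (u - U) / (s / std::sqrt(static_cast<double>(n)));
    const double meanTerm = -0.5 * logTwoPi - 0.5 * t * t;

    // Standard deviation term: normal approximation of the chi distribution,
    // simplified with sqrt(2n - 3) ~ sqrt(2(n - 1)) (assumption, large n).
    const double r = s / S - 1.0;
    const double sdTerm = -0.5 * logTwoPi - static_cast<double>(n - 1) * r * r;

    return meanTerm + sdTerm;
}
```

The function takes the standard deviation S, not the variance, so for N(21.5,4) you need to decide whether the 4 is the variance (S=2) or the standard deviation (S=4) before calling it. The result is a log-density of the approximating normal variables, not a p-value, so it is mainly useful for comparing candidate distributions against each other.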