Hi, I would be glad to read your thoughts.
What sample size is large enough to assume normality of the sample average based on the Central Limit Theorem? Some sources say 30, some say 100.
I ran some simulations, and it seems that for some skewed data (finite variance, independent ...) you need a sample larger than 200.
And what counts as "reasonably symmetrical"?
I ran simulations for F(8,8) through F(19,19) with a large number of repetitions (100,000) and checked what sample size brings the distribution of the average close to the Normal distribution.
What does "close to the Normal distribution" mean? I thought about two options:
1. The skewness of the simulated distribution of averages is < 0.5 and its excess kurtosis is < 0.5.
2. The Shapiro-Wilk (SW) test. Since it is limited to 5000 observations, run it on blocks of 5000 and average over the blocks. It is definitely too powerful for a large sample, so maybe use a low significance level such as 0.01 or 0.001.
Currently I use option 1; I may also try option 2.
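To make option 1 concrete, here is a minimal sketch of how such a check could look. It is not the code behind the numbers in this post. It uses only the Python standard library, and the function names (`draw_f`, `shape_of_means`, `smallest_passing_n`), the default of 2000 repetitions, and the use of absolute values in the cutoff are all assumptions made for illustration.

```python
import random
import statistics


def draw_f(rng, d1, d2):
    """One draw from F(d1, d2): ratio of two chi-square variates, each divided by its df."""
    chi2_num = rng.gammavariate(d1 / 2, 2.0)  # chi-square(d1) = Gamma(shape=d1/2, scale=2)
    chi2_den = rng.gammavariate(d2 / 2, 2.0)  # chi-square(d2)
    return (chi2_num / d1) / (chi2_den / d2)


def skew_and_excess_kurtosis(values):
    """Moment-based (biased) skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def shape_of_means(d, n, reps=2000, seed=0):
    """Simulate `reps` averages of n draws from F(d, d); return their skewness and excess kurtosis."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([draw_f(rng, d, d) for _ in range(n)])
        for _ in range(reps)
    ]
    return skew_and_excess_kurtosis(means)


def smallest_passing_n(d, candidates, threshold=0.5, reps=2000, seed=0):
    """First candidate n whose simulated averages meet both cutoffs, or None."""
    for n in candidates:
        skew, kurt = shape_of_means(d, n, reps, seed)
        # Assumption: compare absolute values, so negative deviations also count.
        if abs(skew) < threshold and abs(kurt) < threshold:
            return n
    return None


if __name__ == "__main__":
    # reps is kept small so the sketch runs quickly; the post used 100,000.
    for n in (5, 30, 100):
        print(n, shape_of_means(10, n))
```

With heavy-tailed parents like these, the estimated skewness and kurtosis of the averages are themselves noisy, so the sample size that first passes the cutoff can move around between seeds unless the number of repetitions is large.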
Then I ran the following regression:
DV: required sample size
IVs (population parameters): Skewness, Excess Kurtosis, Skewness*Kurtosis, Excess Kurtosis^2, Skewness^2
A potential problem: in practice you only know the sample statistics (skewness, kurtosis), while the regression is based on the true population values.
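For reference, the population-side IVs can be computed directly rather than estimated. The sketch below uses the standard closed-form skewness and excess kurtosis of the F distribution as I recall them, so please check them against a reference before relying on them. The function names are hypothetical, and I am assuming the product term uses excess kurtosis. One thing the formulas make visible: the excess kurtosis is finite only when the denominator degrees of freedom exceed 8, so F(8,8) would have to be left out of a regression that uses population kurtosis.

```python
import math


def f_population_shape(d1, d2):
    """Population skewness and excess kurtosis of F(d1, d2).

    The skewness needs d2 > 6 and the excess kurtosis needs d2 > 8.
    """
    if d2 <= 8:
        raise ValueError("excess kurtosis of F(d1, d2) is not finite for d2 <= 8")
    s = d1 + d2 - 2
    skew = (2 * d1 + d2 - 2) * math.sqrt(8 * (d2 - 4)) / ((d2 - 6) * math.sqrt(d1 * s))
    kurt = 12 * (d1 * (5 * d2 - 22) * s + (d2 - 4) * (d2 - 2) ** 2) / (
        d1 * (d2 - 6) * (d2 - 8) * s
    )
    return skew, kurt


def regression_row(d):
    """The five IVs from the post for F(d, d), in the order listed there."""
    skew, kurt = f_population_shape(d, d)
    # Assumption: the product term uses excess kurtosis, like the other terms.
    return [skew, kurt, skew * kurt, kurt ** 2, skew ** 2]


if __name__ == "__main__":
    for d in range(9, 20):
        print(d, [round(x, 3) for x in regression_row(d)])
```

The DV column (the sample size that first meets the chosen criterion) would still have to come from the simulation. Note that with only F(d,d) parents the five IVs are all functions of the single parameter d, so they are strongly collinear.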
Your thoughts? Can you recommend any articles?