Is there a way to simulate skewed data with only a mean and standard deviation?

#1
I have a mean and its corresponding standard deviation from a skewed distribution. The original data, which I don't have, is on a Likert scale from 1 to 7, and the mean I have is between 2 and 3.
Now I need to simulate the data based on only these two parameters. If I simulate a set of data assuming a normal distribution, I'll get negative values, which don't make sense. I suppose I can assume my data to be log-normally distributed, or maybe even negative binomial or something? But I don't know how to translate my two parameters.

Does anyone know a way to use only mean and std to simulate a skewed distribution?
Please advise. Your help will be greatly appreciated.

btw, I don't need my simulated data to be integers as if from a Likert scale; continuous values are fine.

Thanks for your help.
 

BGM

TS Contributor
#2
You need to provide more information / assumptions for your modelling choice. Otherwise it would be completely arbitrary.

The reason is straightforward: you have a 7-category multinomial trial, which involves 7 proportion parameters (1 for each category).

Now you only know that they sum to 1, plus the mean and standard deviation on the 1-7 scale. Therefore you have only 3 equations to solve for 7 unknowns, which is an under-determined system with infinitely many solutions.
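
Written out, the three equations look like this. The notation is my own: [math] p_k [/math] is the proportion of category [math] k [/math], and [math] m [/math] and [math] s [/math] are the reported mean and standard deviation (I am treating [math] s [/math] as a population standard deviation here, which is an assumption):

[math] \sum_{k=1}^{7} p_k = 1 [/math]

[math] \sum_{k=1}^{7} k \, p_k = m [/math]

[math] \sum_{k=1}^{7} k^2 \, p_k - m^2 = s^2 [/math]

Together with [math] p_k \ge 0 [/math], that leaves 4 degrees of freedom unconstrained.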

Once you have chosen a distribution (i.e. specified all the proportions, for example by calculation after imposing extra model constraints), the simulation is very simple. Partition the [0,1] interval into 7 regions corresponding to the categories, with the size of each region equal to the corresponding proportion parameter. Then generate a Uniform(0,1) random number and return the category whose region it falls into. This is just the well-known inverse transform sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

In a programming language, you can order the regions in the conditional construct by descending size (i.e. the mode comes first). This can improve efficiency, because the construct will terminate early more often.
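
A minimal Python sketch of the procedure above, in case it helps. The function name `sample_categories` and the seven proportions in the usage note are made up for illustration; they are not derived from your mean and standard deviation. The loop walks the regions from largest to smallest, as suggested.

```python
import random


def sample_categories(proportions, n, seed=None):
    """Draw n categories (1..len(proportions)) by inverse transform sampling.

    `proportions` must be non-negative and sum to 1 (assumed, not checked).
    """
    rng = random.Random(seed)
    # Pair each category label with its proportion, largest proportion first,
    # so the scan below usually stops after very few comparisons.
    ordered = sorted(enumerate(proportions, start=1),
                     key=lambda pair: pair[1], reverse=True)

    def draw():
        u = rng.random()  # Uniform on [0, 1)
        upper = 0.0       # right edge of the current region
        for category, p in ordered:
            upper += p
            if u < upper:
                return category
        # Only reached if rounding leaves the cumulative sum slightly below 1.
        return ordered[-1][0]

    return [draw() for _ in range(n)]
```

For example, `sample_categories([0.30, 0.30, 0.15, 0.10, 0.07, 0.05, 0.03], 1000)` returns 1000 integers between 1 and 7. Those hypothetical proportions have a theoretical mean of 2.61.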
 
#3
Hi BGM. Thanks for your quick response.
I really can't make any assumption about p1-p7 because I have no data to support them. But I think I can get away with assuming the data is log-normally distributed. Even though the original data was from a Likert scale, I don't need my simulated data to be integers. A continuous scale is fine, as long as the values have a skewed distribution with the required mean and standard deviation. Do you think there's a way I can simulate the data with this assumption?
 

BGM

TS Contributor
#4
OK, let's focus on generating a log-normal distribution with the given mean [math] E[X] [/math] and variance [math] Var[X] [/math].

Assume you follow the following parametrization:

http://en.wikipedia.org/wiki/Log-normal_distribution#Arithmetic_moments

Since the log-normal distribution is a two-parameter distribution, you can express its parameters in terms of the given moments using the formulas in the link above.
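
For reference, under that parametrization the moments are

[math] E[X] = e^{\mu + \sigma^2/2} [/math]

[math] Var[X] = (e^{\sigma^2} - 1) \, e^{2\mu + \sigma^2} [/math]

and solving these two equations for the parameters gives

[math] \sigma^2 = \ln\left(1 + \frac{Var[X]}{E[X]^2}\right) [/math]

[math] \mu = \ln E[X] - \frac{\sigma^2}{2} [/math]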

After obtaining the parameters [math] \mu, \sigma^2 [/math], you can generate a normal random variable [math] Y \sim \mathcal{N}(\mu, \sigma^2) [/math] and then take [math] X = e^Y [/math] to be your required log-normal random variable.
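
A rough Python sketch of these steps, using the standard moment-matching formulas for the log-normal. The function names `lognormal_params` and `simulate_lognormal` are my own, and the mean of 2.5 and standard deviation of 1.5 in the usage note are placeholder values, not your actual numbers.

```python
import math
import random


def lognormal_params(mean, sd):
    """Return (mu, sigma) of the underlying normal for a log-normal
    with the given arithmetic mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)  # sigma^2 = ln(1 + Var/E^2)
    mu = math.log(mean) - sigma2 / 2.0         # mu = ln(E) - sigma^2 / 2
    return mu, math.sqrt(sigma2)


def simulate_lognormal(mean, sd, n, seed=None):
    """Draw n log-normal values: Y ~ N(mu, sigma^2), then X = exp(Y)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, sd)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
```

For example, `simulate_lognormal(2.5, 1.5, 1000)` gives 1000 positive, right-skewed values whose sample mean and standard deviation should be close to 2.5 and 1.5. Note that a log-normal is unbounded above, so some simulated values can exceed 7.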