# Is there a way to simulate skewed data with only a mean and standard error?

#### sunny3333

##### New Member
I have a mean and its corresponding standard deviation from a skewed distribution. The original data, which I don't have, is on a Likert scale from 1 to 7, and the mean I have is between 2 and 3.
Now I need to simulate the data based on only these two parameters. If I simulate a set of data assuming a normal distribution, I'll get negative values, which don't make sense. I suppose I could assume my data to be log-normally distributed, or maybe even negative binomial or something? But I don't know how to translate my two parameters.

Does anyone know a way to use only mean and std to simulate a skewed distribution?

btw, I don't need my simulated data to be integers as if from a Likert scale; continuous values are fine.

#### BGM

##### TS Contributor
You need to provide more information / assumptions for your modelling choice. Otherwise it would be completely arbitrary.

The reason is straightforward: you have a 7-category multinomial trial, which involves 7 proportion parameters (one for each category).

Now you only know that they sum to 1, plus the mean and standard deviation on the 1-7 scale. Therefore you have only 3 equations to solve for 7 unknowns, which is an under-determined system with infinitely many solutions.
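To make the counting argument concrete, here is a minimal sketch in Python with NumPy (my choice of tool, not something from the thread). It writes the three known facts (proportions sum to 1, first moment, second moment) as linear equations in the 7 unknown proportions and checks how many degrees of freedom remain. It ignores the extra requirement that each proportion lie between 0 and 1, which restricts the solutions further but generally does not make them unique.

```python
import numpy as np

values = np.arange(1, 8)  # Likert categories 1..7

# Each row is one linear constraint on (p1, ..., p7):
#   row 0: sum of p_k        = 1
#   row 1: sum of k * p_k    = mean
#   row 2: sum of k^2 * p_k  = sd^2 + mean^2
A = np.vstack([np.ones(7), values, values**2]).astype(float)

rank = np.linalg.matrix_rank(A)   # number of independent equations
free_dims = A.shape[1] - rank     # unknowns left undetermined

print(rank, free_dims)  # 3 4
```

So even with the mean and standard deviation fixed, four dimensions of freedom remain in the proportions.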

Once you have chosen a distribution (i.e. specified all the proportions, for example by calculation after imposing extra model constraints), it will be very simple. You can partition the [0,1] interval into 7 regions corresponding to the categories (the size of each region equals the corresponding proportion parameter). Then you generate a Uniform(0,1) random number and return the category whose region it falls into. This is just the well-known inverse transform sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

In a programming language, you can order the regions in the conditional construct by descending size (i.e. the mode comes first). This improves efficiency, because the construct will more often terminate early.
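The partition-and-lookup step can be sketched as follows in Python with NumPy. The proportions in `probs` are purely hypothetical (chosen only so that the mean falls between 2 and 3), and `sample_categories` is an illustrative helper name, not part of any library.

```python
import numpy as np

def sample_categories(probs, size, rng):
    """Inverse transform sampling for categories 1..len(probs)."""
    probs = np.asarray(probs, dtype=float)
    cum = np.cumsum(probs)   # right edges of the regions in [0, 1]
    cum[-1] = 1.0            # guard against floating-point rounding
    u = rng.random(size)     # Uniform[0, 1) draws
    # Index of the region each u falls into, shifted to categories 1..7
    return np.searchsorted(cum, u, side="right") + 1

# Hypothetical proportions for categories 1..7 (they sum to 1)
probs = [0.30, 0.30, 0.15, 0.10, 0.08, 0.05, 0.02]

rng = np.random.default_rng(0)
draws = sample_categories(probs, 100_000, rng)

print(draws.mean(), draws.std(ddof=1))
```

Here `np.searchsorted` does a binary search over the cumulative proportions, so ordering the regions by size matters less than it would in a hand-written chain of `if` statements.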

#### sunny3333

##### New Member
Hi BGM. Thanks for your quick response.
I really can't make any assumptions about p1-p7 because I have no data to support them. But I think I can get away with assuming the data is log-normally distributed. Even though the original data was from a Likert scale, I don't need my simulated data to be integers. A continuous scale is fine, as long as the simulated values have a skewed distribution with the required mean and standard deviation. Do you think there's a way I can simulate the data with this assumption?

#### BGM

##### TS Contributor
OK, let's focus on generating a log-normal distribution with the given mean $E[X]$ and variance $Var[X]$.
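For reference, these are the standard moment relations for a log-normal variable $X = e^Y$ with $Y \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of $\ln X$ (not of $X$ itself):

$$E[X] = e^{\mu + \sigma^2/2}, \qquad Var[X] = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}$$

Dividing the second by the square of the first gives $Var[X] / E[X]^2 = e^{\sigma^2} - 1$, so the two equations can be solved for the parameters:

$$\sigma^2 = \ln\left(1 + \frac{Var[X]}{E[X]^2}\right), \qquad \mu = \ln E[X] - \frac{\sigma^2}{2}$$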

After obtaining the parameters $\mu, \sigma^2$, you can generate a normal random variable $Y \sim \mathcal{N}(\mu, \sigma^2)$ and then take $X = e^Y$ to be your required log-normal random variable.
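A minimal sketch of this recipe in Python with NumPy, using the standard moment-matching formulas $\sigma^2 = \ln(1 + Var[X]/E[X]^2)$ and $\mu = \ln E[X] - \sigma^2/2$. The target mean and standard deviation below are hypothetical placeholders, and `lognormal_params` is an illustrative helper name.

```python
import numpy as np

def lognormal_params(mean, sd):
    """Convert the mean and sd of X into (mu, sigma) of Y = ln X."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2.0
    return mu, np.sqrt(sigma2)

# Hypothetical targets; substitute the mean and sd you actually have
target_mean, target_sd = 2.5, 1.5

mu, sigma = lognormal_params(target_mean, target_sd)

rng = np.random.default_rng(42)
y = rng.normal(mu, sigma, size=200_000)  # Y ~ N(mu, sigma^2)
x = np.exp(y)                            # X = e^Y is log-normal

print(x.mean(), x.std(ddof=1))  # should be close to the targets
```

Note that the simulated values are always positive but are not capped at 7, so some draws can fall outside the original 1-7 range.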