+ Reply to Thread
Results 1 to 3 of 3

Thread: generating random data for a scatter plot.

  1. #1
    Points: 11, Level: 1
    Level completed: 21%, Points required for next Level: 39

    Thanked 0 Times in 0 Posts

    Smile generating random data for a scatter plot.

    Hi there,

    My stats knowledge is pretty limited so I would appreciate any help available on the following question.

    I want to generate a scatter plot with 231 random data points, where the correlation coefficient is 0.64 between the two variables. Is this a possible to solve problem?

    Alternatively I need to two sets of data - both 231 data points each, one with a mean of 1.8 with a S.D of 0.6 and the other with a mean of 6.8 with an S. D of 1.0. these two sets of data need to correlate with a coefficient of 0.64. Is this sort of query possible? and how would I go about generating that random data?

    I have access to Excel, Minitab and SPSS if necessary.

    Any help would be so gratefully received!!!

  2. #2
    Omega Contributor
    Points: 38,374, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Not Ames, IA
    Thanked 1,186 Times in 1,147 Posts

    Re: generating random data for a scatter plot.

    This is a very specific request, can I ask what these data are going to be used for?

    Do you not care the distributions? If not, I would imagine this is achievable.
    Stop cowardice, ban guns!

  3. #3
    TS Contributor
    Points: 14,811, Level: 78
    Level completed: 91%, Points required for next Level: 39
    Miner's Avatar
    Greater Milwaukee area
    Thanked 405 Times in 363 Posts

    Re: generating random data for a scatter plot.

    The random data is fairly easy. In Minitab, select Calc > Random data > Normal... (or other distribution) > Number of rows to generate: 231; Mean: 1.8, Standard Deviation: 0.6. In Excel, use =NORM.INV(RAND(), 1.8, 0.6). Copy formula down 231 rows.

    The tricky part is to generate the correlation. I would start with a simple equation that relates the two sets of data. For a simple example, take y = 2x. In Minitab, right click an empty column and select Formulas > Assign Formula to Column > enter 2*C1 (or column containing the x data). This will generate the second data set. Unfortunately, this will have a correlation coefficient = 1.0, so we need to modify the formula to add error (i.e., y = 2x + e). Create another column of random data with a mean of 0 and an arbitrary standard deviation. Create another column and assign a formula to add the x and e columns. Run the correlation between y and this new column. It will now be less than 1. Manually adjust the standard deviation of the e column until you get the desired correlation of 0.64. You can probably do this easier in Excel using Goal Seek or Solver, but I will leave that to you.

  4. The Following User Says Thank You to Miner For This Useful Post:

    hlsmith (03-16-2017)

+ Reply to Thread


Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Advertise on Talk Stats