# How to determine whether two datasets of two related variables are different?

#### REM

##### New Member
Hi,
I have two datasets each with regular measurements of wind speed and noise level. I know that noise level increases as wind speed increases, and that there is considerable variation at all wind speeds. The datasets may be different sizes and have different proportions of wind speeds.
I don't need to know whether the wind speeds differ between the datasets, but I do need a way of determining whether the noise/wind speed relationship is different.

It has been suggested that multiplying wind speed by noise might help, but I think this may only show that one dataset has more high wind speeds than the other.

Any help, or suggestions of what to search for, will be gratefully received!
Rod

#### ledzep

##### Point Mass at Zero
> Hi,
> ... but do need a way of determining whether the noise/wind speed relationship is different.
> Rod

Can you tell us a bit more about what type of measurements wind speed and noise level are? Are they continuous, or do they have discrete levels (low, mid, high)?

You can compare graphs of the two datasets, which will give you a good idea of the relationship between wind speed and noise. Assuming wind speed and noise are continuous, a scatterplot will show you the shape of the relationship (linear, parabolic, cubic?).

The sort of analysis you can use depends on the type of measurements you have.

#### bugman

##### Super Moderator
My guess is that wind is in mph or km/h and noise is in decibels. Scatterplots, as ledzep suggested, would be a good idea. I would assume that the relationship between the variables would change with wind direction, though.

#### REM

##### New Member
The measurements are continuous as you suggest with wind measured in m/s and noise in decibels. The relationship between the two is approximately linear, or may be better described with a second order polynomial.
I can visually compare the datasets on a scatter plot but I need a way of determining whether any difference is statistically significant.
Best regards,
Rod

#### ledzep

##### Point Mass at Zero
> The measurements are continuous as you suggest with wind measured in m/s and noise in decibels. The relationship between the two is approximately linear, or may be better described with a second order polynomial.

I am assuming noise is your response and wind speed is your independent variable.

To test statistically whether there is a linear relationship, you can fit a linear regression of your response on your independent variable, then assess the significance of the slope parameter. Your null hypothesis is that the slope parameter is zero, i.e. no linear relationship between wind speed and noise. Since you say the scatterplot looks linear, your slope coefficient should be statistically significant. You can look at the adjusted R-squared value to see how much of the variation in your response is explained by the model (as a rough rule of thumb, anything >80% indicates a good fit, though what counts as good depends on how noisy the data are).
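As a rough illustration of the slope test and adjusted R-squared described above, here is a minimal sketch in Python using `scipy.stats.linregress`. The data are synthetic stand-ins (an assumption, not the poster's measurements), and the names `wind` and `noise` are made up for the example.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption): wind speed in m/s, noise in dB.
rng = np.random.default_rng(0)
wind = rng.uniform(2.0, 12.0, size=200)
noise = 30.0 + 1.5 * wind + rng.normal(0.0, 3.0, size=200)

# Simple linear regression of noise (response) on wind speed (predictor).
# fit.pvalue tests the null hypothesis that the slope is zero.
fit = stats.linregress(wind, noise)

# Adjusted R-squared for a model with p = 1 predictor.
n, p = len(wind), 1
r2 = fit.rvalue ** 2
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

print(f"slope = {fit.slope:.3f} dB per m/s, p-value = {fit.pvalue:.3g}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

A small p-value here only says the slope differs from zero; it does not by itself show that a straight line is the right shape.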

Then you can examine your residuals (using residual plots) to see whether a polynomial term is needed. If the residuals are randomly scattered with no discernible pattern, a polynomial term is probably not required. However, if the residuals show curvature (a parabola shape), that hints at the need to fit a polynomial term.
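Alongside the residual plot, one way to put a number on "is the polynomial term needed" is an F-test comparing the straight-line fit with a second-order fit. The sketch below assumes synthetic data with some built-in curvature and uses a hypothetical helper `fit_rss`; plotting `resid_lin` against `wind` would give the residual plot described above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption) with a mild quadratic component.
rng = np.random.default_rng(1)
wind = rng.uniform(2.0, 12.0, size=200)
noise = 30.0 + 0.5 * wind + 0.2 * wind**2 + rng.normal(0.0, 3.0, size=200)

def fit_rss(X, y):
    """Least-squares fit; returns residual sum of squares and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), resid

ones = np.ones_like(wind)
X_lin = np.column_stack([ones, wind])            # intercept + linear term
X_quad = np.column_stack([ones, wind, wind**2])  # adds the quadratic term

rss_lin, resid_lin = fit_rss(X_lin, noise)
rss_quad, _ = fit_rss(X_quad, noise)

# F-test for the single extra (quadratic) parameter.
df_resid = len(noise) - X_quad.shape[1]
f_stat = (rss_lin - rss_quad) / (rss_quad / df_resid)
p_value = stats.f.sf(f_stat, 1, df_resid)

print(f"F = {f_stat:.2f}, p-value for quadratic term = {p_value:.3g}")
```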

HTH

#### REM

##### New Member
Thanks again. I have a trendline for each dataset and the regression coefficient indicates how well this line describes the dataset. What is needed now is a method that allows comparison between two similar datasets each of this type. Any ideas?
Rod
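
One common way to frame this kind of comparison, offered here only as a sketch and not something proposed in the thread, is to pool the two datasets and test whether allowing each dataset its own intercept and slope fits significantly better than a single shared line (an F-test for nested regression models, sometimes called a Chow test). The function `compare_relationships` and the synthetic data are assumptions for illustration. The test assumes a linear relationship and independent errors with similar variance in both datasets, which regular (possibly autocorrelated) measurements may not satisfy.

```python
import numpy as np
from scipy import stats

def fit_rss(X, y):
    """Least-squares fit; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def compare_relationships(wind_a, noise_a, wind_b, noise_b):
    """F-test: one shared line vs. separate intercept and slope per dataset."""
    wind = np.concatenate([wind_a, wind_b])
    noise = np.concatenate([noise_a, noise_b])
    # Indicator: 0 for dataset A, 1 for dataset B.
    group = np.concatenate([np.zeros(len(wind_a)), np.ones(len(wind_b))])
    ones = np.ones_like(wind)

    X_pooled = np.column_stack([ones, wind])
    X_separate = np.column_stack([ones, wind, group, group * wind])

    rss_pooled = fit_rss(X_pooled, noise)
    rss_separate = fit_rss(X_separate, noise)

    df_extra = 2                 # extra intercept and extra slope
    df_resid = len(noise) - 4    # parameters in the separate-lines model
    f_stat = ((rss_pooled - rss_separate) / df_extra) / (rss_separate / df_resid)
    p_value = stats.f.sf(f_stat, df_extra, df_resid)
    return f_stat, p_value

# Synthetic stand-in data (assumption): different sizes and wind-speed ranges,
# same slope, but dataset B is 3 dB louder at every wind speed.
rng = np.random.default_rng(2)
wind_a = rng.uniform(2.0, 12.0, size=150)
noise_a = 30.0 + 1.5 * wind_a + rng.normal(0.0, 3.0, size=150)
wind_b = rng.uniform(4.0, 15.0, size=90)
noise_b = 33.0 + 1.5 * wind_b + rng.normal(0.0, 3.0, size=90)

f_stat, p_value = compare_relationships(wind_a, noise_a, wind_b, noise_b)
print(f"F = {f_stat:.2f}, p-value = {p_value:.3g}")
```

Because wind speed is a predictor in both models, differing sizes or wind-speed mixes between the datasets do not by themselves produce a significant result. If a second-order polynomial fits better, the same idea extends by adding `wind**2` and `group * wind**2` columns and adjusting the degrees of freedom.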