How do you know if a relationship really exists?

#1
I had a problem where I was supposed to calculate a correlation coefficient for some data using the definitional formula and then interpret the results. It then asked me to say whether this relationship really exists in the population.
Should I use the normal distribution, confidence intervals, or something of that sort?

I'd really appreciate some help. Thanks.
 

JohnM

TS Contributor
#2
You can't say it with 100% certainty unless you sample the entire population.

There are tests you can run to establish the statistical significance of the correlation coefficient and/or the model coefficients (i.e., the slope of the regression line), so yes, you could compute confidence intervals around r or the betas.
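As a rough illustration of a confidence interval around r, here is a minimal sketch using the Fisher z-transformation. This is one common approximation, not necessarily the method the textbook problem expects. It assumes roughly bivariate normal data and n > 3; the function name `fisher_ci` and the values r = 0.85, n = 20 are made up for the example.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for the population correlation.

    Uses the Fisher z-transformation: atanh(r) is treated as roughly
    normal with standard error 1 / sqrt(n - 3). Hypothetical helper,
    assumes bivariate normal data and n > 3.
    """
    z = math.atanh(r)                              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)                    # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided normal critical value
    # Transform the interval endpoints back to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical sample: r = 0.85 from n = 20 pairs.
low, high = fisher_ci(r=0.85, n=20)
print(f"95% CI for the population correlation: ({low:.3f}, {high:.3f})")
```

If the interval excludes 0, the sample gives evidence of a nonzero correlation in the population at that confidence level.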
 
#3
John,

I want to follow up on this question, as I think I can learn something valuable here.

In general, do we not use the Pearson correlation to determine whether there is a relationship between the two variables? For example, if I find that r = .85, then I am determining that 85% of the change in y is attributable to the change in x, so I infer that there is some relationship between the variables. This does not tell me whether the relationship is causal, simply that when one variable changes the other does as well. So I am reading into your answer that this inference is only valid for the sample data, and that further work is needed to claim that the relationship exists within the population as a whole. Is that right?

"Statistical significance" of the correlation coefficient is escaping me this early morning. Could you comment further on that?

Thanks,
Jerry
 

JohnM

TS Contributor
#4
Jerry,

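r is the correlation coefficient and is merely an index of the strength (and direction) of the linear relationship between two variables.

r^2 is the coefficient of determination and is the proportion of variance in y that is explained by the linear relationship with x. So with r = .85, r^2 is about .72, meaning roughly 72% of the variance in y is explained, not 85%.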
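To make the difference between r and r^2 concrete, here is a small sketch that computes r from the definitional formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The data values and the helper name `pearson_r` are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the definitional (deviation-score) formula."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data, only for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

r = pearson_r(x, y)
print(f"r   = {r:.4f}")       # strength of the linear relationship
print(f"r^2 = {r ** 2:.4f}")  # proportion of variance in y explained
```

Because r is squared, r^2 is always smaller in magnitude than r unless r is 0 or plus or minus 1.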

"statistical significance" of the correlation:
- if you draw multiple samples from the same population and compute r for each sample, you'll get some variability, hence you can compute a confidence interval around the "average" r and also state hypotheses like:

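Ho: ρ = 0
Ha: ρ > 0

where ρ is the correlation in the population, which the sample r estimates.

So the significance of r is a statement to the effect that it is significantly larger than 0, or in other words, that there is some positive correlation between x and y in the population.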
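Here is one way the one-sided test of zero correlation could be sketched. The usual textbook approach compares t = r * sqrt(n - 2) / sqrt(1 - r^2) against a t distribution with n - 2 degrees of freedom. To stay within the Python standard library, this sketch prints that t statistic but gets its p-value from a permutation test instead, which is a swapped-in technique and not the method described above. The data and the helper names are invented.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """One-sided permutation p-value for a positive correlation.

    Shuffling y breaks any link between x and y, so the shuffled r values
    approximate what r looks like when the null hypothesis is true.
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) >= observed:
            at_least_as_large += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Made-up sample data, only for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8]

r, p = permutation_p_value(x, y)
t = r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)  # textbook t statistic
print(f"r = {r:.3f}, t = {t:.2f}, permutation p-value = {p:.4f}")
```

A small p-value means a correlation this large would rarely show up by chance if there were no relationship in the population, which is the sense in which r is "significant".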

Determining whether some relationship exists in the entire population is the same as with any scientific study or experiment: your study design and underlying theory need to be convincing, and the data need to provide sufficient statistical evidence to support the relationship, or at least to reject the notion that it doesn't exist (i.e., rejecting Ho).