# Scatterplot relationship issue.

#### KimboSlice

Hi, the data below is just a sample of the data I am trying to enter into a scatterplot.

The issue is that a previous correlation test shows a p value of 0.011, a clear relationship.

However, when I took the averages of 196 entries of the data below, the scatterplot shows little relationship?

How can I format the data to show the linear relationship between the two variables? Thanks.

| Traffic Speed (MPH) | CO Level (R mgm-3) |
| --- | --- |
| 44.3 | 0.6 |
| 45.6 | 0.6 |
| 49.9 | 0.5 |
| 44.5 | 0.5 |
| 46.8 | 0.6 |
| 46.6 | 0.7 |

#### Dason

With 196 observations you can have a weak correlation and still get a fairly statistically significant estimate. This isn't too surprising.
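To put rough numbers on that, here is a minimal sketch of how the p-value depends on the sample size for a fixed, weak correlation. It uses the Fisher z approximation rather than whatever Minitab computes internally, and the value r = 0.18 and the function name `approx_p_value` are my own illustration, not taken from the data in this thread.

```python
import math
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: correlation = 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The same weak sample correlation evaluated at different sample sizes.
for n in (7, 50, 196):
    print(f"r = 0.18, n = {n:3d}: p ~ {approx_p_value(0.18, n):.3f}")
```

Under this approximation a correlation of about 0.18 is enough to give a p-value near 0.01 with 196 observations, even though it corresponds to only about 3% of the variance explained (0.18 squared). The same r with 7 points is nowhere near significant.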

#### KimboSlice

Scatterplot shows little correlation?

Please take a look at the attached scatterplot.

Basically, Carbon Monoxide and Traffic Speed were measured 24 hours per day.

My sample covers 7 days in total, making 196 values (which is why I averaged the daily measurements).

When I put the averages into a scatterplot, you can see there is no clear relationship.

However, when I tested for correlation, I got a P-Value = 0.011 which shows there is a strong relationship.

What am I doing wrong with the scatterplot which isn't showing a relationship as strong as it should be?

#### Dason

The p-value does not comment on the strength of the relationship. It tells you how much evidence there is against the idea that the null hypothesis (correlation = 0) is true. So it just tells you that you have reason to believe that the correlation isn't exactly 0. It's entirely possible that the true correlation is 0.001 and you could get a significant p-value.
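As a rough illustration of that last point, the sketch below estimates how many observations it would take for a given sample correlation to reach significance. It again relies on the Fisher z approximation, and `n_needed` is a hypothetical helper written for this post, so treat the output as an order of magnitude rather than an exact figure.

```python
import math
from statistics import NormalDist


def n_needed(r: float, alpha: float = 0.05) -> int:
    """Approximate sample size at which a sample correlation r gives two-sided p <= alpha."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Solve atanh(|r|) * sqrt(n - 3) = z_crit for n.
    return math.ceil((z_crit / math.atanh(abs(r))) ** 2 + 3)


for r in (0.5, 0.18, 0.001):
    print(f"r = {r}: significant at the 5% level from about n = {n_needed(r):,}")
```

For r = 0.001 the answer comes out in the millions of observations, so a tiny correlation can be "significant" given enough data. Significance reflects the amount of evidence, not the size of the effect.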

Your scatterplot is fine. Also this should have just been in your other thread - there was no real reason to make a new thread as far as I can see so I merged it in with the other one.

#### Dason

I only see ten points on your scatterplot. Are there just that many duplicate values?

#### KimboSlice

I calculated the averages... So for the 24 readings in one day I took the average, and did that for each of the 7 days, so it covers all the data.

So is it worth putting that scatterplot in my work? How would I describe the relationship in comparison to the p-value, which was very strong?

If there are any other statistical tests you recommend for these variables, please tell me!

Thanks.

#### Dason

Maybe we should back up. Can you start at the beginning and describe how the data was collected and exactly what you did?

#### KimboSlice

Yeah sure,

The data is from a government statistics website based on air quality and the factors which affect air quality.

You have to look into a particular pollutant and analyse the factors which influence its concentration at a particular site.

The samples are huge, we're talking 24 readings a day, every day since 1997, so I chose to look at a small sample covering one week: 24 readings a day for 7 days.

I chose to do Carbon Monoxide (CO) as my first pollutant and traffic speed as the factor which influences concentration.

I put the data into Minitab alongside one another, used a correlation test and got a p-value of 0.011.

I then attempted to reinforce the relationship with a scatterplot by taking the averages of the 24 readings a day for 7 days and plotting them against the averages of traffic speed.

However, the graph shows little evidence of a strong relationship, unlike the correlation test?

#### Dason

I don't have time to do a full response but I'm going to tell you that you really shouldn't be interpreting your correlation test. That test assumes the observations are independent. If the data is really collected over time then the observations are definitely not independent.
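A quick simulation shows why this matters. The sketch below generates pairs of series that are completely unrelated to each other but each autocorrelated in time, then applies the usual correlation test as if the observations were independent. The AR(1) model, the coefficient 0.9, and all function names are assumptions chosen for illustration and are not fitted to the CO or traffic data.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ar1_series(n, phi, rng):
    """Autocorrelated series: each value is phi times the previous one plus fresh noise."""
    # Start from the stationary distribution so no burn-in is needed.
    value = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)
    series = []
    for _ in range(n):
        series.append(value)
        value = phi * value + rng.gauss(0.0, 1.0)
    return series


def false_positive_rate(n=196, phi=0.9, trials=500, alpha=0.05, seed=1):
    """Share of unrelated series pairs that the naive test calls significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(trials):
        x = ar1_series(n, phi, rng)
        y = ar1_series(n, phi, rng)  # generated independently of x
        z = math.atanh(pearson(x, y)) * math.sqrt(n - 3)  # Fisher z approximation
        if abs(z) > z_crit:
            hits += 1
    return hits / trials


print("independent observations:   ", false_positive_rate(phi=0.0))
print("autocorrelated observations:", false_positive_rate(phi=0.9))
```

With independent observations the test rejects at about the nominal 5% rate. With strongly autocorrelated series it rejects far more often, even though the two series have nothing to do with each other, so a small p-value on time-ordered data is much weaker evidence than it looks.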

#### lukasm

I'm by no means an expert in statistics... but does the traffic speed fluctuate a lot over a day? Because if the traffic speed fluctuates more within a day than the daily averages do (e.g. due to traffic jams during rush hour), and if the CO level is correlated with it, then the CO level might also fluctuate more within a day than the daily CO averages do. Then you might lose a lot of the correlation by averaging over a day...
Just a possibility
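Here is a small synthetic sketch of that idea. Everything in it is made up: the speed cycle, the CO response, and the noise levels are assumptions chosen so that CO follows the within-day swings in speed while the day-to-day speed level is unrelated to CO. It is not the real data.

```python
import math
import random

DAYS, HOURS = 7, 24


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def daily_means(values):
    """Average each consecutive block of HOURS readings into one value per day."""
    return [sum(values[d * HOURS:(d + 1) * HOURS]) / HOURS for d in range(DAYS)]


rng = random.Random(0)
speed, co = [], []
for day in range(DAYS):
    day_base = 45.0 + rng.gauss(0.0, 1.0)  # assumed day-to-day speed level, unrelated to CO
    for hour in range(HOURS):
        swing = 5.0 * math.sin(2.0 * math.pi * hour / HOURS)  # assumed within-day speed cycle
        speed.append(day_base + swing)
        co.append(0.6 - 0.02 * swing + rng.gauss(0.0, 0.03))  # CO tracks the swing, plus noise

r_hourly = pearson(speed, co)
r_daily = pearson(daily_means(speed), daily_means(co))
print(f"hourly readings ({len(speed)} points): r = {r_hourly:.2f}")
print(f"daily averages  ({DAYS} points):   r = {r_daily:.2f}")
```

In this setup the hourly readings are strongly correlated, but the within-day swing averages out to zero over each day, so the 7 daily means carry none of that signal. Whatever correlation they show is down to chance.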

Also, why does your plot show 10 data points if you're plotting daily averages of 7 days?