# Scatterplot relationship issue.

#### KimboSlice

Hi, the data below is just a sample of the data I am trying to plot in a scatterplot.

The issue is that an earlier correlation test gave a p-value of 0.011, which I took to mean a clear relationship.

However, when I took averages of the 196 entries and plotted them, the scatterplot shows little relationship. Why is that?

How can I format the data to show the linear relationship between the two variables? Thanks.

| Traffic Speed (MPH) | CO Level (R mg m⁻³) |
|---:|---:|
| 44.3 | 0.6 |
| 45.6 | 0.6 |
| 49.9 | 0.5 |
| 44.5 | 0.5 |
| 46.8 | 0.6 |
| 46.6 | 0.7 |

#### Dason

With 196 observations you can have a weak correlation and still get a statistically significant result. This isn't too surprising.
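To put rough numbers on this, here is a sketch using the Fisher z-transform, a standard large-sample approximation to the correlation test. Minitab's exact t-based p-values will differ slightly, and `approx_p_value` and `r_for_p` are hypothetical helper names, not anything from Minitab:

```python
import math
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: correlation = 0, via the Fisher z-transform.

    Large-sample approximation; close to the exact t-test once n is in the hundreds.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def r_for_p(p: float, n: int) -> float:
    """Smallest |r| that reaches a two-sided p-value of p with n pairs."""
    z_crit = NormalDist().inv_cdf(1 - p / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


# The same weak correlation becomes "significant" as n grows.
for n in (7, 30, 196, 1000):
    print(f"n={n:5d}  r=0.18  p~{approx_p_value(0.18, n):.3f}")

print(f"r needed for p=0.011 with n=196: {r_for_p(0.011, 196):.3f}")
```

With n = 196, a correlation of only about 0.18 is enough for p ≈ 0.011, while the same r with n = 7 is nowhere near significant.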

#### KimboSlice

Scatterplot shows little correlation?

Please take a look at the attached scatterplot.

Basically, carbon monoxide and traffic speed were measured 24 hours per day.

My sample covers 7 days in total, making 196 values (which is why I averaged the daily measurements).

When I put the averages into a scatterplot, you can see there is no clear relationship.

However, when I tested for correlation, I got a p-value of 0.011, which I took to mean there is a strong relationship.

What am I doing wrong with the scatterplot, given that it isn't showing a relationship as strong as it should?

#### Dason

The p-value does not comment on the strength of the relationship. It tells you how much evidence there is against the null hypothesis (correlation = 0). So it just tells you that you have reason to believe the correlation isn't exactly 0. With a large enough sample, the true correlation could be as small as 0.001 and you could still get a significant p-value.
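The same Fisher z approximation can be turned around to ask how many pairs are needed before an observed correlation of a given size crosses p < 0.05. A sketch, with `n_for_significance` as a hypothetical helper name:

```python
import math
from statistics import NormalDist


def n_for_significance(r: float, alpha: float = 0.05) -> int:
    """Approximate sample size at which an observed correlation r gives p < alpha.

    Fisher z-transform: z = atanh(r) * sqrt(n - 3) must exceed the two-sided
    critical value, so n - 3 >= (z_crit / atanh(r)) ** 2.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z_crit / math.atanh(r)) ** 2) + 3


for r in (0.3, 0.1, 0.01, 0.001):
    print(f"r={r:<6}  n needed ~ {n_for_significance(r):,}")
```

For r = 0.001 the answer is on the order of 3.8 million pairs, which is why "significant" and "strong" are different claims.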

Your scatterplot is fine. Also, this should have just been in your other thread; there was no real reason to make a new thread as far as I can see, so I merged it in with the other one.

#### Dason

I only see ten points on your scatterplot. Are there just that many duplicate values?

#### KimboSlice

I calculated the averages: I took the 24 readings for each day, averaged them, and did the same for all 7 days, so the plot covers all the data.
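A week of hourly readings is 24 × 7 = 168 pairs (the thread quotes 196, so the exact count may differ), and averaging each day leaves just 7 points. A minimal sketch of that collapse, assuming the readings are stored in time order with exactly 24 per day; `daily_means` is a hypothetical helper, not a Minitab function:

```python
from statistics import fmean


def daily_means(hourly: list[float], per_day: int = 24) -> list[float]:
    """Collapse consecutive blocks of `per_day` readings into their mean."""
    if len(hourly) % per_day:
        raise ValueError("expected a whole number of days of readings")
    return [fmean(hourly[i:i + per_day]) for i in range(0, len(hourly), per_day)]


# Placeholder values only (0, 1, 2, ...), standing in for a week of hourly readings.
hourly_speed = [float(h) for h in range(24 * 7)]
daily_speed = daily_means(hourly_speed)
print(len(hourly_speed), "hourly points ->", len(daily_speed), "daily points")
```

Each daily mean smooths away the hour-to-hour variation that the correlation test on the raw pairs was using, and a correlation on 7 points has far less power, so a weak hourly relationship can easily look like nothing on the averaged plot.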

So is it worth putting that scatterplot in my work? And how would I describe the relationship, given that the p-value was so significant?
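One way to describe the relationship alongside the p-value is to report r itself and r², the share of the variation in CO level that a straight line on traffic speed accounts for. As a rough back-calculation, assuming the 0.011 came from a Pearson test on n = 196 pairs (the count quoted in this thread), the test statistic and r are linked by:

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
\quad\Longleftrightarrow\quad
|r| = \frac{|t|}{\sqrt{(n-2)+t^{2}}}
$$

A two-sided p = 0.011 with n − 2 = 194 degrees of freedom corresponds to |t| ≈ 2.57, so

$$
|r| \approx \frac{2.57}{\sqrt{194 + 2.57^{2}}} \approx \frac{2.57}{14.16} \approx 0.18,
\qquad r^{2} \approx 0.03
$$

Under those assumptions, traffic speed would account for only about 3% of the variation in CO: evidence of some association, but a weak one, which fits Dason's point that a weak correlation can still be significant.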

If there are any other statistical tests you recommend for these variables, please tell me!

Thanks.

#### Dason

Maybe we should back up. Can you start at the beginning and describe how the data was collected and exactly what you did?

#### KimboSlice

Yeah sure,

The data comes from a government statistics website on air quality and the factors that affect it.

You have to look into a particular pollutant and analyse the factors which influence its concentration at a particular site.

The samples are huge: we're talking 24 readings a day, every day since 1997. So I chose to look at a small sample covering one week, i.e. 24 readings a day for 7 days.

I chose carbon monoxide (CO) as my first pollutant and traffic speed as the factor that influences its concentration.

I put the two variables into Minitab side by side, ran a correlation test, and got a p-value of 0.011.
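Assuming Minitab's correlation test is the usual Pearson test (compute r, then t = r√(n−2)/√(1−r²) with n − 2 degrees of freedom), here is a standard-library sketch of the same calculation. The helper names are hypothetical, and the exact p-value is obtained by numerically integrating the Student t density with Simpson's rule. The demo uses only the six rows posted at the top of the thread, so it is illustrative and is not the 196-value result:

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value from Student's t, integrating its density with Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = density(0.0) + density(b)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    central = area * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def correlation_test(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (r, t, two-sided p) for H0: correlation = 0."""
    r = pearson_r(x, y)
    df = len(x) - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return r, t, t_two_sided_p(t, df)


speed = [44.3, 45.6, 49.9, 44.5, 46.8, 46.6]  # Traffic Speed (MPH), sample rows
co = [0.6, 0.6, 0.5, 0.5, 0.6, 0.7]            # CO Level, sample rows
r, t, p = correlation_test(speed, co)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, t = {t:.2f}, p = {p:.3f}")
```

On those six rows r is small and p is large (about 0.7), as expected with n = 6. Running the same test on the full set of hourly pairs, rather than on daily averages, is the comparison that matches a correlation test done on the raw data.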

Then I tried to reinforce the relationship with a scatterplot: I took the average of the 24 readings for each of the 7 days and plotted them against the daily averages of traffic speed.

However, the graph shows little evidence of a strong relationship, unlike the correlation test. Why is that?