Apply a transformation to the distribution?

Hello all,

I'm doing upwind/downwind comparisons for air quality data. I received some feedback yesterday (thanks Simon!) that a paired t-test would probably be an appropriate way to compare hourly averages from two sites located about 1 km apart. I was also told that the data set is probably large enough (~500 hourly readings) for parametric methods to be approximately valid (because of the central limit theorem).

My questions now:

My data seem to have an asymmetrical distribution (e.g. a long tail toward high concentrations). Would the paired t-test be more robust if I applied a transformation?
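One way to judge whether a transformation helps is to compare skewness before and after. The following is a minimal sketch, assuming strictly positive concentrations and a natural-log transformation; the `conc` values and the `sample_skewness` helper are made up for illustration and are not from the actual data set.

```python
import math
import statistics


def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed
    population standard deviation. Roughly 0 for symmetric data,
    positive for a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)


# Hypothetical concentrations with a long tail toward high values.
conc = [4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 25.0, 60.0]

# The log transform is only defined for strictly positive values.
log_conc = [math.log(c) for c in conc]

print(f"skewness, raw: {sample_skewness(conc):.2f}")
print(f"skewness, log: {sample_skewness(log_conc):.2f}")
```

Keep in mind that a paired t-test on log-transformed values compares the sites on a ratio scale (the mean of log downwind minus log upwind), which is a different question from the mean difference in concentration.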

Also, I plan on presenting some other descriptive statistics such as mean, standard deviation, and maybe percentiles. Would a transformation be appropriate for these descriptors?
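For reference, if a log transformation were used, descriptors computed on the log scale are commonly back-transformed for reporting, which gives a geometric mean and a geometric standard deviation instead of the arithmetic ones. A minimal sketch with made-up values (the `conc` list is hypothetical):

```python
import math
import statistics

# Hypothetical right-skewed concentrations (illustration only).
conc = [4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 10.0, 14.0, 25.0, 60.0]
logs = [math.log(c) for c in conc]

arith_mean = statistics.fmean(conc)
arith_sd = statistics.stdev(conc)

# Back-transform the mean and standard deviation of the logs.
geo_mean = math.exp(statistics.fmean(logs))  # same units as the data
geo_sd = math.exp(statistics.stdev(logs))    # unitless multiplicative factor

median = statistics.median(conc)

print(f"arithmetic mean {arith_mean:.1f}, sd {arith_sd:.1f}")
print(f"geometric mean {geo_mean:.1f}, geometric sd {geo_sd:.2f}")
print(f"median {median:.1f}")
```

With a long right tail, the geometric mean sits below the arithmetic mean and closer to the median, so it matters which one is reported and that it is labeled clearly.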

Many thanks!! I'm really glad I stumbled upon this forum as a resource!



Ambassador to the humans
Also, that the data set is probably large enough (~500 hourly readings) to approximate parametric (because of the central limit theorem).
I didn't really read all of the post, but what I quoted isn't necessarily true. The Central Limit Theorem I'm guessing you're thinking of requires independent observations. Are your observations really independent? (Probably not.)

CLT can work for dependent data but you need a few conditions to be met.
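A rough way to see how much dependence there is in hourly data is the lag-1 autocorrelation of the paired differences. The sketch below assumes the differences behave approximately like an AR(1) process, in which case n(1 - r1)/(1 + r1) is a common approximation for the effective number of independent observations when estimating a mean. The function names and the `diffs` values are hypothetical.

```python
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: covariance between consecutive
    values divided by the overall variance."""
    mean = statistics.fmean(series)
    dev = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


def effective_sample_size(n, r1):
    """Approximate effective sample size for a mean, assuming AR(1)
    dependence with lag-1 autocorrelation r1 (an assumption, not a given)."""
    return n * (1 - r1) / (1 + r1)


# Hypothetical hourly downwind-minus-upwind differences that drift slowly.
diffs = [1.0, 1.2, 1.1, 0.9, 0.4, 0.2, 0.3, 0.5, 1.0, 1.3, 1.2, 0.8]

r1 = lag1_autocorrelation(diffs)
n_eff = effective_sample_size(len(diffs), r1)
print(f"lag-1 autocorrelation: {r1:.2f}")
print(f"effective sample size: {n_eff:.1f} of {len(diffs)}")
```

If r1 is well above zero, 500 hourly readings carry less information than 500 independent ones, and a test that ignores this will tend to give p-values that are too small.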

If I'm trying to determine statistical significance of difference between pairwise measurements at the sites, would it be safer to use something like the Mann-Whitney test?

OK - I think I have answered my own question by looking online and browsing through posts. Please correct me if I am wrong:

Since I am comparing upwind and downwind measurements at hourly intervals over the same time period, I can use a paired comparison test. I also have 1-minute data, but it is probably not appropriate to use these data because I would have to account for transport time between sites.

The paired t-test requires that the differences between values (not the actual values at either end) follow a normal distribution.
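To make the role of the differences concrete, here is a minimal sketch of the paired t statistic computed by hand. The p-value uses a normal approximation to the t distribution, which is only reasonable for large samples such as ~500 readings; the function name and the example values are hypothetical.

```python
import math
import statistics


def paired_t(upwind, downwind):
    """Paired t statistic computed from the pairwise differences.

    Returns (t statistic, degrees of freedom, approximate two-sided p).
    The p-value uses the standard normal in place of the t distribution,
    so it is only a large-sample approximation.
    """
    if len(upwind) != len(downwind):
        raise ValueError("paired samples must have the same length")
    diffs = [d - u for u, d in zip(upwind, downwind)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = mean_diff / std_err
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, n - 1, p_approx


# Hypothetical hourly averages at the two sites (same hours, same order).
upwind = [1.0, 2.0, 3.0, 4.0]
downwind = [2.0, 4.0, 3.0, 7.0]

t_stat, df, p_approx = paired_t(upwind, downwind)
print(f"t = {t_stat:.3f}, df = {df}, approximate p = {p_approx:.3f}")
```

Only the list of differences enters the calculation, which is why the normality question applies to the differences and not to the raw concentrations at either site.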

Before applying the test, I should check for normality using the Kolmogorov-Smirnov test. If the distribution is normal, I can use the paired t-test. If it is not normal, I should use the Mann-Whitney test.

Thanks for the guidance!


New Member
Hi Cassie

That's right: the paired t-test requires that the differences between the measurements follow a normal distribution.

But, even if they don't, as far as I understand it, if the number of independent measurements is large enough, the sample average will have a normal distribution and, since the t-test computes the average, it doesn't matter too much.

In any case, an easy thing to do is simply compute the t-test and a Mann-Whitney U test and see what happens. You'll probably find that there's little difference between the two in terms of p-value if your sample size is large enough.
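One caveat worth adding: the Mann-Whitney U test treats the two samples as independent groups, so for paired readings the usual rank-based counterpart is the Wilcoxon signed-rank test. Below is a minimal sketch of that test using a normal approximation without tie correction; the function names and data are hypothetical, and a statistics package will handle ties and small samples more carefully.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(upwind, downwind):
    """Return (W+, z, approximate two-sided p) for paired samples."""
    # Zero differences carry no sign information and are dropped.
    diffs = [d - u for u, d in zip(upwind, downwind) if d != u]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(x) for x in diffs])
    w_plus = sum(r for r, x in zip(ranks, diffs) if x > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (w_plus - mean_w) / sd_w
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return w_plus, z, p_approx


# Hypothetical paired readings where downwind is always higher.
upwind = [1.0, 2.0, 3.0, 4.0, 5.0]
downwind = [2.0, 4.0, 6.0, 8.0, 10.0]

w_plus, z, p_approx = wilcoxon_signed_rank(upwind, downwind)
print(f"W+ = {w_plus}, z = {z:.3f}, approximate p = {p_approx:.3f}")
```

Running this next to a paired t-test on the same differences is one way to do the "compute both and compare" check while keeping the pairing intact.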

As for how to work out if there is correlation (I'm not sure that auto-correlation is the correct term) between variables, I think that you'd need to perform linear regression with the difference (or percentage difference) between the two sites as the dependent variable and the other variables as your predictor variables, but now you are really getting to the limit of my very limited statistical 'knowledge'. I think I should point out that I have never studied statistics beyond GCSE having instead opted for 'mechanics' at A-level which has proved totally useless since leaving school! On the other hand I think that any advice that Dason gives is significantly more robust!

With best wishes


It is a good idea to try both tests and see whether there is much difference between the results.

I have SigmaPlot12 on order, and I should get it Monday. Hopefully it will be easy to try a few different things. I've done a literature search for upwind/downwind air quality comparisons, but it doesn't look like there are consistent methods.

I appreciate the help!