standard deviation greater than variance, is this ok?

#1
I am putting together some audit results and it was suggested that I present them as mean +/- SD. However, there is a lot of variation in the results. For example, looking at the hours it took to achieve a targeted temperature goal, results varied from 0 hours to 11. Using Excel I ended up with a mean of 3.73 and an SD of 3.6. I thought that 2 standard deviations either side of the mean was meant to encompass 95% of all results, but that would mean having a negative number as the lower end of the interval. This doesn't make any sense to me: how can it possibly take negative hours to reach a goal? Or is this just a statistical way of saying that in some cases the temperature goal was probably achieved hours before the protocol commenced?
 

hlsmith

Omega Contributor
#2
This says your data are likely positively skewed and not normally distributed. You could also have outliers that make the mean a less than ideal statistic to report.


Plot a histogram of your data and upload it, so we can see what you are working with. Also include your sample size.
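
To make the symptom concrete, here is a minimal Python sketch that only redoes the arithmetic from post #1 (mean 3.73, SD 3.6). The variable names are illustrative, and no new data are introduced.

```python
# Figures quoted in post #1; nothing here is new data.
mean_hours = 3.73
sd_hours = 3.6

# The symmetric "mean +/- 2 SD" interval the original poster had in mind.
lower = mean_hours - 2 * sd_hours
upper = mean_hours + 2 * sd_hours

# Time-to-target cannot be negative, so a negative lower bound suggests
# that a symmetric summary does not suit these (likely skewed) data.
print(f"mean - 2 SD = {lower:.2f} hours")
print(f"mean + 2 SD = {upper:.2f} hours")
print("lower bound below zero:", lower < 0)
```

A lower bound below zero for a quantity that cannot be negative is a common hint of positive skew, which is why the histogram is worth looking at.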
 

noetsi

Fortran must die
#3
A couple of points. First, it's impossible for the standard deviation to be greater than the variance because the standard deviation is the square of the variance :p I believe if the standard deviation were greater you would be dealing with imaginary numbers and negative variance, and as you suggest that would most certainly not be ok... [or physically possible].

As hlsmith notes, using confidence intervals the way you intended only works with normally distributed data. Many of these methods assume normality. So you would note that the CIs are not accurate because the normality assumption was not met, I believe [although with large sample sizes this might not be true, as I think these methods are pretty robust to violations of normality when the sample is large].
 
#4
I completely get that the negative stretch isn't right. I just can't figure out how I got there. Is it the inclusion of those that scored zero? I'm finding it difficult to know how to deal with this. I have attached a histogram as asked: the x-axis shows the number of violations and the y-axis the number of subjects with that many violations. In this particular set n=17. This gave me a mean of 3.94 hours with an SD of 4.02.
 
#5
First, it's impossible for the standard deviation to be greater than the variance because the standard deviation is the square of the variance
Noetsi! In my textbook the standard deviation is the square ROOT of the variance.

If the standard deviation is 4 then the variance is 16, thus larger.
But if the standard deviation is 0.7 then the variance is 0.49, thus smaller.
And if the standard deviation is 0.5 then the variance is 0.25, thus smaller.
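
As a quick check of the three cases above, here is a minimal Python sketch. The helper name `variance_vs_sd` is made up for illustration. The comparison is purely numeric: the SD is in hours and the variance in hours squared, so "larger" or "smaller" says nothing about the data themselves.

```python
def variance_vs_sd(sd: float) -> str:
    """Say whether the variance (sd squared) is numerically larger,
    equal to, or smaller than the standard deviation itself."""
    variance = sd ** 2
    if variance > sd:
        return "larger"
    if variance < sd:
        return "smaller"
    return "equal"


# The three values from the post, plus the crossover point at 1.
for sd in (4.0, 1.0, 0.7, 0.5):
    print(f"SD = {sd}: variance = {sd ** 2:.2f} ({variance_vs_sd(sd)})")
```

The variance is only numerically larger than the SD when the SD exceeds 1, so an SD "greater than" or "smaller than" the variance is not an error in itself.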

Noetsi! You are advising complete beginners. Look at what you are writing!


As hlsmith notes, using confidence intervals the way you intended only works with normally distributed data.
Neither the original poster (OP) nor hlsmith has talked about a confidence interval. A confidence interval is about an estimated parameter. The OP wanted to describe the data. For that, the OP can use the quartiles, or the 2.5th and 97.5th percentiles.
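
A minimal sketch of that percentile-based description, using Python's standard `statistics` module. The 17 values in `hours` are hypothetical (the OP's actual data were not posted); they are only chosen to lie between 0 and 11 like the results described.

```python
import statistics

# HYPOTHETICAL data: 17 made-up times (hours), not the OP's real audit results.
hours = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 9, 10, 11]

median = statistics.median(hours)

# Quartiles: n=4 returns the three cut points Q1, Q2 (median), Q3.
# method="inclusive" interpolates between observed values, so the
# results never fall outside the range of the data.
q1, _, q3 = statistics.quantiles(hours, n=4, method="inclusive")

# n=40 gives cut points every 2.5%; the first and last are the
# 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(hours, n=40, method="inclusive")
p2_5, p97_5 = cuts[0], cuts[-1]

print(f"median = {median}, IQR = {q1} to {q3}")
print(f"2.5th to 97.5th percentile = {p2_5} to {p97_5}")
```

Unlike mean +/- 2 SD, these limits cannot drop below the smallest observed value, so the summary never reports a negative time. With a sample as small as n=17, the extreme percentiles are close to the minimum and maximum, so the median and quartiles may be the more stable choice.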

It would not be surprising if the time, where the mean and standard deviation are about the same, is exponentially distributed.
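
The property behind this remark is that an exponential distribution has a mean equal to its standard deviation. Here is a small simulation sketch in Python; the sample size and seed are arbitrary choices, and the 3.73 is just the mean from post #1 used as a scale. It illustrates the property and does not show that the OP's data are exponential.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

mean_hours = 3.73  # mean quoted in post #1, used only to set the scale

# random.expovariate takes the rate (1 / mean) as its argument.
sample = [random.expovariate(1 / mean_hours) for _ in range(100_000)]

sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

# For an exponential distribution these two should be close to each other,
# and every simulated time is non-negative.
print(f"mean = {sample_mean:.2f}, SD = {sample_sd:.2f}")
print("smallest simulated value is non-negative:", min(sample) >= 0)
```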
 
#6
As I go along with this post I am realising my error with the initial title; I'm very new to this stats stuff. So I get that my variance is essentially 12 (0 being one end of the values, 12 the other), which actually gives me an SD of 3.46 (not the 4.02 Excel was giving me). I realise that the 4.02 I got is not greater than the variance. What I was so ineloquently trying to get across is that when I graph the mean and SD, the negative arm of the SD extends below zero. As my data indeed seem to be non-normally distributed, would I be better off presenting them as a median and IQR? I'm trying to find some way to best represent the central tendency while still giving an accurate representation of the large variation in results.
 
#8
So I get that my variance is essentially 12 (0 being one end of the values, 12 the other), which actually gives me an SD of 3.46 (not the 4.02 Excel was giving me).
I believe you mean the range is 12, rather than the variance. The range is the difference between the highest and lowest values in a data set. To find the variance, the mean is subtracted from each observation and the difference is squared. The average of these squares is the variance, and you take its square root to get the SD. In this sense the variance is the average squared distance of the data from the mean, while the range shows how far apart the extremes are.
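
The steps above can be sketched in Python as follows. The five values in `data` are hypothetical, picked so that the range is 12 and the arithmetic is easy to follow. Note one assumption about the divisor: "an average of these squares" is read here as dividing by n (the population variance), whereas Excel's STDEV.S divides by n - 1, so both are shown.

```python
import math
import statistics

# HYPOTHETICAL data with a range of 12, chosen for easy arithmetic.
data = [0, 1, 3, 4, 12]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Variance by hand: subtract the mean, square, then average.
mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]
variance_n = sum(squared_devs) / len(data)          # divide by n
variance_n1 = sum(squared_devs) / (len(data) - 1)   # divide by n - 1

# SD is the square root of the variance, not of the range.
sd_n = math.sqrt(variance_n)
sd_n1 = math.sqrt(variance_n1)

# Cross-check against the standard library.
assert math.isclose(variance_n, statistics.pvariance(data))
assert math.isclose(variance_n1, statistics.variance(data))

print(f"range = {data_range}")
print(f"variance (n) = {variance_n}, SD = {sd_n:.2f}")
print(f"variance (n - 1) = {variance_n1}, SD = {sd_n1:.2f}")
```

Taking the square root of the range (as in the 3.46 figure quoted above) gives a number that is neither of these SDs.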