Histograms Reveal Potential Problem

#1
Hi

I estimate forest wood volumes by measuring tree diameters and heights in circular forest plots. From these plots I record the volumes, then compute the variance of the plot volumes, the variance of the mean, and the standard error, which lead me to 95% confidence intervals for the mean plot volume and the total forest volume.
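A minimal Python sketch of the customary calculation described above. It assumes the plots are a simple random sample, plot volumes are already expressed as volume per acre, and no finite population correction is needed. `cruise_summary`, the plot values and the 250-acre area are hypothetical; a forest sampling textbook's own formulas should take precedence where they differ.

```python
import math
import statistics

from scipy import stats


def cruise_summary(vac, area_acres, conf=0.95):
    """Summarize a cruise from per-plot volumes expressed as volume per acre.

    Assumes plots are a simple random sample and ignores the finite
    population correction (reasonable when plots cover a tiny share of
    the forest). The t critical value uses n - 1 degrees of freedom.
    """
    n = len(vac)
    mean = statistics.fmean(vac)            # mean volume per acre
    var = statistics.variance(vac)          # variance of plot volumes (n - 1 divisor)
    var_mean = var / n                      # variance of the mean
    se = math.sqrt(var_mean)                # standard error of the mean
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    lo, hi = mean - t_crit * se, mean + t_crit * se
    return {
        "mean": mean,
        "se": se,
        "ci_mean": (lo, hi),
        # Total volume = mean volume per acre x forest area; CI scaled the same way.
        "total": mean * area_acres,
        "ci_total": (lo * area_acres, hi * area_acres),
    }


# Hypothetical volumes per acre for 10 plots and a hypothetical 250-acre tract.
plots = [820, 640, 1210, 450, 980, 720, 1530, 300, 610, 890]
summary = cruise_summary(plots, area_acres=250)
print(summary)
```

Note that this interval is symmetric around the mean by construction, which is exactly the property the rest of the thread questions for skewed plot volumes.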

I recently generated histograms of plot volumes for the most recent forest "cruises" and found more reverse-J shaped curves than bell curves. (The histograms show ranges of plot volume on the x-axis and their frequencies on the y-axis.) So the cruises with reverse-J curves have a higher frequency of small plot volumes than of large ones.
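A quick way to see the reverse-J shape without a plotting package is to bin the plot volumes into equal-width ranges and count them, which is what the histograms above display. This is a minimal sketch; `text_histogram`, the six bins and the plot volumes are hypothetical choices for illustration, not real cruise data.

```python
def text_histogram(values, n_bins=6):
    """Print equal-width volume ranges with one '#' per plot; return the counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # int() picks the bin; the maximum value would land one past the
        # last bin, so clamp it into the top bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    for i, c in enumerate(counts):
        left = lo + i * width
        print(f"{left:7.0f} to {left + width:<7.0f} {'#' * c}")
    return counts


# Hypothetical plot volumes with a long right tail (not real cruise data).
plots = [150, 180, 200, 220, 240, 260, 310, 340, 420, 480, 650, 900, 1400]
text_histogram(plots)
```

A reverse-J shows up as the tallest bar in the lowest range, with the bars thinning out toward the large volumes.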

My question: Is it OK to calculate the variance, standard error, and confidence interval for non-parametric data? My textbook on forest sampling techniques mentions the distinction between parametric and non-parametric data, but says nothing about whether the variance-to-confidence-interval procedure is appropriate for non-normal data.

I invite anyone and everyone to chime in on this.
 

Miner

TS Contributor
#2
Can you use nonparametric counterparts such as median, confidence interval for the median, quartiles, etc.?
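The median, a confidence interval for the median, and quartiles can be sketched with Python's standard library. The interval below is the usual distribution-free one based on order statistics and the Binomial(n, 0.5) distribution; it assumes the plots are an independent random sample. `median_ci` and the plot values are hypothetical.

```python
import math
import statistics


def median_ci(values, conf=0.95):
    """Distribution-free confidence interval for the population median.

    The interval [x_(j), x_(n-j+1)] covers the median with probability
    1 - 2*P(B <= j-1), where B ~ Binomial(n, 0.5). Picks the largest j
    whose coverage is still at least `conf`.
    """
    x = sorted(values)
    n = len(x)
    alpha = 1 - conf

    def binom_cdf(k):
        # P(B <= k) for B ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

    j = 0
    while j + 1 <= n // 2 and 2 * binom_cdf(j) <= alpha:
        j += 1
    if j == 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * binom_cdf(j - 1)
    return x[j - 1], x[n - j], coverage


# Hypothetical volumes per acre for 10 plots.
plots = [820, 640, 1210, 450, 980, 720, 1530, 300, 610, 890]
low, high, cov = median_ci(plots)
print(statistics.median(plots), (low, high), round(cov, 3))
print(statistics.quantiles(plots, n=4))  # quartiles
```

Because the order statistics are discrete, the actual coverage (about 97.9% for n = 10) is usually a bit above the nominal 95%.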
 
#3
Miner wrote: "Can you use nonparametric counterparts such as median, confidence interval for the median, quartiles, etc.?"
I would rather stick with my customary reporting. It's what my audience expects, and the method is supported by the textbook. I just want to be sure it's kosher.
 

hlsmith

Not a robit
#4
Just visualize the issue. The sample is a realization of the true target population. If you have a representative sample, then you can make reasonable estimates of the parameters. However, naturally skewed data are not symmetrical. Your estimate doesn't know this or what you are working with; it assumes the spread is symmetrical around the mean. An example may be something like heights. If the variable is positively skewed, the lower limit of the confidence interval may not even be feasible. Say you get something like: mean 13 (95% CI: -5, 31; totally made-up numbers). The negative lower limit is a bogus value. Do you see the issue?

Can you post your histograms, so we can better understand your J'ish distribution?
 
#5
As you requested, here are recent histograms (images not preserved):
harmsflat.JPG: appears bell-shaped
harmshist.JPG: appears reverse-J
bobryt.JPG: appears reverse-J

Please, hlsmith, I'm interested in what you are saying, but it's a little over my head. Could you make it simpler?
 
#6
hlsmith: In all my forest cruises I have always gotten reasonable-looking confidence intervals, i.e., no negative numbers. BTW, VAC is volume per acre.
 

hlsmith

Not a robit
#7
I may have made it too large an issue for shock value, but say you are using the standard normal distribution. You would typically say 95% of the data are within +/- 2 standard deviations of the mean. If you apply this rule to skewed data that are overdispersed (at least in one tail), then on the other side your interval will be inflated, i.e., it will extend farther than that tail actually does.

This becomes less of an issue with larger sample sizes, and an easy fix is to use the 95% percentile bootstrap interval for the mean.
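A minimal sketch of the percentile bootstrap interval for the mean, using only Python's standard library. It assumes the plots are an independent, representative sample; the 10,000 resamples, the fixed seed, the function name and the plot volumes are arbitrary or hypothetical choices.

```python
import random
import statistics


def bootstrap_percentile_ci(values, conf=0.95, n_boot=10_000, seed=1):
    """Percentile bootstrap confidence interval for the mean.

    1. Resample the plots with replacement (same n as the original sample).
    2. Compute the mean of each resample.
    3. Take the lower and upper (1 - conf) / 2 percentiles of those means.
    """
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    tail = (1 - conf) / 2
    lo = boot_means[round(tail * (n_boot - 1))]
    hi = boot_means[round((1 - tail) * (n_boot - 1))]
    return lo, hi


# Hypothetical right-skewed plot volumes (not real cruise data).
plots = [150, 180, 200, 220, 240, 260, 310, 340, 420, 480, 650, 900, 1400]
print(statistics.fmean(plots), bootstrap_percentile_ci(plots))
```

Because the interval comes from the resampled means themselves rather than from mean +/- a multiple of the SE, it tends to stretch farther on the long-tail side for right-skewed data. It still depends on the sample being representative and can be somewhat too narrow in very small samples.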

P.S. Natural logarithmic transformations of positively skewed data may normalize them.
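A minimal sketch of checking whether a log transformation reduces skewness, using the simple moment skewness m3 / m2^1.5 (0 for a symmetric sample). The plot volumes are hypothetical. One caveat worth knowing: back-transforming a mean or interval computed on the log scale gives the geometric mean, which estimates the median of a lognormal population rather than its arithmetic mean, and so not the mean volume per acre directly.

```python
import math
import statistics


def skewness(values):
    """Moment skewness g1 = m3 / m2**1.5 (positive for a long right tail)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical right-skewed plot volumes (not real cruise data).
plots = [150, 180, 200, 220, 240, 260, 310, 340, 420, 480, 650, 900, 1400]
logs = [math.log(v) for v in plots]
print(round(skewness(plots), 2), round(skewness(logs), 2))

# exp(mean of logs) is the geometric mean, not the arithmetic mean.
print(math.exp(statistics.fmean(logs)), statistics.geometric_mean(plots))
```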
 
#8
I think you are saying that the shorter tail will be overrepresented, right? Is a sample size of 30 large enough that I don't need to be concerned about skewed data? Please explain the 95% percentile bootstrap interval. How does that work?
 
#10
hlsmith wrote:
If I have a
mean: 5
standard deviation: 16
sample size: 20

95% CI: mean +/- 1.96 x (16/sqrt(20)) = 5 +/- 7.01 = (-2.01, 12.01). So the heft of the SD comes from the right tail, but it is applied equally to the left and right sides when calculating the interval.
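The arithmetic in the example (mean 5, SD 16, n = 20, illustrative numbers rather than cruise data) can be checked directly; this sketch uses the 1.96 normal multiplier as the post does.

```python
import math

mean, sd, n = 5, 16, 20
se = sd / math.sqrt(n)      # standard error: 16 / sqrt(20), about 3.58
half_width = 1.96 * se      # about 7.01, applied equally on both sides
lower, upper = mean - half_width, mean + half_width
print(round(lower, 2), round(upper, 2))  # -2.01 12.01
```

With n = 20, a t multiplier (about 2.09 for 19 degrees of freedom) would make the interval slightly wider, and the lower limit even more negative.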


Bootstrap:
https://ocw.mit.edu/courses/mathema...pring-2014/readings/MIT18_05S14_Reading24.pdf

People like to say that if the sample size is 30 or more you may be OK, but it depends on how skewed the data are.
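How much the skew matters at n = 30 can be explored with a small simulation: draw many samples of 30 from a right-skewed population, build the usual t interval each time, and count how often it contains the true mean. This sketch assumes a lognormal population as a stand-in for reverse-J plot volumes; that is an illustrative assumption, not a claim about any particular forest, and the sigma values, repetitions and seed are arbitrary.

```python
import math
import random
import statistics


def t_interval_coverage(n=30, sigma=1.0, reps=2000, t_crit=2.045, seed=2):
    """Share of simulated 95% t intervals that contain the true mean.

    The population is lognormal(0, sigma); larger sigma means stronger
    right skew. 2.045 is the two-sided 95% t critical value for 29
    degrees of freedom, so change it if n changes.
    """
    rng = random.Random(seed)
    true_mean = math.exp(sigma ** 2 / 2)  # mean of a lognormal(0, sigma)
    hits = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if m - t_crit * se <= true_mean <= m + t_crit * se:
            hits += 1
    return hits / reps


for sigma in (0.25, 1.0, 1.5):
    print(sigma, t_interval_coverage(sigma=sigma))
```

Coverage close to 0.95 means the usual interval is behaving; coverage that drops as sigma grows shows that "30 is enough" depends on how skewed the population is.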
Your mean of 5 and SD of 16 seem really extreme, but I get the point.

Alright, I've got a lot of stuff to google here. My next cruise report may include a histogram. Maybe I'll get a Kolmogorov-Smirnov test going to check normality, try logarithmic transforms...

You will probably be hearing from me again.

 

ondansetron

TS Contributor
#11
...
My question: Is it OK to calculate the variance, standard error, and confidence interval for non-parametric data? My textbook on forest sampling techniques mentions the distinction between parametric and non-parametric data, but says nothing about whether the variance-to-confidence-interval procedure is appropriate for non-normal data.

I invite anyone and everyone to chime in on this.
I think your question is a good one, but I'm commenting only on a small part of it, to deal with conceptualization and terminology. Something I see frequently in my area of work/study is that people interpret parametric vs. nonparametric as data types, when these terms really refer to broad classes of analytical techniques. Data come from underlying distributions and have a certain measurement scale, such as nominal, ordinal, interval, or ratio. The ideas of "type of data" and "parametric vs. nonparametric" should be kept distinct, because data are neither parametric nor nonparametric.

I would also suggest just using graphical methods to assess deviations from normality (in addition to your subject-matter knowledge). These two tools can do a better job of helping you determine whether the data arise from a roughly normally distributed population than a formal statistical test would. Such tests are often highly sensitive to slight departures from normality, and when power is low a nonsignificant result can give you a false sense of security about "normality".
 

hlsmith

Not a robit
#12
To piggyback on Odan,

Overlay your histogram with its own density plot and that of the normal distribution. In addition, Q-Q plots can be helpful.
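A minimal sketch of both suggestions in Python: a histogram overlaid with a kernel density estimate and a normal curve fitted with the sample mean and SD, plus a normal Q-Q plot. The kernel density uses the normal-reference rule-of-thumb bandwidth 1.06 x SD x n^(-1/5), which is an assumption; other bandwidth choices are common. matplotlib is imported only inside `plot_checks`, so `qq_points` and `kde` work without it. All function names are hypothetical.

```python
import statistics


def qq_points(values):
    """Normal Q-Q coordinates: theoretical quantiles vs. sorted observations.

    Uses plotting positions (i - 0.5) / n. Points near a straight line suggest
    roughly normal data; a curve that bends upward (convex) suggests right skew.
    """
    x = sorted(values)
    n = len(x)
    std_normal = statistics.NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return z, x


def kde(values, grid):
    """Gaussian kernel density with bandwidth 1.06 * sd * n**(-1/5)."""
    n = len(values)
    h = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    kernels = [statistics.NormalDist(v, h) for v in values]
    return [sum(k.pdf(g) for k in kernels) / n for g in grid]


def plot_checks(values, bins=8):
    """Histogram with its own density and a fitted normal curve, plus a Q-Q plot."""
    import matplotlib.pyplot as plt  # imported here: only plotting needs it

    mu, sd = statistics.fmean(values), statistics.stdev(values)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * k / 200 for k in range(201)]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.hist(values, bins=bins, density=True, alpha=0.4, label="histogram")
    ax1.plot(grid, kde(values, grid), label="kernel density")
    normal = statistics.NormalDist(mu, sd)
    ax1.plot(grid, [normal.pdf(g) for g in grid], "--", label="normal")
    ax1.legend()

    z, x = qq_points(values)
    ax2.scatter(z, x)
    # Reference line: where the points would fall for a normal with this mean and SD.
    ax2.plot(z, [mu + sd * zi for zi in z], "--")
    ax2.set_xlabel("standard normal quantile")
    ax2.set_ylabel("plot volume")
    return fig
```

For example, `plot_checks(plots).savefig("checks.png")` would save both panels for a list of plot volumes `plots` (a hypothetical variable and file name).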