In ranked set sampling (RSS), we select n random sets, each of size n. Then we choose the largest unit from the 1st set, the 2nd largest from the 2nd set, and so on, up to the n-th largest from the n-th set, for actual measurement.
What is the intuition that a sample thus obtained will give an unbiased estimate of the population mean?
What is the intuition that a sample thus obtained will yield a more efficient estimator than an estimator from simple random sampling?
I know simulation results show that RSS gives unbiased and efficient estimates. But even without running simulations, there must be an underlying reason why RSS is unbiased and efficient. What are those reasons?
Hmm, I hadn't heard of this approach before.
So you start with a sample (e.g., convenience??, etc.) or the full population? What would you do with this final RSS set, assume it is a better approximation of the population if starting with a sample?
The thing I don't fully understand is the motivation for why one would opt for this approach. I did some simulations and it does appear to be unbiased and more efficient, at least for estimating a mean. However, you're basically sampling n^2 values to get a sample of size n. Why not just use the full n^2 values that you needed to sample to create your RSS sample in the first place? I guess if storage space were a concern you could get away with an iterative approach so that you don't have to store the full n^2 observations (only needing memory for 2n), and in the end you'd have n observations that do a better job than just randomly sampling n in the first place...
I think thinking about this process and how you actually implement it helps with the intuition for why it does better than just randomly sampling n observations directly. Basically you have n^2 observations to work with; if you can't combine all of that data to give you a 'better' sample then something is going wrong. Obviously using the n^2 observations directly would yield better results than using only n observations, but what we end up doing is summarizing the n^2 values into n values and using those instead.
Interesting but once again I think the practical applications are fairly limited.
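A minimal sketch of the kind of simulation described above, assuming we rank each judgment set by its true values (i.e., perfect judgment ranking) and a Gaussian population; the function name and all parameters here are my own illustration, not from the posts:

```python
import random
import statistics

def rss_sample(population, n, rng):
    """One ranked set sample of size n: draw n random sets of size n,
    rank each set, and keep the i-th order statistic from the i-th set."""
    sample = []
    for i in range(n):
        judgment_set = sorted(rng.sample(population, n))  # rank one set of n
        sample.append(judgment_set[i])  # "measure" only the i-th smallest
    return sample

rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

n, reps = 5, 2000
rss_means = [statistics.mean(rss_sample(population, n, rng)) for _ in range(reps)]
srs_means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

# Both estimators center on the true mean, but the RSS sample means
# vary noticeably less across replications than the SRS sample means.
print(statistics.variance(rss_means) < statistics.variance(srs_means))  # → True
```

The intuition shows up directly: each RSS observation is a different order statistic, so the n values are spread across the whole distribution rather than landing wherever chance puts them.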
Last edited by Dason; 08-31-2017 at 09:38 AM.
A simple ecological example will illustrate the ranked set sampling method. Suppose the average age of trees on a property needs to be estimated. An appropriate judgment-based measurement (the visual size of a tree - trees generally increase in size as they age) exists. Begin by randomly selecting three trees and judge by eye which tree is the smallest. Mark the smallest tree to be measured and ignore the other two. Next, randomly select another set of three trees to rank. Mark the medium sized tree and ignore the other two. Next, randomly select another three trees. Mark the largest tree and ignore the other two. Repeat this procedure 10 times (10 cycles) for a total of 90 trees. 30 of the trees will have been marked and 60 ignored. Of the 30 marked trees, 10 are from a stratum of generally smaller trees, 10 are from a stratum of generally middle-sized trees and 10 are from a stratum of generally larger trees. Determine the age of each of the 30 marked trees by coring or some other appropriate measurement technique and use that measurement to estimate the average age of the trees on the lot.
In this illustration there were 10 cycles and 3 samples chosen per cycle. In practice, the number of sample locations chosen per cycle (the "set size") and the number of cycles is determined using a systematic planning process. Visual Sample Plan implements the systematic process needed to determine the number of cycles, and hence, the number of locations to be ranked and the number of locations to be measured.
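The tree procedure above can be sketched in code. This is a hedged illustration only: the function name is hypothetical, and using the true ages as the ranking criterion stands in for the "judge size by eye" step (perfect judgment ranking):

```python
import random
import statistics

def rss_estimate_mean_age(ages, set_size=3, cycles=10, rng=random):
    """Sketch of the tree example: each cycle ranks `set_size` random
    sets of `set_size` trees and 'cores' one tree per rank position."""
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):  # smallest, then medium, then largest
            trees = sorted(rng.sample(ages, set_size))  # judge sizes by eye
            measured.append(trees[rank])  # core only the marked tree
    return statistics.mean(measured)  # defaults: 30 cored of 90 ranked trees

# Hypothetical lot of 500 trees with ages between 5 and 120 years.
rng = random.Random(1)
ages = [rng.uniform(5, 120) for _ in range(500)]
estimate = rss_estimate_mean_age(ages, rng=rng)
```

With the default set size 3 and 10 cycles this mirrors the example: 90 trees ranked, 30 cored, 10 from each size stratum.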
Reference: http://vsp.pnnl.gov/help/Vsample/Des...t_Sampling.htm
One reason is that if the sampling units are expensive or difficult to measure, then instead of measuring all n^2 units you only need to measure the n units in the RSS sample; ranking is cheap while measurement is costly.
But my question is why this RSS sample of size n is better than just randomly sampling n observations directly. Is it because it brings in the rank information, while in random sampling there is a chance the sample is not a good representative of the population? Does this "better" show up as improved efficiency?