Physical Sub-Sampling in the Oil Industry

Greetings All. Newbie here with absolutely no experience with Applied Statistics and a real-world question.
I have a 6 inch oil/water pipeline that is flowing 80,000 gallons/day of 95% water. The remaining 5% is oil. There are two techniques used to sub-sample this flowstream to determine the overall ratio.
The first technique involves a small tube inserted into the passing fluid; a 1.5 ml sample is drawn every 18 seconds. During a typical 6 hour sample run, 1.8 liters of sample are withdrawn and Karl Fischer titrated to determine the exact oil/water ratio. During that 6 hour period, approximately 795,000 liters of fluid have passed the sample port.

The second technique involves a 2 inch X 12 inch tube suspended longitudinally inside the pipe which is measured electronically. Every 1 second, data from the tube is averaged and stored in memory. It is calculated that this tube is measuring 11% of the passing volume.

My question is: what are the chances of either technique measuring the true ratio of the passing fluid?

Thanks to everyone who considers my question.

Mark Hill - President,


Less is more. Stay pure. Stay poor.
Your question lands in the realm of sampling distributions. This is something I am slow to understand at times.

Are these ratios thought to be constant throughout the day, season, and year? It may be beneficial to graph the ratios over time if there may be variability.

To get at your question a little: this seems to be a case where you may wish to calculate confidence intervals on your data. In statistics we usually calculate 95% confidence intervals. This translates as: under repeated sampling, the intervals will contain the true value 95% of the time. The 95% is arbitrary, an acquired vestige from agricultural research, so if you wanted you could calculate 99% or 99.9% confidence intervals instead. You may be able to calculate one for the proportion (ratio).
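To make that concrete, here is a rough sketch of a normal-approximation 95% confidence interval on a mean watercut. The readings below are entirely made up for illustration; in practice they could be per-aliquot titration results from a run:

```python
import math

# Hypothetical watercut readings (fraction water) from individual aliquots.
# These numbers are invented for illustration only.
readings = [0.95, 0.96, 0.94, 0.95, 0.97, 0.95, 0.96, 0.94]

n = len(readings)
mean = sum(readings) / n
var = sum((x - mean) ** 2 for x in readings) / (n - 1)  # sample variance
se = math.sqrt(var / n)                                  # standard error of the mean

z = 1.96  # ~95% coverage under a normal approximation
lo, hi = mean - z * se, mean + z * se
print(f"mean watercut = {mean:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

With more aliquots the standard error shrinks like 1/sqrt(n), so the interval tightens — which is the statistical payoff of drawing many small samples over the run.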
Thanks for your response. Hopefully you can shed some light on my lack of understanding.

The ratios I quote are the reason why sampling is done in the first place. As you can imagine, crude oil is an expensive commodity. In the oil industry the ratio of water/oil is referred to as "watercut". Even at 95% water, the remaining 5% crude makes the overall product and extraction process quite profitable.
The problem is, today it may be 95% and tomorrow it may be 96%. That simple 1% difference can mean millions of dollars in profits. So, rather than wait until the oil and water are separated into the constituents, an attempt is made to "sub-sample" the product in real time. I hope that helped.

You are absolutely correct. I am attempting to apply a "confidence level" to the two sampling techniques, to better understand their potential "accuracy".

Any ideas?

Thanks again.
Mark Hill
Am I asking the wrong questions?

It seems to me that these two distinct sampling techniques should be comparable in anticipated accuracy. Maybe I'm not expressing myself correctly.

What form of information is required to effectively compare these two sampling techniques?

Thanks to all.



Can't make spagetti
Am I asking the wrong questions?
you are not, it is a valid (and interesting) question... but i do have to say that (although maybe you don't realise it yet) what you're asking is actually more complicated (lemme rephrase that, A LOT more complicated) than what you think it is. hence the lack of responses :)
Thanks Spunky;

Unfortunately, I do know this process is extremely complicated. That's why I've been scratching my head for months. In fact, I can toss in variables that you probably haven't considered. For example, as we all know, oil floats on water, therefore the oil/water mixture travelling through pipelines is not homogeneous. Just prior to measurement, it is forced through a device called a "static mixer", which mixes the solution as effectively as possible. My two sampling techniques in question are mounted directly after the mixer.
I'm assuming the efficiency of the static mixer must be determined prior to assigning values to the sampling techniques. An estimate of oil droplet size might also be helpful.

I'm willing to do research if anyone here can point me in the right direction.

Thanks all;
Mark Hill


Can't make spagetti
have you considered asking someone who specialises in sampling theory? now that you've mentioned there are even more variables in play, i think you do need more professional help than what you could get in a forum made up essentially of volunteers. don't get me wrong, people here are very capable, but it just seems like your problem needs a little bit more of a specialist dedicated to it.


Less is more. Stay pure. Stay poor.
Just saw your update: "I am attempting to apply a 'confidence level' to the two sampling techniques, to better understand their potential 'accuracy'."

The confidence interval will give you an interval that the true value should fall into, not the true value itself. If you have a gold standard, or the actual true breakdown that you later receive, you can approach this question in a slightly different way: you can compare the true ratio to the samples. Not sure if that would be feasible given your presented scenario.
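As a sketch of what that comparison could look like, one could score each technique against the gold standard by its bias and RMSE. All numbers below are invented, not from the pipeline:

```python
import math

# Invented watercut values for a handful of runs where a lab "gold standard"
# breakdown was also available.
gold  = [0.950, 0.955, 0.960, 0.948, 0.952]
tube  = [0.951, 0.953, 0.962, 0.947, 0.950]   # technique 1: 1.5 ml draws + titration
probe = [0.948, 0.958, 0.957, 0.951, 0.955]   # technique 2: electronic tube

def score(est, ref):
    """Bias (mean error) and RMSE of estimates against a reference."""
    errs = [e - r for e, r in zip(est, ref)]
    bias = sum(errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return bias, rmse

for name, est in [("sample tube", tube), ("electronic probe", probe)]:
    bias, rmse = score(est, gold)
    print(f"{name}: bias = {bias:+.4f}, RMSE = {rmse:.4f}")
```

Bias tells you whether a technique systematically reads high or low; RMSE folds in the scatter as well, so the technique with the smaller RMSE is the one tracking the reference more closely.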
Thanks hlsmith;

The problem I foresee is that my process contains more variables that I can wrap my mind around.
I'm attempting to define the accuracy of two completely different sampling techniques, that are sampling an unstable emulsion over extended periods of time.
Sometimes the flow is 100% water and sometimes 100% oil, and after 6 hours of sampling both techniques agree that the final solution is somewhere near 96% water. A difference of 1% in either direction is a big financial difference. Accuracy is essential, but I can't decide which technique is producing more accurate data.
Obviously we can't let 800,000 liters of fluid settle for days to determine the exact values. Even then we won't be absolutely accurate, because a bunch of this water is actually dissolved in the oil and will not separate under gravitational forces.

I'm stuck in a cloud of unknown variables!!

Thanks for your consideration


Can't make spagetti
The problem I foresee is that my process contains more variables that I can wrap my mind around.
many times a lot of these variables are correlated. in fact they can be so correlated they are actually redundant. maybe you can use some sort of dimension-reduction method like principal components analysis or factor analysis? i guess even if you can't point towards *THE* variable with the highest impact, as long as you can say something like "from all the variables, it seems like some combination of these 3 or 5 variables creates the highest impact on accuracy", that is enough, right?

actually what concerns me the most is not so much the number of variables but you seem to have a whole bunch of dependencies elicited from the sampling schemes and i really have no idea how to work around with those.
I haven't been ignoring your last post, spunky; I've been scratching my head again. Sooner or later I won't have any hair left!!
Please consider this: if the water/oil solution were perfectly homogeneous, the sample size or frequency would have no effect on accuracy. A tiny sample taken once would produce the same results as a large sample collected over an extended period of time. So, based upon a bunch of variables including mixing, the oil breaks into small droplets for a short period of time. That is exactly where I intend to do my sampling. I'm now wondering if droplet size is the most important variable.

Maybe you're right. Maybe I shouldn't read more on Sampling Theory.

Thanks again


Less is more. Stay pure. Stay poor.
Are the tests you are using specifically designed to analyze samples similar to yours? If so, the manufacturers should know how reliable they are through product development and quality testing. An option may be to contact them or review their literature to see if they report how accurate the measurements are. This may provide you with some additional information to consider when comparing values between the tests.
Thanks All;

I have assumed that oil droplet size is the governing factor behind the accuracy of my two questionable sample ports. I must also assume the water/oil are thoroughly mixed, because I have no method of proving otherwise. Therefore, if I mix 5% oil into 95% water and, after sufficient agitation, the oil forms small 1 mm droplets, a large sampling port has a greater chance of accurately sub-sampling a large flowing volume than a small sampling port, simply because the large port has a greater chance of capturing more of these tiny droplets.

Although I'm a green newbie, I'm certain there must be a method of determining the chances that a large sample port is more accurate than a small one.
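One way to check that intuition is a quick Monte Carlo sketch: treat the oil as 1 mm droplets landing in the port as Poisson counts (a big simplifying assumption, ignoring mixer dynamics and droplet-size variation) and compare the spread of the estimated oil fraction for a small versus a large sample volume. All figures here are illustrative:

```python
import math
import random

random.seed(42)

TRUE_OIL = 0.05                      # assumed true oil fraction (5%)
DROP_VOL = math.pi / 6 * 0.1 ** 3    # volume of a 1 mm droplet in ml (~0.00052)

def simulate(sample_ml, n_runs=10_000):
    """Mean and spread of the estimated oil fraction when droplets arrive
    in the port as Poisson counts, for a given sample volume in ml."""
    lam = TRUE_OIL * sample_ml / DROP_VOL   # expected droplets per sample
    ests = []
    for _ in range(n_runs):
        # Normal approximation to the Poisson count (lam is large here)
        count = max(0.0, random.gauss(lam, math.sqrt(lam)))
        ests.append(count * DROP_VOL / sample_ml)
    mean = sum(ests) / n_runs
    sd = math.sqrt(sum((x - mean) ** 2 for x in ests) / n_runs)
    return mean, sd

for ml in (1.5, 150.0):
    mean, sd = simulate(ml)
    print(f"{ml:7.1f} ml port: mean = {mean:.4f}, sd of estimate = {sd:.5f}")
```

Under this model both ports are unbiased, but the spread of the estimate shrinks like 1/sqrt(sample volume): a 100× larger sample cuts the droplet-counting noise by about 10×, which is exactly the "bigger port sees more droplets" argument in statistical form.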

Thanks to everyone for their help.
Mark Hill


TS Contributor
looking at this problem from a purely practical POV, it seems to be an exercise in control charting. You can build a run chart for both measurement methods and look for specific patterns that tell you the quantity you measure has changed more than you could reasonably expect from random variation alone. In your case this could be, for example, an increase or a decrease in oil content - with a control chart you will get a warning signal.

The quantity to look at to qualify a sampling scheme/chart is called the ARL (average run length), which tells us how many points we have, on average, between two signals. One can calculate the ARL for a sampling strategy just from the data you gave - it might be worthwhile to look at SPC theory. Comparing the methods would reduce to comparing the ARLs - a large common-cause ARL and small special-cause ARLs are good.
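For a plain Shewhart chart under the usual normal assumptions, the ARL can be sketched in a few lines. The 3-sigma limits and the shift sizes below are just illustrative defaults, not calibrated to the pipeline data:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def arl_shewhart(shift_sigmas, limit=3.0):
    """Average run length of a Shewhart chart with +/- limit-sigma control
    limits, when the process mean has shifted by shift_sigmas.
    ARL = 1 / P(a single point falls outside the limits)."""
    p_signal = (1.0 - phi(limit - shift_sigmas)) + phi(-limit - shift_sigmas)
    return 1.0 / p_signal

print(f"in-control ARL         : {arl_shewhart(0.0):.1f}")  # ~370 points between false alarms
print(f"ARL after 1-sigma shift: {arl_shewhart(1.0):.1f}")
print(f"ARL after 2-sigma shift: {arl_shewhart(2.0):.1f}")
```

The pattern to aim for is what the post describes: a large in-control (common-cause) ARL, so the chart rarely cries wolf, and a small out-of-control (special-cause) ARL, so a genuine watercut shift is flagged within a few points.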

kind regards
Thanks rogogel;

Another complicated avenue of thought, but one which may simplify my search for answers. I'll poke about the internet looking for control chart info and discussions on SPC.
If nothing else, I'm gaining a better understanding of the field of statistics!!