Using statistical methods to find a threshold/baseline?

#1
Hello everyone,

I am new to statistics and would like some advice. Are there any statistical methods that can be used to determine a threshold or baseline for a dataset? The threshold/baseline would act as a flag: a data point that falls above or below it would probably be of concern.

Thanks in advance!
 

Karabiner

TS Contributor
#2
Maybe some more details would be useful to know.
What do you mean by dataset here? What was measured, on which scale(s)?
Do you refer to just 1 variable or to several variables? Why do you need
to flag something, and what does it mean for a data point to be of concern?
And how large is your dataset?

With kind regards

K.
 
#3
Thanks, Karabiner, for your reply!

There are no scales. The dataset consists of monthly volume data (2 columns: date, volume) over a few years.

I was wondering whether a statistical method could be applied to this dataset to derive a threshold or baseline, so that a particular month's volume can be checked against it to determine whether that month's figure is considered high or low.
 
#6
Hi nizze,
sorry for replying to an old post, but I'm curious about your question.
When you say baseline, do you need that quantity for forecasting (for example, commercial demand in a market)?
Anyway, to calculate a baseline such as an "average value", you can assume that your data follow a normal distribution (or, better, test for normality first). Then you can choose a significance level alpha (a common, "realistic" choice is 0.05) and use it to build an interval around the mean. Strictly speaking, this is a range expected to contain a share 1 - alpha of the individual values, not a confidence interval for the mean. For example, for alpha = 0.05, the interval is
[mean - 1.96*std.dev.; mean + 1.96*std.dev.].

https://en.m.wikipedia.org/wiki/File:NormalDist1.96.png

If the normality assumption holds, a new value has roughly a 95% probability of falling inside that interval.

For your purpose, you can use the interval to identify "outlier" values (those that fall outside it).
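
Here is a minimal sketch of that check in Python, just to make the idea concrete. It assumes the monthly volumes sit in a plain list; the numbers below are made up for illustration (they are not nizze's data), and the function name `flag_outliers` is only a hypothetical helper. It uses the sample standard deviation and the 1.96 multiplier from the interval above.

```python
from statistics import mean, stdev


def flag_outliers(volumes, z=1.96):
    """Return (lower, upper, flags) for the interval mean +/- z * std.dev.

    flags[i] is True when volumes[i] lies outside the interval.
    Assumes the volumes are roughly normally distributed.
    """
    m = mean(volumes)
    s = stdev(volumes)  # sample standard deviation (n - 1 denominator)
    lower, upper = m - z * s, m + z * s
    flags = [v < lower or v > upper for v in volumes]
    return lower, upper, flags


# Hypothetical monthly volumes, one value per month (illustration only).
volumes = [100, 102, 98, 101, 99, 103, 97, 100, 150, 100, 101, 99]

lower, upper, flags = flag_outliers(volumes)
print(f"interval: [{lower:.2f}, {upper:.2f}]")
for month, (volume, flagged) in enumerate(zip(volumes, flags), start=1):
    if flagged:
        print(f"month {month}: volume {volume} is outside the interval")
```

Keep in mind that an extreme month also pulls the mean and the standard deviation towards itself, so with only a few years of data the interval can end up wider than you would like.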

Hope I was helpful to you (if you still need).