Help with R and the KS test

#1
Hi,

A few days ago I tried to do a KS test in R, but I didn't get the answer I needed. In fact, I don't understand why the test gives me the same p-value for a true and a false hypothesis.

I tested normally distributed data against a normal distribution, and then uniformly distributed data against a normal distribution. In this case the first test should have accepted the hypothesis, but not the second.

And the outcome was the same (not the KS statistic, but the p-value: 2.3e-9).

Can you give me any advice or an explanation?

Thanks in advance
 

JesperHP

TS Contributor
#2
I just did

Code:
data <- rnorm(100)      # 100 draws from the standard normal
ks.test(data, pnorm)    # test against N(0, 1)
ks.test(data, punif)    # test against uniform(0, 1)
My p-values were p1 = 0.7 and p2 = 2.2e-16, which is as expected...
So, not being able to reproduce the error from the info given, it's hard to come up with an answer... Can you reproduce the odd result yourself, and what code exactly did you write?
 

Dason

Ambassador to the humans
#3
Note that ks.test is specific to the parameters given. If you use pnorm then it is testing against a standard normal unless told otherwise.
Code:
> dat <- rnorm(100, 100)
> ks.test(dat, pnorm)

        One-sample Kolmogorov-Smirnov test

data:  dat 
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided 

> ks.test(dat, punif)

        One-sample Kolmogorov-Smirnov test

data:  dat 
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided
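
For completeness, the hypothesised parameters can be supplied through ks.test's extra arguments, which are passed on to the distribution function; with the right parameters the p-value should be large. A minimal sketch (the exact numbers will vary with the random draw):

Code:
set.seed(1)                         # illustrative seed, not from the original post
dat <- rnorm(100, mean = 100)       # data from N(100, 1)
ks.test(dat, pnorm)                 # compared against N(0, 1): rejected
ks.test(dat, pnorm, mean = 100)     # compared against N(100, 1): large p-value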
 
#4

Hmm, I see. The difference is that I used dnorm instead of pnorm. Would you mind explaining the difference between dnorm, qnorm, and pnorm?

Thanks in advance
 

JesperHP

TS Contributor
#5
You can read about this using:
Code:
?pnorm
dnorm is used to evaluate the density function at some x. The following gives you a plot of the standard normal density:

Code:
x <- seq(-3, 3, 0.01)    # grid of x values
f <- dnorm(x)            # standard normal density at each x
plot(x, f, type = "l")   # line plot of the density
qnorm takes p, a probability, as its argument and returns the value x such that P(X < x) = p.

pnorm takes an argument x and returns the probability p such that P(X < x) = p. Hence pnorm can be understood as the cumulative distribution function F(x), since F(x) = P(X < x) = p, and qnorm is therefore the inverse function of F(x).

In all cases R uses the standard normal distribution with mean = 0 and standard deviation = 1 by default; use the mean and sd arguments to change these (see ?pnorm).
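
A quick sketch of the inverse relationship between pnorm and qnorm, using the default mean = 0 and sd = 1 (the numbers in the comments are approximate):

Code:
pnorm(1.96)                    # P(X < 1.96), about 0.975 for the standard normal
qnorm(0.975)                   # about 1.96, so qnorm undoes pnorm
qnorm(pnorm(0.5))              # returns 0.5: the two functions are inverses
pnorm(0.5, mean = 2, sd = 3)   # same call with non-default parameters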

If you do not understand this, you probably want to read something about continuous random variables. Any book dealing with continuous random variables covers this, since it is the absolute basics of such variables. You can find some literature on it simply by googling something like "introduction continuous random variable".

More importantly, though: understand what Dason writes, because he actually reproduces your "unexpected" result using pnorm, noting that ks.test is parameter-specific. The KS test doesn't test whether a variable is normal in general; it tests whether it is normal with mean = a and sd = b. Using pnorm, the mean is, as said, 0 and sd = 1, since R's default is the standard normal distribution.
 
#7
Thanks for your quick answers. I know a bit about probability distributions (I'm in the third year of an engineering degree), but R is new to me; I used SPSS and Minitab before. So thank you, guys, I've understood what I was doing wrong. I just posted another question in the forums; I'll explain what I'm trying to do.

I have a set of data and I need to represent them by a probability distribution. After estimating the distribution and its parameters, I need to know whether the fitted distribution is a good description of the data. I was thinking of using ks.test, but I've read that the theory of the KS test requires the parameters to be specified in advance by the user (i.e. not estimated from the data).

So what test do you recommend?

Thanks
 

JesperHP

TS Contributor
#8
I rarely make specific distributional hypotheses about data unless this is theoretically justified, so this is not a subject matter where I have a lot of experience.

However, I think you should provide information on the nature of your data. Is it time-series data, cross-section, panel? Of course, whether it is univariate and whether it is safe to assume the observations are realisations of independent variables will also matter.

First of all, you need to consider a priori reasons to exclude a distribution. For example, if you observe prices, you have a theoretical argument against using a normal distribution, since prices are always positive but the support of the normal includes negative values.

Secondly, I would do a histogram plot; this should give you a general idea of the shape of the distribution.

If this looks approximately normal, you can use the Jarque-Bera test to see if the data are normally distributed... at least for large samples.
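
A minimal sketch, assuming the tseries package (which provides jarque.bera.test) is installed; the data here are purely illustrative:

Code:
library(tseries)         # assumed to be installed; provides jarque.bera.test
x <- rnorm(500)          # illustrative sample
hist(x)                  # eyeball the shape first
jarque.bera.test(x)      # large p-value: no evidence against normality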

Beyond that I think I will let other people guide you.
 
#9
Thanks for the quick answers,

Actually, the data are final grades from college courses, and I'm trying to fit them with a beta distribution. And yes, this is only theoretical work. So, should I use ks.test, given that I estimated the parameters of the beta distribution from the data?

Thanks