A suitable rough rescaling



katerina
04-14-2009, 05:56 AM
Hi there,
The following code is for a training dataset (trPima) and a testing dataset (tePima). I want to use k-nearest neighbours, but first I need to scale the data so that the range of each of the 7 explanatory variables is roughly 1. I have calculated the ranges, but I don't know how to choose a suitable rough rescaling.

My code is:

> mins <- c(0, 56, 38, 7, 18.2, 0.085, 21)
> maxs <- c(14, 199, 110, 99, 47.9, 2.228, 63)
> rangs <- maxs - mins
> rangs
[1] 14.000 143.000 72.000 92.000 29.700 2.143 42.000

I want to fill in the (?) placeholders in the following:

scaletrPima<-data.frame(npreg=npreg/?,glu=glu/?,bp=bp/?,skin=skin/?,bmi=bmi/?,ped=ped/?,age=age/?)

> summary(tePima)
> mins <- c(0, 65, 24, 7, 19.4, 0.085, 21)
> maxs <- c(17, 197, 110, 63, 67.1, 2.42, 81)
> rangs <- maxs - mins
> rangs
[1] 17.000 132.000 86.000 56.000 47.700 2.335 60.000

scaletestx<-data.frame(npreg=npreg/?,glu=glu/?,bp=bp/?,skin=skin/?,bmi=bmi/?,ped=ped/?,age=age/?)
and again I need to fill in the (?) placeholders.

Note that both the training and the test datasets need to be scaled using the same scale factors.
Any suggestions?

Thanks in advance
Katerina
xx

Mike White
04-20-2009, 07:03 AM
I think the following should do what you want:

scaletrPima<-as.data.frame(scale(trPima, center=mins, scale=rangs))

The mins and rangs variables from the training set can then be used to scale the test data, i.e.

scaletestx<-as.data.frame(scale(tePima, center=mins, scale=rangs))
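
For anyone following along, here is a minimal sketch of the same idea that can be run end to end. The real trPima and tePima are not shown in the thread, so the two stand-in data frames below (trX and teX, names made up for this example) contain only the column minima and maxima posted above. The names tr_mins and tr_rangs are also just illustrative. The point is that the scale factors are computed once, from the training data, and then reused for the test data.

```r
# Stand-in data (an assumption): each frame holds only the column minima and
# maxima quoted in the question, not the real trPima / tePima rows.
vars <- c("npreg", "glu", "bp", "skin", "bmi", "ped", "age")
trX <- as.data.frame(rbind(c(0, 56, 38, 7, 18.2, 0.085, 21),
                           c(14, 199, 110, 99, 47.9, 2.228, 63)))
teX <- as.data.frame(rbind(c(0, 65, 24, 7, 19.4, 0.085, 21),
                           c(17, 197, 110, 63, 67.1, 2.42, 81)))
names(trX) <- vars
names(teX) <- vars

# Scale factors are taken from the training data only.
tr_mins  <- sapply(trX, min)
tr_rangs <- sapply(trX, max) - tr_mins

# scale() subtracts `center` from each column and then divides by `scale`.
scaletrPima <- as.data.frame(scale(trX, center = tr_mins, scale = tr_rangs))
scaletestx  <- as.data.frame(scale(teX, center = tr_mins, scale = tr_rangs))

# Each training column now spans exactly 1; test columns only roughly so,
# because they are scaled with the training minima and ranges.
sapply(scaletrPima, function(x) diff(range(x)))
sapply(scaletestx, range)
```

Two things to watch for when applying this to the real data. First, in the question the mins and rangs variables are overwritten by the test-set values, so the training-set lines would need to be rerun (or stored under separate names, as above) before scaling. Second, if trPima still contains the response column, it would have to be dropped before calling scale(), since scale() expects numeric columns only. If only the division form from the question is wanted, center = FALSE gives the same result as dividing each column by its range; the centring shifts every row by the same amount, so it does not change the distances between rows. The scaled frames can then be passed to a k-nearest-neighbour routine such as knn() in the class package.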