Bagging predictions with binary response variable in R

#1
I am trying to use the bagging technique to increase my model's predictive power. My response variable, status, is binary: 0 indicates no disease and 1 indicates disease. The variable `status` is just a numeric vector of 0's and 1's (so its class is 'numeric', not 'factor'). Not sure if this is relevant, but I wanted to point that out.

Code:
mod=bagging(status~x1+x2+x3+x4, data=fit26data, method="class")
pred=predict(mod, newdata=fit26data[1:282,], type="Class")
And `pred` is a numeric vector of values ranging from 0 to 1:
Code:
       > pred
      [1] 0.0465 0.3930 0.4426 0.4905...and so on
I'm confused about why `pred` didn't just return a value of either 0 or 1. Does this have to do with the fact that my response variable, status, is a numeric vector and not a factor? If the predictions are supposed to range from 0 to 1 in this case, what's the cut-off point for classifying a prediction as 0 or 1? Would it simply be 0.5?
 

Lazar

Phineas Packard
#2
type="Class" should be type="class" (no caps). In any case, looking up ?predict.regbagg leads me to believe you want aggregation="majority", not type="class".
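Something like this ought to work once status is converted to a factor (I'm assuming your data frame is fit26data, as in your predict call, and that you're using bagging from the ipred package):

Code:
library(ipred)
fit26data$status=factor(fit26data$status)  # classification bagging needs a factor response
mod=bagging(status~x1+x2+x3+x4, data=fit26data)
# majority vote over the bagged trees returns a 0/1 class label for each row
pred=predict(mod, newdata=fit26data[1:282,], type="class", aggregation="majority")
# with a numeric 0/1 response you get averaged predictions instead, which you
# would have to cut at 0.5 yourself, e.g. ifelse(pred >= 0.5, 1, 0)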

p.s. Why are you using bagging rather than boosting?
 
#3
Thanks for the response. I changed my response variable to a factor and the predict function did return 0's and 1's. I noticed that using bagging actually lowered the predictive power (the tree I built using rpart predicted more accurately), but I'm not sure why. A professor suggested that I use the bagging technique. I'm not too familiar with boosting; would it be better than bagging in this case?
 

Lazar

Phineas Packard
#4
Usually the multiple runs are less correlated in boosting than they are in bagging. To try both, do the following:

Code:
library(randomForest)
# Bagging
mod=randomForest(status~x1+x2+x3+x4, mtry = 4)  # for bagging, mtry must equal the number of features you are using
# Boosting
mod=randomForest(status~x1+x2+x3+x4, mtry = 2)  # mtry equals some value < the number of features; I guessed sqrt(n features) as that is generally pretty close to the optimal value
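
To compare the two fits, the out-of-bag confusion matrix each one reports is a quick check. Roughly like this, assuming status has been converted to a factor and the data frame is fit26data from the first post (object names are just for illustration):

Code:
library(randomForest)
fit26data$status=factor(fit26data$status)  # a factor response makes randomForest do classification
mod_bag=randomForest(status~x1+x2+x3+x4, data=fit26data, mtry = 4)  # all 4 features tried at every split
mod_sub=randomForest(status~x1+x2+x3+x4, data=fit26data, mtry = 2)  # only 2 randomly chosen features per split
mod_bag$confusion  # out-of-bag confusion matrix for the mtry = 4 fit
mod_sub$confusion  # out-of-bag confusion matrix for the mtry = 2 fit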
 

Lazar

Phineas Packard
#5
p.s. the package e1071 has functions like tune.randomForest, which you can feed a range of guesses for things like mtry and have cross validation pick the best values for you. I have found the square root of the number of features is pretty good, but I can do better by tuning the value with cross validation.
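
For example, something along these lines ought to work (the mtry grid and the data frame name fit26data are just my guesses based on the earlier posts):

Code:
library(e1071)
library(randomForest)
# status should already be a factor; try each candidate mtry and let
# 10-fold cross validation pick the best one
tuned=tune.randomForest(status~x1+x2+x3+x4, data=fit26data,
                        mtry=1:4, tunecontrol=tune.control(cross=10))
summary(tuned)          # cross-validated error for each mtry value
tuned$best.parameters   # the mtry value cross validation selected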