Bagging predictions with binary response variable in R

I am trying to use bagging to increase my model's predictive power. My response variable, `status`, is binary: 0 indicates no disease and 1 indicates disease. Note that `status` is a numeric vector of 0's and 1's, not a factor. I'm not sure whether that's relevant, but I wanted to point it out.

    mod = bagging(status ~ x1 + x2 + x3 + x4, method = "class")
    pred = predict(mod, newdata = fit26data[1:282, ], type = "Class")
`pred` is a vector of values ranging from 0 to 1:
       > pred
      [1] 0.0465 0.3930 0.4426 0.4905...and so on
I'm confused about why `pred` didn't just return values of either 0 or 1. Does this have to do with the fact that my response variable, `status`, is a numeric vector and not a factor? If the predictions are supposed to range from 0 to 1 in this case, what's the cut-off point for classifying a prediction as 0 or 1? Would it simply be 0.5?


Phineas Packard
type="Class" should be type="class" (no caps). In any case, looking up ?predict.regbagg leads me to believe you want aggregation="majority", not type="class".
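A sketch of what the corrected calls might look like, assuming `bagging()` here is ipred's (as suggested by `?predict.regbagg`) and that `fit26data` holds `status` and `x1`-`x4` (names taken from the question). Converting `status` to a factor makes the ensemble do classification and vote, rather than average:

```r
## Sketch, assuming ipred's bagging() and the data frame from the question.
library(ipred)

fit26data$status <- factor(fit26data$status)  # factor response -> classification trees
mod  <- bagging(status ~ x1 + x2 + x3 + x4, data = fit26data)
pred <- predict(mod, newdata = fit26data[1:282, ],
                type = "class", aggregation = "majority")  # majority vote: 0 or 1
```

With a numeric response you instead get a regression ensemble (`predict.regbagg`), which averages the trees' numeric predictions over the bootstrap replicates, hence the values between 0 and 1.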

p.s. Why are you using bagging rather than boosting?
Thanks for the response. I changed my response variable to a factor variable and the predict function did return either 0's or 1's. I noticed that using bagging actually lowered the predictive power (the tree I built using rpart actually predicted more accurately) but I'm not sure why. A professor suggested that I use the bagging technique. I'm not too familiar with boosting, would that be better than bagging in this case?


Phineas Packard
Usually the individual trees are less correlated in boosting than they are in bagging. To try both, do the following:

mod = randomForest(status ~ x1 + x2 + x3 + x4, data = fit26data, mtry = 4)  # for bagging, mtry must equal the number of features you are using
mod = randomForest(status ~ x1 + x2 + x3 + x4, data = fit26data, mtry = 2)  # mtry equals some value < the number of features; I guessed sqrt(n features), as that is generally pretty close to the optimal value
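One hedged way to compare the two fits is by out-of-bag (OOB) error, assuming `status` has been converted to a factor (randomForest then runs in classification mode and reports an OOB error rate):

```r
## Sketch: comparing bagging (mtry = all features) against a random
## forest (mtry < number of features) via OOB error, assuming the
## randomForest package and the fit26data frame from the question.
library(randomForest)

fit26data$status <- factor(fit26data$status)  # classification mode
bag <- randomForest(status ~ x1 + x2 + x3 + x4, data = fit26data, mtry = 4)
rf  <- randomForest(status ~ x1 + x2 + x3 + x4, data = fit26data, mtry = 2)

bag$err.rate[nrow(bag$err.rate), "OOB"]  # OOB error of the bagged ensemble
rf$err.rate[nrow(rf$err.rate), "OOB"]    # OOB error of the random forest
```

The OOB error is computed from trees that did not see each observation, so it gives an honest comparison without a separate hold-out set.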


Phineas Packard
p.s. the package e1071 has functions like tune.randomForest, which you can feed a range of guesses for things like mtry and have cross-validation pick the best values for you. I have found the square root of the number of features is pretty good, but I can do better by tuning the value with cross-validation.
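A sketch of that tuning step, assuming e1071 and randomForest are installed; the candidate values below are illustrative guesses, not recommendations:

```r
## Sketch: letting cross-validation pick mtry (and ntree) via e1071,
## assuming status is a factor and fit26data is the question's data frame.
library(e1071)
library(randomForest)

fit26data$status <- factor(fit26data$status)
tuned <- tune.randomForest(status ~ x1 + x2 + x3 + x4, data = fit26data,
                           mtry = 1:4, ntree = c(250, 500, 1000))
tuned$best.parameters  # mtry/ntree combination chosen by cross-validation
```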