Recent content by randomcat

  1.

    What are some data mining techniques for analyzing cause of disease

    I have a dataset of 300 observations, of which 200 are normal, and the rest have the disease. I have the cognitive assessment scores of these 300 participants, and the assessment is divided into different sections: delusions, depression, anxiety, etc. I'm wondering what technique(s) would be...
  2.

    Bagging predictions with binary response variable in R

    I am trying to use the bagging technique to increase my model's predictive power. My response variable, status, is a binary variable where 0 indicates no disease and 1 indicates disease. The variable `status` is just a vector of repeating 0's and 1's (so its class is 'numeric' not 'factor'). Not...
  3.

    How to view the resulting tree using the bagging function in R?

    I constructed a tree with the `rpart` function. Then I can plot it to look at the tree visually and also look at what % of the observations were classified correctly using `table(predict(...), ...)`. `mytree=rpart(y~x1+x2+x3+x4, method="class"); plot(mytree); text(mytree)`...
  4.

    rpart function: how to know the % of correct classification at every terminal node?

    I have a dataset with 277 observations. I have a binary response variable, i.e., 0 indicates no disease and 1 indicates disease. I know that 180 of the observations have no disease and 97 have the disease. I build a model and construct a classification tree to see how well my model correctly...
  5.

    How to use boxcox function in R

    I run the following code in R: `boxcox(data, lambda = seq(-2,2), interp=TRUE, plotit=TRUE)`, where `data` is a vector of integers, but I get the error `Error: $ operator is invalid for atomic vectors`. How can I fix this? Furthermore, how can I specify how much I want the lambda to increment by?
  6.

    How to calculate p-value of two-sample t-test

    I have 2 independent data sets, and I know the following about each of them: mean, SD, and sample size. I calculated the t-statistic just fine: `my.t.test<-function(mu1, mu2, sd1, sd2, n1,n2){ t=(mu1-mu2)/sqrt((sd1)^2/n1+(sd2)^2/n2); return(t) }` I know that the degrees of freedom...
  7.

    Which programming language/database to learn?

    I'm interested in pursuing a career in biostatistics, and I'm wondering which language/database will be useful in the field of biostatistics/epidemiology. As of now I only have a basic knowledge of C and R. Thank you.
  8.

    What courses to take as an undergrad if I want to pursue a Master's/PhD?

    Hi, I'm currently a freshman at a large research university in California. My school offers 3 B.S. degree options: general statistics, applied statistics, and computational statistics. At first, I chose applied statistics because I hope to pursue a career in epidemiology or biostatistics in the...