variable selection

  1. S

    What are the differences between group bridge, composite MCP, and group exponential lasso methods?

    In the context of variable selection, do these methods minimize the same partial likelihood function? And do these methods use the same type of cross-validation for the choice of the shrinkage parameter? Thank you for taking the time to read my questions and provide your helpful answers.
  2. M

    Forward selection of non-significant variables

    Hi, in SAS Enterprise Miner, I trained a logistic regression with forward selection and the AIC criterion. I grouped rare levels of categorical variables. One of these variables was selected by the algorithm, but the coefficients of all its categories were not statistically significant (different...
  3. M

    Variable importance in random forest

    Hi, in order to predict a binary target variable, I trained a random forest with 84 explanatory variables (using 10 variables randomly selected at each split) on a training set composed of 8,500 observations. For practical reasons, I had to test the performance of the algorithm on a test...
  4. C

    Adaptive Lasso and Cross-Validation for SNV selection

    Hello everyone, I have 17,000 variables (SNV frequencies, with a certain number of zeros) for 40 patients. Each patient is represented by their response to a treatment: 13 responses, 27 non-responses. I want to extract a subset of SNVs with strong predictive power. Because of...
  5. D

    Correlated variables in Cox model - which one is best

    Hi there, I am building a Cox model. I have two variables that are different measures of the same thing and they are therefore correlated with each other. Both variables are strongly associated with survival. My understanding is that it is not ideal to place correlated variables in the same...
  6. R

    Variable selection in Logistic Regression

    Hello all, I have a query regarding attrition analysis using logistic regression. Say that I calculate the length of stay of an employee in an organization (say 'Tenure') and divide tenure into 10 buckets. Every employee falls into one of these buckets (1 to 10). Say percentage attrition...
  7. W

    mixed predictor types for building a logistic regression model

    There is a data set in which some predictor variables are categorical, such as X1={A, B, C} and X2={1, 2, 3}, and other predictor variables are continuous, such as X3, which can be a real value between 0 and 100. If we want to build a logistic regression model based on these predictor...
  8. S

    Help with variable selection in logistic regression using a small dataset

    Hello. I have a relatively small data set looking at a binary outcome of death after a medical procedure. There are 102 patients total in the data set, and 26 deaths. I am interested in looking at correlates of death. I first calculated univariate odds ratios, and have a list of 11...
  9. P

    number of variables for a model

    I am now trying to build a Cox model: h(t, beta) = h0(t)exp(X*beta). I have selected some variables from the data, and for a categorical variable with N categories, I set up N-1 dummy variables. That gives 26 variables in my model; is that too many for a model? But...
  10. A

    stepclass - klaR package. variable selection

    Hi all, I am using the stepclass function, which is part of the klaR package. I have everything running just fine using a stopping criterion based on improvement of the performance measure. However, now I want to run stepclass and stop when the "best" r variables have been found even if the...
  11. R

    If a potential third variable is a function of two others, is it necessarily redundant?

    Say I am using regression analysis to try and estimate the assembly time for widgets (which can be produced in various sizes) given some possible independent variables: Width; Height; Quantity of pieces (different sizes used as components); Did the American League win the World Series; etc...
  12. M

    Using LASSO with economic fundamentals as potential predictors

    I am looking to use LASSO for variable selection, in the context of an economic factor model. My response variable is a balanced panel dataset, with n securities, each having t observations. My independent variables - potential predictors - are k economic variables: each of the k variables is...
  13. B

    Binary Logistic SNAFU

    Hello, I'm curious if I have coded my variables incorrectly for an analysis. Here's a brief explanation: Independent variables: X1: temperature (ordinal: 10 or 4 degrees C); X2: treatment group (categorical: exposed to pathogen or not); X3: sex (categorical...); X4: index of body...
  14. O

    Variables that are Both Predictors and Response Variables

    I have been reading about variable/feature selection algorithms. All of the algorithms that I have seen presume that there is a clear distinction between predictive and response variables. However, I have some variables that could be either. I would like to analyze a large set of variables...
  15. A

    Standardizing Similar Groups with Different Internal Variables

    I am on the student government at a college, and I feel like there has to be a way to create a scoring system for an organization. For example, let's say an org threw great events, used all its money, and followed all of our guidelines. They get their budget increased by 1.125 the upcoming year. There...
  16. B

    selecting variables for a cluster analysis

    The situation is as follows: 1) My participants (approx. N = 80) produced written answers for a complex problem. 2) I content-coded these answers through an 11-code scheme. Some of these codes are binary (i.e. content present = 1, content absent = 0), while other codes are categorical with 3 to 5...
  17. N

    Reduce number of predictor variables ((multi)collinearity): PCA not helpful

    Howdy, the attached spreadsheet contains the correlation matrix of predictor variables and response variable. Correlation exists between them. Additionally included are the results from a Principal Components Analysis on my (scaled) predictor variables. I used the prcomp command in R with the...
  18. T

    creating families w. both siblings and stepsiblings

    Hi. I have been having problems figuring out how to do this in Stata for a while now. I am working with a dataset on disabled/not disabled children including all the children’s siblings/stepsiblings and their parents. The goal is to group the id’ed children into groups w. siblings and...
  19. M

    Variable selection for multiple regression analysis

    Hi everybody. I want to correlate 2 main variables (say "a" dependent and "b" independent, of course), but at the same time I want to add 5 other independent variables ("c, d, e, f, g") to the correlation. So in principle, it's a multiple regression. One way of doing that is stepwise...