problem with independent variables


TS Contributor

I need your advice. I have a data file, with a binary dependent variable and some independent ones.
For each person (case), there were 3 measures of characteristics. To identify the person's characteristic (among the 3), the researcher wants to choose the highest measure (e.g. if someone scores 23 on the 1st measure and 34 on the 2nd, they are assigned the second characteristic). I need to use a person's characteristic as an independent variable. The problem is what to do if someone has similar measures on 2 or 3 characteristics. How do I choose? A researcher (in this hypothetical situation) suggested adding the 3 measures as independent variables and evaluating the influence of high and low levels of each measure on the dependent variable.
Is it valid to do so? Do you have other solutions to this problem?
(It's a problem because people might have equal measures, in which case you can't choose a characteristic. And if there were 7 measures rather than 3, the resulting categorical variable would be complicated.)
On the face of it, putting all 3 variables in sounds like a good idea, perhaps using logistic regression. I'm wondering how you were planning to analyse the data if you just categorised each individual based on their highest characteristic (chi-square?). I agree that it could get complex if you had, say, 7 measures. I suppose the question is: what are you trying to examine? If you want to know which measure predicts your binary outcome, then using the 3 measures as they are would tell you how important each variable is in predicting the outcome. For example, let's say the measures range from 1 to 100. You could have Person A with a score of 20 on measure X and 99 on measure Y. Under the other method, Person B, who scores 98 on measure X and 99 on measure Y, would be in the same category as Person A, but I suspect they would be very different people. So using the IVs as they are would tell you their relative importance in predicting the DV, which may be obscured if you were to categorise.
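To make the Person A / Person B point concrete, here is a minimal Python sketch. The function name `highest_characteristic` and Person C (an exact tie) are made up for illustration; only the scores for A and B come from the example above.

```python
def highest_characteristic(scores):
    """Return the label(s) with the highest score.

    More than one label in the result means a tie, i.e. the
    "highest measure" rule cannot pick a single characteristic.
    """
    top = max(scores.values())
    return [name for name, value in scores.items() if value == top]


person_a = {"X": 20, "Y": 99}   # from the example above
person_b = {"X": 98, "Y": 99}   # from the example above
person_c = {"X": 50, "Y": 50}   # hypothetical tie

for label, person in [("A", person_a), ("B", person_b), ("C", person_c)]:
    print(label, person, "->", highest_characteristic(person))
```

A and B both come out as "Y" even though their X scores differ by 78 points, and C cannot be assigned at all. Keeping the raw scores as predictors avoids both problems.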
Hope this helps :)


TS Contributor
This really helps, thank you !

I was thinking of using all 3 in a logistic model, but I lacked a good justification for doing so. You just gave me one, thank you!!


TS Contributor
If I do what you have suggested, do I have to pay any attention to the situation where, for example, I have p independent variables with p = t + m + k, where t variables describe one thing, m variables describe another thing, and k are background variables (age, gender, ...)? What I am asking is: do I have to do anything special, or look for anything special, if 2 or 3 variables are "grouped" under the same subject (for example, personality characteristics)?
The only limit I would put on the number of variables is in relation to sample size. I'm sure there are different opinions on this, but as a general rule I would have about 15-30 participants per predictor (although I cannot remember exactly where I got this figure from).
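As a quick back-of-the-envelope check of that rule of thumb, something like the following works. The function names are made up, and the 15-30 range is just the informal figure above, not an established standard.

```python
def participants_needed(n_predictors, per_predictor=(15, 30)):
    """Rough sample-size range for a given number of predictors."""
    low, high = per_predictor
    return n_predictors * low, n_predictors * high


def max_predictors(n_participants, per_predictor=(15, 30)):
    """Rough range for how many predictors a given sample can support."""
    low, high = per_predictor
    # The stricter ratio (30 per predictor) gives the smaller count.
    return n_participants // high, n_participants // low


print(participants_needed(5))   # 3 measures + 2 background variables
print(max_predictors(150))
```

So with 5 predictors you would want somewhere between 75 and 150 participants under this rule, and a sample of 150 would support roughly 5 to 10 predictors.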

With regard to groups of similar variables, this should be OK as long as multicollinearity between those items is not a problem. That said, you might want to consider what type of regression you will run. Are you just going to put them all in at once? It might be worth considering a hierarchical analysis (more theory driven), where you put in the variables you know are predictors first (e.g. perhaps background variables are known predictors), then put, say, the personality ones in at the next step. This method may be more defensible than putting a heap of different categories of variables in all at once, which might look like fishing. Others may have some further suggestions?
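Here is a rough sketch of what the hierarchical (blockwise) approach could look like, in plain Python with no stats package. Everything in it is an assumption for illustration: the data are simulated, the variable names and coefficients are invented, and the small Newton-Raphson fitter stands in for whatever logistic regression routine your software provides. The idea is to fit the background variables first, then add the personality block, and compare the two nested models with a likelihood-ratio statistic.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    X holds predictor rows without an intercept; one is added here.
    Returns (coefficients, log-likelihood).
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # information matrix
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        loglik += yi * eta - math.log(1.0 + math.exp(eta))
    return beta, loglik


# Simulated data (made up): 2 background variables, 3 personality measures.
random.seed(0)
background, personality, y = [], [], []
for _ in range(400):
    age = random.gauss(0, 1)                  # standardised age
    gender = float(random.random() < 0.5)     # 0/1 indicator
    traits = [random.gauss(0, 1) for _ in range(3)]
    eta = -0.3 + 0.5 * age + 0.4 * gender + 1.0 * traits[0]
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    background.append([age, gender])
    personality.append(traits)

# Step 1: background variables only. Step 2: add the personality block.
_, ll_step1 = fit_logistic(background, y)
_, ll_step2 = fit_logistic([b + t for b, t in zip(background, personality)], y)

lr_stat = 2.0 * (ll_step2 - ll_step1)  # likelihood-ratio statistic
df = len(personality[0])               # number of predictors added at step 2
print(f"LR statistic = {lr_stat:.2f} on {df} df")
```

The LR statistic is compared against a chi-square distribution with `df` degrees of freedom. A large value suggests the personality block adds predictive value beyond the background variables. In practice you would use your package's own logistic regression and model comparison rather than hand-rolled code like this.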
Does this help?