
Thread: Many predictors relative to cases, want to identify interesting ones

  1. #1
    Junes


    I've been helping a colleague with his research project on the trustworthiness of profile texts. He is analyzing text entries with the LIWC tool, which generates around 90 dimensions from text input (positive/negative affect, pronoun use, etc.), most of them at the ratio or interval level. We might not use all of them, but we will probably want to use 30-70 or so.

    In this exploratory research, he wants to identify important predictors of perceived trustworthiness in profile text.

    Due to sampling constraints, we are likely to get no more than a few hundred (150-400) cases. These are human ratings of trustworthiness. So we have a large number of predictors relative to cases (a ratio somewhere between 1:3 and 1:10), which might be a problem. Since this is exploratory research, I don't think statistical tests make a lot of sense, but I do want to avoid overly spurious results.

    Now, my question is this: what kind of approach would be most useful for this problem? Preferably something not overly complex, as neither of us is a statistician. Ideally it should be doable in Stata, because that's what we're working with.

    My current thinking is something like this:

    1. Investigate and report bivariate correlations for all predictors with the outcome.
    2. Then build a linear regression model with some kind of variable selection, for instance forward selection, backward elimination, or LASSO (the last one might be a bit too complex).
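
    As a rough illustration of step 1, here is a minimal sketch in Python (not Stata, purely for illustration). The data are simulated stand-ins, not LIWC output, and the function name `bivariate_correlations` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (an assumption): 200 rated profiles, 40 LIWC-like scores.
n_cases, n_predictors = 200, 40
X = rng.normal(size=(n_cases, n_predictors))
# Hypothetical truth: only predictors 0 and 1 are related to the rating.
y = 0.8 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(size=n_cases)


def bivariate_correlations(X, y):
    """Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


r = bivariate_correlations(X, y)
ranking = np.argsort(-np.abs(r))  # strongest absolute correlation first

for j in ranking[:5]:
    print(f"predictor {j:2d}: r = {r[j]:+.2f}")
```

    With 30-70 predictors, a handful of the pure-noise columns will show non-trivial correlations by chance alone, so a ranking like this is better read as a screening aid than as evidence.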

    Does that seem at all workable? I'm a bit worried about forward selection/backward elimination, since from what I've read they don't produce very stable results.
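
    One way to see how stable (or unstable) a selection procedure is: rerun it on bootstrap resamples of the cases and count how often each predictor gets picked. Below is a minimal Python sketch of that idea. The assumptions are mine: simulated data, a hypothetical helper `forward_select`, and a fixed number of steps instead of the p-value stopping rule that stepwise routines normally use.

```python
import numpy as np


def forward_select(X, y, n_steps):
    """Greedy forward selection: at each step, add the column that most
    reduces the residual sum of squares of an OLS fit with intercept."""
    n, p = X.shape
    selected = []
    for _ in range(n_steps):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected


rng = np.random.default_rng(1)

# Simulated stand-in data (an assumption): one strong and one weak predictor.
n_cases, n_predictors = 200, 30
X = rng.normal(size=(n_cases, n_predictors))
y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n_cases)

n_boot, n_steps = 50, 5
counts = np.zeros(n_predictors, dtype=int)
for _ in range(n_boot):
    idx = rng.integers(0, n_cases, size=n_cases)  # resample cases with replacement
    for j in forward_select(X[idx], y[idx], n_steps):
        counts[j] += 1

selection_frequency = counts / n_boot
for j in np.argsort(-selection_frequency)[:8]:
    print(f"predictor {j:2d} selected in {selection_frequency[j]:.0%} of resamples")
```

    Predictors that are picked in nearly every resample are probably worth reporting; ones that come and go are the unstable results to be wary of.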

    Any very different ideas would be welcome too. I greatly appreciate any input!
    Last edited by Junes; 10-13-2016 at 07:34 PM.

  2. #2
    rogojel (TS Contributor)

    Hi,
    I would try regression trees and/or principal components analysis (PCA). Regression trees are pretty good at handling many predictors with a relatively small number of measurements, and PCA would reduce the number of dimensions if you are lucky.
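
    To make the "if you are lucky" part concrete, here is a minimal Python sketch of the PCA half of this suggestion only. It checks how many components are needed to cover 80% of the variance. The data are simulated with three hidden factors, and the 80% cutoff is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-in data (an assumption): 40 predictors driven by 3 latent factors.
n_cases, n_predictors, n_latent = 200, 40, 3
latent = rng.normal(size=(n_cases, n_latent))
loadings = rng.normal(size=(n_latent, n_predictors))
X = latent @ loadings + 0.5 * rng.normal(size=(n_cases, n_predictors))

# Standardize, then obtain the principal components from the SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

explained = s**2 / np.sum(s**2)      # share of variance per component
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 80%.
n_keep = int(np.searchsorted(cumulative, 0.80) + 1)
scores = Z @ Vt[:n_keep].T           # component scores, usable as regressors

print("components needed for 80% of variance:", n_keep)
print("shape of reduced predictor matrix:", scores.shape)
```

    If the real predictors are only weakly correlated with each other, the cumulative curve rises slowly and PCA buys little reduction.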

    regards

  3. #3
    Junes


    Thanks for your reply! Those are interesting suggestions. I'm not sure PCA is a good option, because from what I've seen the results tend to be quite hard to interpret, whereas we want something that is meaningful to us in the real world.

    I thought about regression trees or Random Forest, since I've worked with them and they seem to deliver good results in cases like this. But I think that would be a bit too exotic for him, so it's probably best to stick to some kind of linear regression. The more I read, though, the more things like forward selection seem like a misguided idea. Maybe LASSO would work (if I can convince him). Preferably, I would like something better than forward selection but similar in simplicity.
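
    For what it's worth, LASSO is less exotic than it sounds: it is ordinary least squares plus a penalty that pushes weak coefficients to exactly zero, with the penalty strength usually chosen by cross-validation. Here is a minimal self-contained Python sketch using coordinate descent. Everything in it is an assumption for illustration (simulated data, hypothetical function names, an arbitrary penalty grid). A real analysis would use an established routine rather than this hand-rolled one.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    """Minimize (1/(2n))*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate
    descent. Assumes y and the columns of X are already centered."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            resid += X[:, j] * (b[j] - new_bj)  # keep resid equal to y - X b
            b[j] = new_bj
    return b


def cv_mse(X, y, alphas, n_folds=5, seed=0):
    """Mean squared prediction error per alpha, averaged over K folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mse = np.zeros(len(alphas))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        for a, alpha in enumerate(alphas):
            b = lasso_cd(X[train] - mu_x, y[train] - mu_y, alpha)
            pred = (X[test_idx] - mu_x) @ b + mu_y
            mse[a] += np.mean((y[test_idx] - pred) ** 2) / n_folds
    return mse


rng = np.random.default_rng(3)

# Simulated stand-in data (an assumption): only predictors 0 and 1 matter.
n_cases, n_predictors = 200, 40
X = rng.normal(size=(n_cases, n_predictors))
y = 0.8 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(size=n_cases)
# Common scale for all predictors (done once on the full data, for brevity).
X = (X - X.mean(axis=0)) / X.std(axis=0)

alphas = np.logspace(-2, 0, 8)           # arbitrary penalty grid
mse = cv_mse(X, y, alphas)
best_alpha = alphas[np.argmin(mse)]
b = lasso_cd(X, y - y.mean(), best_alpha)
kept = np.flatnonzero(b)                 # predictors with nonzero coefficients

print("chosen alpha:", best_alpha)
print("predictors kept:", kept)
```

    The output is a single list of retained predictors with shrunken coefficients, which is about as easy to explain as a forward selection result.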
