Thread: Transformation of panel count variables with multitude of zero values

  1. #1
    kiton

    Hello dear forum members!

    For a panel of 1,700 physicians observed from 2009 to 2013, my data include two variables: counts of favorable (fav) and unfavorable (unf) keywords extracted from online reviews (fav: min 0, max 30; unf: min 0, max 8). My intention is to use the fav/unf ratio as an overall "sentiment" measure derived from the reviews.

    What concerns me is that both fav and unf are 0 in 40-50% of observations (mainly because many reviews are too short, or missing altogether, for a given year). This produces a multitude of missing values in the ratio, as division by 0 is not defined.

    To avoid this, I have so far tried the following transformations of fav and unf: (1) adding 1 to each count; (2) recoding 0 as 1; (3) recoding 0 as 0.0001; (4) converting all values to z-scores. In each case I then divide one variable by the other to create the "sentiment" ratio.

    Each transformation involves some gain and some loss. Option (1), adding 1 to each count, seems the most "ratio-like". Yet it also loses information: 10/10 becomes 11/11 = 1, the same value as no counts at all (0/0 becomes 1/1 = 1).
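To make the trade-off concrete, here is a minimal sketch of options (1) to (3) applied to a few made-up (fav, unf) pairs. The function names and the sample pairs are illustrative assumptions, not taken from the actual data set.

```python
from fractions import Fraction

def ratio_plus_one(fav, unf):
    """Option (1): add 1 to both counts before dividing."""
    return Fraction(fav + 1, unf + 1)

def ratio_recode_zero(fav, unf, replacement):
    """Options (2) and (3): replace a zero count with `replacement`, then divide."""
    f = Fraction(fav) if fav != 0 else Fraction(replacement)
    u = Fraction(unf) if unf != 0 else Fraction(replacement)
    return f / u

# Hypothetical (fav, unf) pairs, chosen only to show the edge cases.
pairs = [(0, 0), (10, 10), (3, 0), (0, 3)]

for fav, unf in pairs:
    print(
        (fav, unf),
        float(ratio_plus_one(fav, unf)),               # option (1)
        float(ratio_recode_zero(fav, unf, 1)),         # option (2)
        float(ratio_recode_zero(fav, unf, "0.0001")),  # option (3)
    )
```

Under all three options (0, 0) and (10, 10) map to the same ratio of 1, which is the information loss described above. Option (3) additionally turns (3, 0) into 30000, so the choice of the small constant dominates the scale of the ratio.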

    Have you ever come across such a "transformation" issue? Or maybe you see a better solution? I would sincerely appreciate any comments and advice.
    Last edited by kiton; 07-19-2015 at 03:40 PM.

  2. #2
    rogojel

    Hi,
    maybe you could try to model your data using a zero-inflated model and see if the model changes over time?

    https://en.m.wikipedia.org/wiki/Zero-inflated_model

    regards
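For readers unfamiliar with the term, a zero-inflated Poisson model mixes a point mass at zero (probability `pi`) with an ordinary Poisson count (rate `lam`). The sketch below only evaluates that probability mass function; the parameter values are made up for illustration and nothing here is fitted to the review data.

```python
import math

def zip_pmf(k, pi, lam):
    """Probability of count k under a zero-inflated Poisson distribution.

    pi  : probability of a structural zero (e.g. no review text at all)
    lam : rate of the Poisson component
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # A zero can come from the point mass or from the Poisson part.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

# Illustrative parameters only: 40% structural zeros, Poisson rate 2.
pi, lam = 0.4, 2.0
print(zip_pmf(0, pi, lam))                         # share of zeros implied by the mixture
print(sum(zip_pmf(k, pi, lam) for k in range(60)))  # probabilities sum to ~1
```

In practice such a model would be estimated with a statistics package rather than coded by hand; this is just the distribution it assumes.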

  3. The Following User Says Thank You to rogojel For This Useful Post:

    kiton (07-20-2015)

  4. #3
    kiton

    Thank you for the suggestion, rogojel. However, the two variables of interest are regressors in the study, not DVs.

    Additionally on the topic, my further exploration turned up a few studies that used the +1 approach, and others that used (a+b)/(a-b).
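One thing worth checking before adopting the second formula: as written, (a+b)/(a-b) is undefined whenever the two counts are equal, which includes the common case where both are 0. A minimal sketch, with a hypothetical helper name and made-up pairs, assuming a = fav and b = unf:

```python
from fractions import Fraction

def sum_over_difference(a, b):
    """(a + b) / (a - b) as quoted above; returns None where it is undefined."""
    if a == b:
        return None  # denominator is zero, including the case a = b = 0
    return Fraction(a + b, a - b)

# Hypothetical (fav, unf) pairs.
for a, b in [(0, 0), (10, 10), (3, 1), (1, 3)]:
    print((a, b), sum_over_difference(a, b))
```

So this formula does not remove the missing-value problem on its own; it moves it from unf = 0 to fav = unf.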

  5. #4
    kiton

    Just an update: the +1 approach appears to be preferable (at least in the case of my data). In the newly created ratio (now based on counts > 0), the majority of the scores (60-70%) are whole numbers, whereas the remaining ones have decimals. Do you think it is a good idea to round the latter up?
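A quick way to see what rounding up would do is to apply it to a few +1 ratios. The sketch below uses made-up (fav, unf) pairs and hypothetical function names; it is not run on the actual panel.

```python
import math
from fractions import Fraction

def plus_one_ratio(fav, unf):
    """The +1 ratio: (fav + 1) / (unf + 1)."""
    return Fraction(fav + 1, unf + 1)

def rounded_up(fav, unf):
    """The +1 ratio rounded up to the next whole number."""
    return math.ceil(plus_one_ratio(fav, unf))

# Hypothetical pairs: balanced, favorable-leaning, and unfavorable-leaning.
for fav, unf in [(0, 0), (1, 0), (2, 1), (0, 1), (0, 7)]:
    print((fav, unf), float(plus_one_ratio(fav, unf)), rounded_up(fav, unf))
```

Every ratio between 0 and 1 rounds up to 1, so all unfavorable-leaning observations become indistinguishable from balanced ones, and 1.5 merges with 2. Whether that loss is acceptable depends on how the regressor is used.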
