Transformation of panel count variables with a multitude of zero values

kiton

New Member
#1
Hello dear forum members!

For a panel of 1,700 physicians observed from 2009 to 2013, my data include two variables: the counts of favorable (fav) and unfavorable (unf) "keywords" extracted from online reviews (fav: min 0, max 30; unf: min 0, max 8). I intend to use the ratio fav/unf as an overall "sentiment" score derived from the reviews.

What concerns me is that both fav and unf have 40-50% zeros (mainly because many reviews are very short, or a physician has no reviews at all in a given year). This produces a multitude of missing values in the ratio, since division by 0 is undefined.
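To see how many physician-years drop out, here is a minimal sketch that computes the raw ratio and the share of undefined values. The function names and the toy counts are hypothetical, not the actual panel; only a zero in unf makes the ratio undefined (a zero in fav alone just gives a ratio of 0).

```python
def raw_ratio(fav, unf):
    """Return fav/unf, or None when unf == 0 (division by zero is undefined)."""
    if unf == 0:
        return None
    return fav / unf


def share_undefined(fav_counts, unf_counts):
    """Fraction of physician-years whose raw fav/unf ratio is undefined."""
    ratios = [raw_ratio(f, u) for f, u in zip(fav_counts, unf_counts)]
    return sum(r is None for r in ratios) / len(ratios)


# Hypothetical toy data (one entry per physician-year), not the real panel.
fav = [0, 3, 12, 0, 5, 0]
unf = [0, 1, 0, 2, 0, 0]

print(share_undefined(fav, unf))  # 4 of 6 rows have unf == 0
```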

To avoid this, I have so far tried the following transformations of fav and unf: (1) adding 1 to each count; (2) recoding 0 as 1; (3) recoding 0 as 0.0001; (4) converting all values to z-scores. I then divide one by the other to create the "sentiment" ratio.
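The four transformations can be compared side by side with a small sketch like the one below (stdlib only; the helper names and toy counts are hypothetical). It makes the trade-offs visible: recoding 0 as 0.0001 turns 12 favorable / 0 unfavorable into a ratio of roughly 120,000, and the z-score version divides two centered values, so its sign and size depend on where each count sits relative to its own mean rather than on fav versus unf.

```python
from statistics import mean, pstdev


def add_one(x):
    """Option (1): shift every count by +1."""
    return [v + 1 for v in x]


def recode_zero(x, value):
    """Options (2) and (3): replace zeros with a chosen constant."""
    return [value if v == 0 else v for v in x]


def z_scores(x):
    """Option (4): standardize using the population standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def ratio(num, den):
    """Element-wise num/den; None where the denominator is exactly zero."""
    return [n / d if d != 0 else None for n, d in zip(num, den)]


# Hypothetical toy counts, not the real panel.
fav = [0, 3, 12, 0, 5]
unf = [0, 1, 0, 2, 4]

sentiment = {
    "add_one": ratio(add_one(fav), add_one(unf)),
    "zero_to_1": ratio(recode_zero(fav, 1), recode_zero(unf, 1)),
    "zero_to_0.0001": ratio(recode_zero(fav, 0.0001), recode_zero(unf, 0.0001)),
    "z_scores": ratio(z_scores(fav), z_scores(unf)),
}

for name, values in sentiment.items():
    print(name, values)
```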

Each transformation involves some loss and some gain, with option (1), adding 1, seemingly being the most "ratio-like". Yet it also loses information: 10/10 becomes the same as no counts at all, since (10+1)/(10+1) = (0+1)/(0+1) = 1.
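A quick check of that collision, as a minimal sketch (the function name is hypothetical): every balanced pair maps to 1 after adding 1, so the ratio alone cannot tell "no keywords" from "many keywords, evenly split". The printed total fav + unf is the information the ratio discards.

```python
def plus_one_ratio(fav, unf):
    """Option (1): ratio after adding 1 to both counts."""
    return (fav + 1) / (unf + 1)


pairs = [(10, 10), (0, 0), (3, 3), (1, 0), (0, 1)]
for fav, unf in pairs:
    # The total keyword count distinguishes rows that share the same ratio.
    print(f"fav={fav} unf={unf} ratio={plus_one_ratio(fav, unf)} total={fav + unf}")
```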

Have you ever come across such a transformation issue? Or do you see a better solution? I would sincerely appreciate any comments and advice.
 

kiton

New Member
#3
Thank you for the suggestion, rogojel. However, the two variables of interest are regressors in the study, not DVs.

Additionally, on this topic, my further exploration turned up a few studies that used the +1 approach, and also (a+b)/(a-b).
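For comparison, here is a minimal sketch of the (a+b)/(a-b) measure exactly as written, assuming a = fav and b = unf (the post does not say which is which). As written it is undefined whenever a = b, including a = b = 0, so it does not avoid the zero problem on its own; it gives +1 or -1 for one-sided reviews and grows in magnitude as the counts become more balanced.

```python
def sum_over_difference(a, b):
    """(a + b) / (a - b) as written in the post; None when a == b (undefined)."""
    if a == b:
        return None
    return (a + b) / (a - b)


# Assumption: a = fav, b = unf. Toy pairs, not real data.
for a, b in [(5, 1), (1, 5), (4, 0), (0, 4), (3, 3), (0, 0)]:
    print(a, b, sum_over_difference(a, b))
```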
 

kiton

New Member
#4
Just an update: it appears that the +1 approach is the most preferable (at least for my data). The newly created ratio (based on the shifted counts, which are all > 0) takes whole-number values for the majority of scores (60-70%), whereas the remaining ones are fractional. Do you think it is a good idea to round the latter up?
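To see what rounding up would do, here is a minimal sketch on hypothetical toy counts (helper names are illustrative). It checks which +1 ratios are whole numbers using exact integer arithmetic on the shifted counts, then applies `math.ceil`. Note that rounding up maps any ratio below 1 (more unfavorable than favorable keywords) to 1, the same value as a balanced or empty review, so it collapses the unfavorable side of the scale.

```python
import math


def plus_one_ratio(fav, unf):
    """Ratio after adding 1 to both counts."""
    return (fav + 1) / (unf + 1)


def is_whole(fav, unf):
    """Exact check on the shifted integer counts (avoids float comparison)."""
    return (fav + 1) % (unf + 1) == 0


# Hypothetical toy counts, not the real panel.
fav = [0, 3, 5, 2, 0, 7, 4]
unf = [0, 1, 0, 2, 1, 3, 2]

share_whole = sum(is_whole(f, u) for f, u in zip(fav, unf)) / len(fav)
ratios = [plus_one_ratio(f, u) for f, u in zip(fav, unf)]
rounded = [math.ceil(r) for r in ratios]

print(share_whole)  # 5 of 7 ratios are whole numbers
print(ratios)
print(rounded)
```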