Hello dear forum members!

For a panel of 1,700 physicians observed from 2009 to 2013, my data includes two variables: the counts of *favorable* (fav) and *unfavorable* (unf) "keywords" extracted from online reviews (fav: min 0, max 30; unf: min 0, max 8). My intention is to use the "fav/unf" ratio as an overall "sentiment" measure derived from the reviews.

What concerns me is that both fav and unf are 0 in 40-50% of the observations (mainly because many reviews are too short or missing altogether for a given year). This produces a multitude of missing values, since division by 0 is undefined.

To avoid this, I have so far tried the following transformations of *favorable* and *unfavorable*: (1) adding 1 to every value; (2) recoding 0 as 1; (3) recoding 0 as 0.0001; (4) converting all values to z-scores. I then divide one by the other to create the "sentiment" ratio.

Each transformation involves “some loss and gain”, with option (1) (i.e., adding 1) seemingly being the most “ratio-like”. Yet it also loses information, since 10/10 becomes the same as no counts at all: (10+1)/(10+1) = (0+1)/(0+1) = 1.
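To make the trade-off concrete, here is a minimal Python sketch of options (1) to (3). The function names and the example counts are made up for illustration; they are not my actual data or code, and option (4) is left out.

```python
def ratio_plus_one(fav, unf):
    """Option (1): add 1 to both counts before dividing."""
    return (fav + 1) / (unf + 1)


def ratio_recode_zero(fav, unf, replacement):
    """Options (2) and (3): recode zeros to `replacement`, then divide."""
    fav = replacement if fav == 0 else fav
    unf = replacement if unf == 0 else unf
    return fav / unf


# Hypothetical (fav, unf) pairs, chosen only to show the behaviour.
examples = [(0, 0), (10, 10), (5, 0), (0, 5)]

for fav, unf in examples:
    print(
        f"fav={fav:>2} unf={unf:>2} "
        f"plus_one={ratio_plus_one(fav, unf):.4f} "
        f"zero_to_1={ratio_recode_zero(fav, unf, 1.0):.4f} "
        f"zero_to_0.0001={ratio_recode_zero(fav, unf, 0.0001):.4f}"
    )
```

In this sketch, (0, 0) and (10, 10) both give a ratio of 1 under option (1), which is exactly the loss of information described above, while option (3) pushes any observation with a single zero to a very large or very small ratio.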

Have you ever come across this kind of “transformation” issue, or do you perhaps see a better solution? I would sincerely appreciate any comments and advice.
