# Thread: Transformation of panel count variables with multitude of zero values

1. ## Transformation of panel count variables with multitude of zero values

Hello dear forum members!

For a panel of 1,700 physicians observed from 2009 to 2013, my data include two variables: the counts of favorable (fav) and unfavorable (unf) "keywords" extracted from online reviews (fav: min 0, max 30; unf: min 0, max 8). My intention is to use the fav/unf ratio as an overall "sentiment" measure derived from the reviews.

What concerns me is that 40-50% of the values of both fav and unf are 0 (mainly because many reviews are too short or missing altogether for a given year). This produces a multitude of missing values in the ratio, since division by 0 is undefined.

To avoid this, I have so far tried the following transformations of fav and unf: (1) adding 1; (2) recoding 0 to 1; (3) recoding 0 to 0.0001; (4) converting all values to z-scores. I then divide one by the other to create the "sentiment" ratio.

Each transformation involves some loss and some gain, with option (1) (adding 1) seemingly being the most "ratio-like". Yet it also loses information, since equal counts such as 10/10 become indistinguishable from no counts at all (both give a ratio of 1).
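To make the trade-off concrete, here is a minimal sketch of options (1) to (3) applied to a few made-up (fav, unf) pairs. The pairs and the function names `ratio_plus_one` and `ratio_recode_zero` are hypothetical illustrations, not the actual panel data or any library API; option (4) is left out because z-scores depend on the whole sample.

```python
def ratio_plus_one(fav, unf):
    """Option (1): add 1 to both counts before dividing."""
    return (fav + 1) / (unf + 1)


def ratio_recode_zero(fav, unf, replacement=1):
    """Options (2) and (3): recode zeros to `replacement`, then divide."""
    fav = fav if fav != 0 else replacement
    unf = unf if unf != 0 else replacement
    return fav / unf


# Hypothetical (fav, unf) pairs, not taken from the actual panel.
pairs = [(0, 0), (10, 10), (5, 0), (0, 5), (30, 8)]

for fav, unf in pairs:
    print(
        (fav, unf),
        round(ratio_plus_one(fav, unf), 4),           # option (1)
        round(ratio_recode_zero(fav, unf, 1), 4),     # option (2)
        round(ratio_recode_zero(fav, unf, 0.0001), 4),  # option (3)
    )
```

Under option (1), both (0, 0) and (10, 10) map to a ratio of 1, which is the information loss described above. Under option (3), a pair such as (5, 0) yields a ratio of roughly 50,000, so the result is driven mostly by the arbitrary constant.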

Have you ever come across such a transformation issue? Or maybe you see a better solution? I would sincerely appreciate any comments and advice.

2. ## Re: Transformation of panel count variables with multitude of zero values

hi,
maybe you could try to model your data using a zero-inflated model and see if the model changes over time?

https://en.m.wikipedia.org/wiki/Zero-inflated_model

regards

3. ## The Following User Says Thank You to rogojel For This Useful Post:

kiton (07-20-2015)

4. ## Re: Transformation of panel count variables with multitude of zero values

Thank you for the suggestion, rogojel. However, the two variables of interest are regressors in the study, not dependent variables.

Additionally, my further exploration turned up a few studies that used the +1 approach, as well as (a+b)/(a-b).
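For comparison, a minimal sketch of the (a+b)/(a-b) form mentioned above, assuming a = fav and b = unf (that mapping, and the function name `sum_over_diff`, are assumptions for illustration). The expression is undefined whenever the two counts are equal, which includes the case where both are 0, so the sketch returns `None` there.

```python
def sum_over_diff(fav, unf):
    """(a + b) / (a - b) with a = fav, b = unf; None when a == b."""
    if fav == unf:
        return None  # denominator is zero, so the ratio is undefined
    return (fav + unf) / (fav - unf)


# Hypothetical (fav, unf) pairs, not taken from the actual panel.
for fav, unf in [(0, 0), (10, 10), (5, 0), (0, 5), (30, 8)]:
    print((fav, unf), sum_over_diff(fav, unf))
```

Note that any pair with exactly one zero count gives 1 or -1 regardless of the size of the nonzero count, so with 40-50% zeros this form would also compress much of the data.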

5. ## Re: Transformation of panel count variables with multitude of zero values

Just an update: it appears that the +1 approach is preferable (at least for my data). The newly created ratio (now based on counts > 0) takes integer values for the majority of scores (60-70%), whereas the remaining ones have decimals. Do you think it is a good idea to round the latter up?
