How to address the problem of "censorship" in star ratings?

Hello dear forum members,

I am conducting a study exploring the relationship between online review-based ratings (min 1, max 5) and doctors' performance, as proxied by new patient referrals (count outcome, min 15, max 3,650,000). There is theoretical and empirical support for that relationship in the literature, so let's set that discussion aside.

What concerns me more is the following. While digging through the available data (N = 1,848, observed from 2009 to 2013), I noticed some dramatic peculiarities in both rating scores I have under consideration (i.e., RMD and Vitals). As suggested by anecdotal and recent research evidence, people tend to provide either positive or negative reviews, which results in censoring of the rating scores. My data support this notion as well: the attached Qplot shows that, for both RMD and Vitals, roughly 50% of the rating scores equal the maximum value (i.e., 5). Additionally, note that RMD has more “.5” responses (e.g., 3.5, 4.5, etc.), whereas Vitals is more “discrete” (e.g., 3, 4, etc.).

Given the somewhat inconsistent significance of the ratings’ estimates, do you think it is worth exploring this issue and possibly recoding the rating scores based on some threshold values (e.g., dichotomize: 1-3.9 = 0, 4-5 = 1; or use three levels: 1-2.9 = 0, 3-4 = 1, 4.1-5 = 2; or any other appropriate scheme)? Or is this idea badly flawed?
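To make the two recoding schemes concrete, here is a minimal Python sketch. The function name `recode` and the sample ratings are made up for illustration; only the cutpoints come from the schemes described above.

```python
from bisect import bisect_right

def recode(rating, cutpoints):
    """Return the 0-based category of `rating` given ascending cutpoints.

    A rating equal to a cutpoint is placed in the upper category.
    """
    return bisect_right(cutpoints, rating)

# Made-up ratings, for illustration only.
ratings = [1.0, 2.5, 3.9, 4.0, 4.5, 5.0]

# Dichotomize: 1-3.9 = 0, 4-5 = 1
binary = [recode(r, [4.0]) for r in ratings]

# Three levels: 1-2.9 = 0, 3-4 = 1, 4.1-5 = 2
three_level = [recode(r, [3.0, 4.1]) for r in ratings]

print(binary)
print(three_level)
```

Keeping the cutpoints as a parameter makes it easy to rerun the model under several thresholds and check how sensitive the estimates are to that choice.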

Note that if I were using either of the rating scores as a DV, I would use estimation methods appropriate for censored variables, such as interval or tobit regression, both of which require specifying at least one censoring threshold.

Thank you sincerely for the comments,
One more option I can think of is recoding into three groups: 5-star doctors, non-5-star doctors, and those not reviewed at all (the last group would let me code missing values as zeros, thus making them meaningful). Yet this gives me three categories rather than a scale.
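A rough sketch of that three-group recoding, assuming unreviewed doctors are stored as `None` (the labels and function names are hypothetical). Three categories would typically enter a regression as two dummy variables measured against a reference group, so losing the scale is not necessarily a problem.

```python
LEVELS = ["not_reviewed", "below_five", "five_star"]

def star_group(rating, max_rating=5.0):
    """Assign a doctor to one of three groups; None means no reviews."""
    if rating is None:
        return "not_reviewed"
    return "five_star" if rating >= max_rating else "below_five"

def to_dummies(group, reference="not_reviewed"):
    """Dummy-code a group, leaving out the reference category."""
    return {lvl: int(group == lvl) for lvl in LEVELS if lvl != reference}

# Made-up ratings, for illustration only.
ratings = [5.0, 4.5, None]
groups = [star_group(r) for r in ratings]
dummies = [to_dummies(g) for g in groups]

print(groups)
print(dummies)
```

Here the unreviewed doctors serve as the reference group, so both dummies are zero for them.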


from a psychometrics/test development perspective, you're facing "ceiling effects" in your rating scale.

if you're mostly interested in modelling the probabilities of response, it is not unusual to collapse categories in order to have a better spread of the ratings.

if you want to get fancier, you can work with censored regression/tobit models.
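
For what it's worth, here is a bare-bones sketch of the log-likelihood a tobit model maximizes when the rating is the outcome and is right-censored at 5. It assumes a single hypothetical predictor `x` and normal errors. In practice you would use a statistics package's censored-regression routine rather than hand-rolling this.

```python
import math

def norm_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_sf(z):
    """Upper-tail probability P(Z > z) of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def tobit_loglik(y, x, intercept, slope, sigma, upper=5.0):
    """Log-likelihood of a tobit model right-censored at `upper`.

    Sketch only: norm_sf can underflow to 0 for extreme arguments.
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = intercept + slope * xi
        if yi >= upper:
            # Censored: we only know the latent rating is at least `upper`.
            total += math.log(norm_sf((upper - mu) / sigma))
        else:
            # Uncensored: ordinary normal density contribution.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total

# Made-up data, for illustration only.
ll = tobit_loglik([5.0, 4.0, 3.5], [1.0, 0.5, 0.0], 3.5, 1.0, 0.8)
print(ll)
```

The point of the censored branch is that a doctor sitting at 5 contributes the probability of being "at least 5" rather than a density at exactly 5, which is what separates this from plain least squares.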