Ensuring correct model specification

#1
Hello!

My goal is to study the effect of a new feature introduced by an online platform on the ratings that platform users report for a sample of companies. The newly introduced feature became available to all platform users. Each user could self-select into (a) using the feature, and (b) providing ratings. The rated companies have no control over whether platform users use the feature. In other words, after the introduction of the feature a company could have ratings reported by both types of users, i.e., those who chose to use it and those who chose not to. The original data were collected at the user (review) level, which I aggregated to means by company id and year (N = 747, T = 13 [2007-2019], with gaps).

The platform introduced the new feature in January 2015; therefore, I created a dummy called feature equal to 1 if year >= 2015 and 0 otherwise.
I then estimate the feature's effect using the following model:

Outcome = feature + control1 + control2 + year_fe + company_fe + e

where control1 is the total number of yearly ratings and control2 is the total number of yearly ratings reported by users who use the feature (= 0 if year < 2015).
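For concreteness, here is a minimal sketch of this specification in Python with statsmodels (the file and column names are placeholders, not my actual data):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("company_year_panel.csv")  # placeholder company-by-year file

# feature = 1 from 2015 onward, 0 otherwise
df["feature"] = (df["year"] >= 2015).astype(int)

# Outcome ~ feature + controls + year FE + company FE, SEs clustered by company.
# Note: the feature dummy equals the sum of the 2015-2019 year dummies, so it
# overlaps with the full set of year fixed effects in C(year).
fit = smf.ols(
    "outcome ~ feature + control1 + control2 + C(year) + C(company_id)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["company_id"]})
print(fit.summary())
```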

Does my approach seem appropriate for capturing the effect of the feature's introduction on the outcome? I would appreciate your feedback.
 

hlsmith

Not a robit
#2
I got a little lost in your description. So you made a new feature available to companies to use online, though the users, not the companies, can opt not to use the feature. I follow this. What are the ratings? Are they about the feature or about the company?

Are there ratings from each user every time? Also, is there a risk of confounding by indication here, based on users opting in or out?
 
#4
@hlsmith I appreciate your feedback. Let me clarify. The platform provides user ratings of the companies. Any registered platform user can rate any company that has a registered account. In order to improve the quality of ratings, the platform introduced a new user-level feature that allows users to earn points (rewards) for posting ratings (and reviews) of higher quality. Users who decided to use the feature received a "special status" (displayed next to their names). Platform users self-selected whether they wanted to use the feature or not. After the introduction of the feature, the companies began to receive ratings from both "special status" users and regular ones.

Before the introduction of the feature, the ratings on the platform suffered from a so-called "J-shaped" distribution, a case where the majority of ratings were highly positive (i.e., had a value of 5). My assumption is that after the introduction of the feature, the average of user ratings for the companies would go down, as raters would be more careful and considerate in their evaluations (vs. simply giving a 1 or a 5).

Since my analysis is at the company level, I aggregate individual ratings to means. However, in this case I am losing information on whether a rating came from a regular user or one with "special status". To control for this, I account for (a) the total number of yearly ratings, and (b) the total number of yearly ratings reported by users using the feature. But the feature effect itself is basically captured by a point in time after which the feature became available on the platform (think of it as pre- and post-periods). Whether or not such an approach is feasible -- I am not sure, and that's where I am seeking your advice.
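Roughly, the aggregation I describe looks like this in pandas (a sketch; the column names, including a review-level uses_feature indicator, are placeholders):

```python
import pandas as pd

reviews = pd.read_csv("reviews.csv")  # placeholder review-level file

panel = (
    reviews.groupby(["company_id", "year"])
    .agg(
        outcome=("rating", "mean"),        # yearly mean rating per company
        control1=("rating", "size"),       # total yearly ratings
        control2=("uses_feature", "sum"),  # yearly ratings from feature users
    )
    .reset_index()
)
```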
 

hlsmith

Not a robit
#6
Still a little confusing given the business intelligence context and my ignorance. Can you write out the question/model you think you would use? An issue I come across sometimes on this forum in pre/post designs is that the people completing the survey in the pre group aren't the exact same people completing the survey in the post group. So the difference may be in who the people are and how much weight they get in the pre/post periods. Would this be an issue for you? Say the ratings go up, but it is also because who is rating isn't exactly the same between the periods.
 
#7
@hlsmith Okay, let me try to explain it this way. The closest comparison to the situation I am trying to study would be this. Google has a service called Google Reviews. People often use these, for instance, when searching for a restaurant. Here is an example of such reviews.

Look at the reviews -- some people, like the very top reviewer, Derin Rominger, have a special status -- the so-called "Local Guide" (LG). Others don't have such a status at all. Google introduced the Local Guide program about 3-4 years ago. Before that, any user with a registered Gmail account could post a review with no special status. For their contributions, LGs receive points and earn rewards -- there are multiple levels of LG. To get points from the platform, LGs must provide reviews of higher quality, post more often, include pictures, etc.

If you scroll down to some older reviews (prior to the introduction of the LG program), you will notice that many evaluations had only a rating but no text review, had short reviews that omitted details, had no pictures, and overall were not very informative. Recent reviews, however, especially those from LGs, are far more informative and rich in content.

Using Google as an example, my goal will be to examine the effect of this new feature introduced by the platform on the overall rating of a restaurant (for a sample of restaurants). I aggregate all individual ratings at the restaurant level (take a mean) -- this is my outcome of interest. I calculate (a) the total number of ratings a restaurant has, and (b) the total number of ratings provided by LG users -- these are two of my controls. I then create a dummy called feature = 1 if the yearly restaurant rating is recorded after the LG program launch, or 0 if before that. Therefore, my model is:

Rating(it) = feature(t) + control1(it) + control2(it) + fixed_effects + e
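As a rough sketch (using the linearmodels package, with placeholder variable names), the restaurant fixed-effects version could be set up like this:

```python
import pandas as pd
from linearmodels.panel import PanelOLS

df = pd.read_csv("restaurant_year_panel.csv")    # placeholder aggregated file
panel = df.set_index(["restaurant_id", "year"])  # entity-time MultiIndex

# Restaurant (entity) fixed effects; adding TimeEffects (full year dummies)
# would absorb the post-launch feature dummy, since it varies only with year.
mod = PanelOLS.from_formula(
    "rating ~ feature + control1 + control2 + EntityEffects",
    data=panel,
)
res = mod.fit(cov_type="clustered", cluster_entity=True)
print(res)
```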
 

hlsmith

Not a robit
#8
Ok, that makes sense. Why do you say fixed effects above in the model? Will you have pre and post scores for all restaurants? If so, are you modeling post scores rather than differences? How are the scores formatted -- Likert items? Also, are the LGs biased in any way? Meaning, why would you expect ratings to go up after more vivid reviews; I would just think ratings may converge to the truth. Also, what stops the restaurants from reading the vivid reviews and making real-time changes?
 
#9
Ok, that makes sense. Why do you say fixed effects above in the model? Will you have pre and post scores for all restaurants?
Following the same Google example, people started rating restaurants many years ago, from 2008 I believe. Some restaurants might have as few as 1 review, while others have 2000+. For each review only the year is available (no specific date); therefore, individual posts are captured at the yearly level only. The problem, however, is that each individual can rate a restaurant only once -- so there is no longitudinal panel, unless I aggregate the data at the restaurant level.

If so, are you modeling post scores rather than differences? How are the scores formatted -- Likert items?
Ratings are captured on a 1-to-5-point scale. The original data come at the post level. So, one way is to model individual posts and cluster standard errors on restaurant id. At the individual level, assuming a cross-sectional format of the data, I can have some form of the difference-in-differences model:

Rating(i) = LG(i)*Time(i) + e, where Time = 1 if year >= 2015 (1)
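A minimal sketch of specification (1), with standard errors clustered on restaurant id (column names are placeholders):

```python
import pandas as pd
import statsmodels.formula.api as smf

posts = pd.read_csv("posts.csv")  # placeholder review-level file
posts["Time"] = (posts["year"] >= 2015).astype(int)

# LG * Time expands to LG + Time + LG:Time; if LG status exists only after
# the 2015 launch, LG and LG:Time coincide, so the interaction is not
# separately identified.
did = smf.ols("rating ~ LG * Time", data=posts).fit(
    cov_type="cluster", cov_kwds={"groups": posts["restaurant_id"]}
)
print(did.summary())
```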

However, I am not sure how robust this kind of model is for capturing the causal effect of LG.

Another way is to aggregate ratings at the restaurant level (take a mean), but in this case I basically lose all LG-related information. Since 99% of the restaurants have reviews from both LG and non-LG users, there is no clear treatment and control group. The only information available is the year when the LG program started. Therefore, the model looks like:

Rating(it) = Time(t) + controls(it) + FE + e (2)

Is this model better than the post-level one? I am really not sure...

Also, are the LGs biased in any way? Meaning, why would you expect ratings to go up after more vivid reviews; I would just think ratings may converge to the truth.
In fact, I don't expect a positive effect but rather a negative one -- exactly because I expect the ratings to converge to the truth, as you mentioned.

Also, what stops the restaurants from reading the vivid reviews and making real-time changes?
That's one very good question. There is a line of research in my field that tries to address it. I am trying to look at review behavior from the perspective of the reputation mechanism implemented by the platform. In particular, the LG feature represents a reputation-building mechanism based on reciprocity between users and the platform -- i.e., if LG users post reviews of higher quality that better reflect the true quality of the restaurant, then they get some rewards from the platform. In contrast, non-LG users who don't really care about rewards would act less considerately, simply providing a rating of 5 if they are happy or a rating of 1 if they are angry.
 

noetsi

Fortran must die
#10
I think, based on satisfaction research I have been involved in, that people tend to give highly positive ratings even when they are not satisfied, as reflected by the concurrent qualitative comments they make. I don't have a suggestion for how to fix that, however (I think there is support for this view in the satisfaction literature).