Analyzing Tagged Content: Is a Multiple Regression Analysis the Right Approach?

#1
I’m working with a relatively small data set of several hundred social media posts. For each post we have key engagement metrics and up to 10 content “tags” that describe its image. We built the tags using the Google Vision API along with a manual review. I’ve linked to an example of what we’re working with here (http://imgur.com/vcZkWi9).

What I’m trying to do: I would like to use a statistically valid method to identify which tags, individually or in combination, tend to perform best across the data set. It’s easy enough to look at an individual tag and calculate the mean of the KPI, but does anyone have suggestions on how to evaluate combinations of tags that yield high performance? It wouldn’t necessarily need to be all tags in combination; it could be that 3 of the 10 perform best together.
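A minimal Python sketch of the per-combination version of that calculation, assuming each post is stored as a set of tags plus one KPI value. The data, the helper `mean_kpi_by_combo`, and the `min_posts` threshold are all hypothetical, chosen only for illustration:

```python
from itertools import combinations
from statistics import mean

# Hypothetical example data: each post is (set of tags, KPI value).
posts = [
    ({"text", "cartoon", "outdoor"}, 120),
    ({"text", "product"}, 45),
    ({"cartoon", "outdoor"}, 98),
    ({"text", "cartoon"}, 110),
    ({"product", "outdoor"}, 30),
    ({"text", "cartoon", "product"}, 130),
]


def mean_kpi_by_combo(posts, size, min_posts=2):
    """Mean KPI for every tag combination of the given size that appears
    together in at least `min_posts` posts. Returns {combo: (mean, count)}."""
    groups = {}
    for tags, kpi in posts:
        # Sort so the same combination always produces the same tuple key.
        for combo in combinations(sorted(tags), size):
            groups.setdefault(combo, []).append(kpi)
    return {
        combo: (mean(values), len(values))
        for combo, values in groups.items()
        if len(values) >= min_posts
    }


result = mean_kpi_by_combo(posts, 2)
ranked = sorted(result.items(), key=lambda kv: kv[1][0], reverse=True)
for combo, (avg, n) in ranked:
    print(combo, round(avg, 1), n)
```

Raw combination means like these are descriptive only: with several hundred posts, many combinations appear just a handful of times, and the mean for one pair does not account for the other tags on the same posts. That is part of why a regression-style model is worth considering.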

What approach would you recommend for understanding which tags are most closely associated with the highest mean KPI? I’ve been debating whether multiple regression analysis is the best fit, but I’m looking for some insight on this.
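If multiple regression is the route, one common way to let it speak to combinations is to include an interaction column (the product of two tag dummies) alongside the individual tag dummies. The sketch below fits ordinary least squares in plain Python via the normal equations. The two tags, the KPI values, and the helper names (`solve`, `ols`, `design_row`) are hypothetical, and the data were constructed so the fit is exact and easy to check:

```python
# Hypothetical example: OLS of KPI on two tag dummies plus their interaction.


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def design_row(tags):
    """Intercept, 'text' dummy, 'cartoon' dummy, and their product (interaction)."""
    text = 1.0 if "text" in tags else 0.0
    cartoon = 1.0 if "cartoon" in tags else 0.0
    return [1.0, text, cartoon, text * cartoon]


# Made-up posts: (tags, KPI)
posts = [
    (set(), 10), ({"text"}, 15), ({"cartoon"}, 13),
    ({"text", "cartoon"}, 38), ({"text"}, 15), ({"text", "cartoon"}, 38),
]
X = [design_row(tags) for tags, _ in posts]
y = [float(kpi) for _, kpi in posts]
beta = ols(X, y)
for name, b in zip(["intercept", "text", "cartoon", "text:cartoon"], beta):
    print(f"{name:>13}: {b:.2f}")
```

The `text:cartoon` coefficient is the extra KPI when both tags appear, beyond the sum of their separate effects. With 10 tags the number of possible interactions grows quickly, so in practice one would probably test only a few candidate combinations.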

I’ve had a tough time finding other resources online, so any help would be greatly appreciated!
 
#2
As far as I understand, KPI is your dependent variable, the one you are trying to predict, correct? If so, note that it is a count, i.e., a non-negative integer. That suggests count models, such as Poisson or Negative Binomial regression. Furthermore, your tags are basically words, so you have to quantify them somehow; otherwise, I don't see a way to analyze them statistically. How many tags do you have? Possibly you could code them somehow, or at least use dummy variables (e.g., text: 1=yes, 0=no; cartoon: 1=yes, 0=no; etc.).
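A minimal sketch of the dummy-coding plus Poisson regression idea in plain Python, so it runs without extra packages. Everything here is hypothetical: the posts and KPI counts are made up (constructed so the answer is easy to check), and `solve`/`fit_poisson` are illustrative helpers that fit the model by Newton-Raphson, which for the Poisson log link is the same as iteratively reweighted least squares:

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Poisson regression with log link, fitted by Newton-Raphson.
    Assumes column 0 of X is the intercept and sum(y) > 0."""
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score X'(y - mu) and information X' diag(mu) X.
        score = [sum(row[i] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for i in range(k)]
        info = [[sum(row[i] * row[j] * mi for row, mi in zip(X, mu)) for j in range(k)]
                for i in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Made-up posts: (tags, KPI count)
posts = [
    (set(), 4), (set(), 6),
    ({"text"}, 9), ({"text"}, 11),
    ({"cartoon"}, 14), ({"cartoon"}, 16),
    ({"text", "cartoon"}, 28), ({"text", "cartoon"}, 32),
]

# Dummy coding: one 0/1 column per tag (1 = yes, 0 = no), plus an intercept column.
vocab = sorted(set().union(*(tags for tags, _ in posts)))
X = [[1.0] + [1.0 if tag in tags else 0.0 for tag in vocab] for tags, _ in posts]
y = [float(kpi) for _, kpi in posts]

beta = fit_poisson(X, y)
for name, b in zip(["intercept"] + vocab, beta):
    print(f"{name:>9}: coef={b:+.3f}  exp(coef)={math.exp(b):.2f}")
```

Each exponentiated tag coefficient is a rate ratio: in this toy data, posts tagged `cartoon` have about 3 times the expected KPI of posts without it, holding the other tag fixed, and exp(intercept) is the expected KPI for an untagged post. Poisson regression assumes the variance equals the mean; engagement counts are often more spread out than that, which is when the Negative Binomial model mentioned above is the usual alternative. For real work, `statsmodels` (e.g. `sm.GLM(y, X, family=sm.families.Poisson())`) is a more robust choice than this hand-rolled fit.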

You can also look at some machine learning techniques (e.g., singular value decomposition [SVD]) for working with text (tag) data, but that is a slightly different ball game.
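As a small illustration of the SVD idea, the sketch below treats the data as a post-by-tag 0/1 matrix and computes only the leading singular value and right singular vector, using power iteration on A^T A in plain Python. The matrix and the `leading_singular` helper are hypothetical; a full SVD would normally come from a library call such as `numpy.linalg.svd`:

```python
import math

# Hypothetical post-by-tag matrix: rows are posts, columns are tags (1 = tag present).
tags = ["text", "cartoon", "outdoor"]
A = [
    [1, 1, 0],  # post 1: text + cartoon
    [1, 1, 0],  # post 2: text + cartoon
    [0, 0, 1],  # post 3: outdoor only
]


def leading_singular(A, iters=200):
    """Largest singular value of A and its right singular vector, found by
    power iteration on A^T A. The all-ones start assumes it is not orthogonal
    to the leading vector (true for a non-negative matrix like this one)."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        AtAv = [sum(A[r][c] * Av[r] for r in range(n_rows)) for c in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in AtAv))
        v = [x / norm for x in AtAv]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    sigma = math.sqrt(sum(x * x for x in Av))  # ||A v|| equals the singular value
    return sigma, v


sigma, v = leading_singular(A)
print(f"leading singular value: {sigma:.3f}")
for tag, loading in zip(tags, v):
    print(f"{tag:>8}: {loading:+.3f}")
```

Tags with large loadings on the same singular vector tend to co-occur across posts (here `text` and `cartoon`), so the leading components could serve as compact tag "themes" to use in a regression in place of the raw dummies.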