
Thread: Analyzing Tagged Content: Is a Multiple Regression Analysis the Right Approach?

  1. #1

    Analyzing Tagged Content: Is a Multiple Regression Analysis the Right Approach?




    I'm working with a relatively small data set that consists of several hundred social media posts, key engagement metrics, and up to 10 content tags that describe the image of each post. We leveraged the Google Vision API along with a manual review to construct the tags. I've linked to an example of what we're working with here (http://imgur.com/vcZkWi9).

    What I'm trying to do: I would like to leverage a statistically valid methodology to identify which one or more (in combination) of tags tend to perform the best across the data set. It's easy enough to look at an individual tag and calculate the mean of the KPI, but any suggestions on how to evaluate combinations of tags that yield high performance? It wouldn't necessarily need to be all tags in combination, but it could be that 3 out of the 10 perform the best.

    What approach would you recommend to understand which tags are most closely associated with the highest mean KPI score? I've been debating whether a multiple regression analysis is best, but I'm looking for some insight on this.

  2. #2
    CowboyBear (TS Contributor, New Zealand)

    Re: Analyzing Tagged Content: Is a Multiple Regression Analysis the Right Approach?


    Quote Originally Posted by stupidpoeticjustice
    What I'm trying to do: I would like to leverage a statistically valid methodology
    Do you mean "use a statistically valid methodology"? Cf. "leverage".

    to identify which one or more (in combination) of tags tend to perform the best across the data set. It's easy enough to look at an individual tag and calculate the mean of the KPI, but any suggestions on how to evaluate combinations of tags that yield high performance? It wouldn't necessarily need to be all tags in combination, but it could be that 3 out of the 10 perform the best.
    I'm not sure this is doable given your dataset size. If you're interested in combinations of 3 tags at once, you're looking at 10! / (3! (10 - 3)!) = 120 different three-tag combinations, so 119 regression slopes to estimate, and that is before counting the 10 single tags and the 45 pairs. Given that you only have several hundred data points to work from, you won't be able to get remotely precise estimates (e.g., consider that each 3-way combination will only come up a tiny handful of times). If you really want to look at the combinations in this data-driven way, you will need thousands of data points.
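
    To make the arithmetic concrete, here is a minimal Python sketch of the combination count. It assumes 10 distinct tags, as in the reply; `N_POSTS = 300` is only a hypothetical stand-in for "several hundred" posts, and the even-spread scenario is an illustrative best case, not a description of the actual data.

```python
from math import comb

N_TAGS = 10      # number of distinct tags, as assumed in the reply
MAX_SIZE = 3     # largest combination size considered
N_POSTS = 300    # hypothetical stand-in for "several hundred" posts

# Number of distinct tag combinations of each size (1, 2 and 3 tags).
combos_by_size = {k: comb(N_TAGS, k) for k in range(1, MAX_SIZE + 1)}
total_combos = sum(combos_by_size.values())

# Illustrative best case for three-tag combinations: every post carries
# exactly three tags and posts are spread evenly over all possible triples.
posts_per_triple = N_POSTS / combos_by_size[3]

print(combos_by_size)    # {1: 10, 2: 45, 3: 120}
print(total_combos)      # 175
print(posts_per_triple)  # 2.5
```

    Even in that idealised case each three-tag combination would be backed by only two or three posts, which is far too few to estimate a mean or a slope with any precision.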
    Matt aka CB | twitter.com/matthewmatix
