Testing social media mentions for statistical significance

na9r

I need some help understanding how to test for significance between mentions of words that carry both positive and negative sentiment on social media. For example, if I collect data on users' opinions of a recently launched mobile phone, there are mentions of great battery life and of poor battery life. For the phrase 'battery life' I get, say, 500 positive mentions and 600 negative mentions. Similarly, for user experience I may have 1000 positive mentions and 1300 negative mentions, and so on for other attributes.

How do I test whether the negative mentions of battery life are statistically significantly more frequent than the positive ones? What I am unsure about is that these are numbers of mentions, not numbers of respondents who said something positive or negative, so one person could have talked about user experience AND battery life.

My current thinking is to first run a Shapiro-Wilk test to check normality, followed by a z-test at the 95% confidence level, but I am not sure this is correct, since it is based on the number of mentions and not the number of respondents. If someone can provide some guidance, it would be great!
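To make the z-test idea concrete, here is a minimal sketch of one possible reading of it: a one-sample z-test of whether the share of negative mentions differs from 0.5, using the example counts above. The function name `negative_share_z_test` is made up for illustration, and the sketch assumes every mention is an independent observation, which is exactly the assumption in doubt when one person can post several mentions. It skips the normality step and relies on the normal approximation to the binomial.

```python
import math

def negative_share_z_test(positive, negative):
    """One-sample z-test of H0: share of negative mentions equals 0.5.

    Assumption (not verified here): every mention is independent.
    """
    n = positive + negative
    p_hat = negative / n                 # observed share of negative mentions
    se = math.sqrt(0.5 * 0.5 / n)        # standard error under H0 (p = 0.5)
    z = (p_hat - 0.5) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Example counts from the post: (positive mentions, negative mentions)
counts = {"battery life": (500, 600), "user experience": (1000, 1300)}

for attribute, (pos, neg) in counts.items():
    z, p = negative_share_z_test(pos, neg)
    print(f"{attribute}: z = {z:.2f}, two-sided p = {p:.4f}, "
          f"significant at 5%: {p < 0.05}")
```

If the same users contribute several mentions each, the mentions are not independent and the p-value from this sketch would likely overstate the evidence, so it is only a starting point.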