Hi Guys,

Just want to start by saying great forums, some really useful stuff on here! I'm basically a statistics novice, though I did do a few subjects at uni many moons ago, so forgive me if I ask some stupid questions. And apologies if this isn't in the right area.

I'm working on a data visualisation project where I'd like to look at 'joke related' search terms and compare them across multiple areas. Now I know there is Google Trends, which shows relative search volumes for comparison. However, I find it flawed for my purposes.

Firstly, there is a high threshold for reporting, which means you can't see data for a term if its volume is below the threshold. Secondly, I wanted to look at clusters of 20-30 aggregated search terms to create a 'topic', and Google Trends does not allow this. Thirdly, Google Trends data is normalised by the overall number of searches, which is fairly robust but problematic for my uses. This is because, amongst other things, I wanted to look at race-specific queries, and in states with large African-American populations (30%) that part of the population simply wouldn't be searching for those things, which skews the results.

So I came up with my own strategy: take a cluster of search terms all based around a similar topic and add their search volumes together, then divide this total by an adjusted population figure (state population × % with internet access) to get a per capita figure. One drawback is that it doesn't take into account the fact that some states may search more on average than others.
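
To make the calculation concrete, here is a rough sketch of it in Python. The function name, the state names and all the numbers are made up purely for illustration (they are not my real data), and scaling the result to "per 100,000 internet users" is just an assumption to make the figures easier to read.

```python
def adjusted_per_capita(term_counts, population, internet_share):
    """Sum the search counts for one topic and divide by the online population.

    term_counts    -- dict mapping each search term to its search count
    population     -- total state population
    internet_share -- fraction of the population with internet access (0-1)
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < internet_share <= 1:
        raise ValueError("internet_share must be a fraction in (0, 1]")
    topic_total = sum(term_counts.values())            # aggregate the cluster
    online_population = population * internet_share    # adjusted population
    return topic_total / online_population


# Hypothetical example data: two invented states, two invented terms.
states = {
    "State A": {"counts": {"term 1": 120, "term 2": 80},
                "population": 500_000, "internet_share": 0.8},
    "State B": {"counts": {"term 1": 900, "term 2": 600},
                "population": 10_000_000, "internet_share": 0.75},
}

# Scale to searches per 100,000 internet users so the numbers are readable.
rates_per_100k = {
    name: 100_000 * adjusted_per_capita(
        s["counts"], s["population"], s["internet_share"])
    for name, s in states.items()
}

# Rank states from highest to lowest adjusted per capita rate.
ranking = sorted(rates_per_100k, key=rates_per_100k.get, reverse=True)
```

Note that in this toy example the small state comes out on top even though its raw search count is far lower, which is the same pattern I describe next.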

Using my method, Wyoming, North Dakota, South Dakota and Montana have the highest per capita figures for 'sexist jokes'. However, when I run this through Google Trends, those states come out lowest. In fact, it doesn't show any data for them at all, as they are below the threshold.

So, in conclusion, is my methodology OK or rubbish? What could I do to improve it for more accurate results?

Thanks a bunch!