# Thread: Trying to analyze aggregated data

1. ## Trying to analyze aggregated data

Hi, I would like to know how to run a t-test on aggregated data in SPSS or R.
My dataset looks like this:

| Region | Question | %positive | %negative | n |
|---|---|---|---|---|
| 1 | 15 | 20 | 45 | 2956 |
| 1 | 15 | 24 | 38 | 12459 |
| 1 | … | … | … | … |
| 2 | … | … | … | … |
| 2 | … | … | … | … |
| 2 | … | … | … | … |

I am trying to test whether region 1 differs from region 2 on %positive and %negative. SPSS and R treat n as the number of rows in each category rather than the number of people who answered the question.
Is there a way to make SPSS or R use the actual number of respondents instead?

Sarou
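One way to make n count is to turn each percentage back into a number of respondents and compare proportions directly, using a two-sample test of proportions in place of the t-test. The sketch below uses base R's `prop.test` for a single question. The data frame name `agg` and the region 2 figures are hypothetical placeholders, and the sketch assumes each row's n is the number of distinct people who answered that question in that region. Summing counts across questions would count the same people several times, which is why it compares one question at a time.

```r
# Hypothetical aggregated data for one question: the region 1 row reuses the
# first row from the post; the region 2 values are made up for illustration.
agg <- data.frame(
  region   = c(1, 2),
  question = c(15, 15),
  pct_pos  = c(20, 24),
  n        = c(2956, 3100)
)

# Rebuild the number of positive answers from the percentage and n.
# round() because the published percentages are themselves rounded.
agg$n_pos <- round(agg$pct_pos / 100 * agg$n)

# Two-sample test of equal proportions (Pearson chi-squared, no continuity
# correction). The test uses the real n, not the number of rows.
res <- prop.test(x = agg$n_pos, n = agg$n, correct = FALSE)
print(res)
```

The same steps apply to %negative by swapping in that column.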

2. ## Re: Trying to analyze aggregated data

So %positive and %negative don't sum to 100%? What do you mean by "Is there a way to make SPSS or R consider the real number instead?"

3. ## Re: Trying to analyze aggregated data

There is also a %neutral, but it is not relevant for my analysis.
I want the results to reflect the total number of people who answered, so that the standard deviation is more representative of each group.
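For a yes/no outcome such as "answered positively", the person-level standard deviation depends only on the proportion p. The standard error of the percentage, however, also depends on the number of respondents, which is why the real n matters. A minimal sketch in base R, using the usual normal-approximation formula sqrt(p * (1 - p) / n). The helper names `se_prop` and `sd_binary` are hypothetical.

```r
# Standard error of a proportion p estimated from n respondents
# (normal approximation).
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

# Person-level SD of a 0/1 response: depends on p only, not on n.
sd_binary <- function(p) sqrt(p * (1 - p))

p <- 0.20
print(sd_binary(p))       # prints 0.4
print(se_prop(p, 100))    # prints 0.04
print(se_prop(p, 2956))   # smaller: more respondents, more precise percentage
```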

4. ## Re: Trying to analyze aggregated data

So n is essentially a weight used to compress long data. For example, if case a has a positive% of 52 and a negative% of 25 and is in region 1, and case b has the same values, the two cases collapse into one row:
Code:
```
region   positive   negative   weight
1        52         25         2
```
which is a frequency weight. In that case, https://cran.r-project.org/web/packa...reqweights.pdf may help. I also believe that something as simple as:
Code:
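```r
# 'dat' is the aggregated data frame (name assumed); region as a factor, n as the weight
lm(positive ~ factor(region), data = dat, weights = n)
```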
will give you what you want, but don't trust me on that.
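The frequency-weight idea can be checked on toy data: `lm()` with `weights = n` gives the same coefficients as an ordinary fit on the data expanded back to one row per case with `rep()`. The data below are hypothetical. One caveat: `lm()` treats weights as precision weights, so its residual degrees of freedom count rows, not people. Its standard errors and p-values therefore differ from those of the expanded fit, which is the one that matches true frequency weights.

```r
# Hypothetical compressed data: each row stands for 'n' identical cases.
agg <- data.frame(
  region   = factor(c(1, 1, 2, 2)),
  positive = c(52, 40, 61, 45),
  n        = c(2, 3, 1, 4)
)

# Weighted fit on the compressed rows.
fit_w <- lm(positive ~ region, data = agg, weights = n)

# Expand back to one row per case and fit without weights.
long <- agg[rep(seq_len(nrow(agg)), times = agg$n), ]
fit_long <- lm(positive ~ region, data = long)

# Point estimates agree...
print(all.equal(coef(fit_w), coef(fit_long)))

# ...but residual df (and hence SEs) do not: 4 rows vs. 10 cases.
print(c(weighted = df.residual(fit_w), expanded = df.residual(fit_long)))
```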
