# T-test help with determining whether it's 1 or 2 tailed?

#### kenshi64

##### New Member
Hi, I need to perform a t-test to determine the relationship between US govt spending on Police and the number of Rapes annually.

So I have 53 data points (53 years for each variable). Firstly, can I use a t-test for this data?

My first problem is that I don't get whether I should call this a one-tailed or a two-tailed t-test, because I don't understand the difference between the two. I'm really bad at math, so please bear with me. I'd be glad if you could tell me what to do. Thanks in advance!

Following that, my other problem is labeling the cells in the contingency table. If I want to test the independence of the US govt spending on Police and the number of Rapes annually, what would you suggest I name the columns and rows of the contingency table? (Basically, names for the nominal variables / the categories the data can be classified into.)
Earlier I thought of Increase, Decrease and No change for the rows (govt. spending) and columns (no. of rapes), but that gave me two problems:
• The expected column totals weren't the same as the observed column totals
• There were more expected values than observed values (the observed values had some 0s)
What do I do? :S
Cheers
Kenshi

P.S. What other statistical tests can I use to test the relationship between the variables? (I'd be immeasurably grateful if you could tell me this!!)


#### parsec2011

##### Guest
Hi Kenshi,

Here are my ideas:

For an explanation of one or two tailed tests check the video (1).

The relationship between two variables can be examined using Pearson's r correlation coefficient, more commonly known simply as the correlation coefficient. It is uncommon for teachers to ask for significance tests of correlation coefficients, but if you ever need one, here is a description of significance testing for correlations (3). Briefly put, Pearson's r helps you to examine the extent of linear association between two variables. A correlation coefficient of r=0 indicates there is no linear relationship, while r=1 or r=-1 indicates a perfect linear relationship; values close to either extreme indicate a strong one. Check page 95 in (2). It includes a brief example of how correlation coefficients are used in a social science context.
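To make the one- versus two-tailed distinction concrete, here is a minimal Python sketch (standard library only) of the usual significance test for Pearson's r: the statistic t = r * sqrt((n - 2) / (1 - r^2)) is compared with a t distribution with n - 2 degrees of freedom. A two-tailed test asks whether there is any linear relationship, positive or negative; a one-tailed test asks only about a direction that was fixed before looking at the data. The function names and the six data points are made up for illustration and are not from the thread, and the test assumes independent observations, which yearly series may not satisfy.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0.

    Integrates the t density from 0 to t with Simpson's rule
    (steps must be even) and subtracts the result from 0.5.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def correlation_t_test(x, y):
    """Return r, the t statistic, df, and one- and two-tailed p-values.

    The one-tailed p-value is for the direction of the observed sign of r.
    It is undefined when r is exactly 1 or -1 (division by zero).
    """
    n = len(x)
    r = pearson_r(x, y)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    tail = t_upper_tail(abs(t), n - 2)
    return {"r": r, "t": t, "df": n - 2,
            "p_one_tailed": tail, "p_two_tailed": 2 * tail}

# Hypothetical numbers, for illustration only.
spending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rapes = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
result = correlation_t_test(spending, rapes)
print(result)
```

With 53 yearly values the degrees of freedom would be 51. The one-tailed p-value is half the two-tailed one, but it is only legitimate if the predicted direction was chosen in advance and matches the observed sign of r.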

YouTube and khanacademy.org provide tutorials on correlation coefficients: how to calculate them manually or using Excel, R, and other tools.
You may also want to look into linear regression analysis. It is easier than it sounds, but you will need to dedicate time and effort to get results, especially if you are preparing a thesis research proposal.

Commonly, you put the categories of your independent variable (x) in the columns of a contingency table, and the categories of your dependent variable (y) in the rows, labeled down the left-hand side. In my opinion, your independent variable (x) is expenditure on police security and your dependent variable (y) is the total number of rape victims in a certain time frame.

All the best

Ramon
-------------

References

(3) http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/test_signif_pearson.html

Also useful:

http://www.voxeu.org/index.php?q=node/452
http://whichtest.info/index.htm

#### kenshi64

##### New Member
Thanks Ramon! But you seem to have misunderstood the chi-square part; could you re-read it and tell me if you still don't understand it? Well, I'll rephrase anyway. In a standard 2 x 2 contingency table the data is grouped (as with all chi-square data) into boolean (yes/no) categories, or sometimes into categorical ranges like (10-20, 30-40). But before I start counting I need to find chi-square-appropriate names for these cells.
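One way to get chi-square-appropriate categories, following the Increase / Decrease / No change idea from the first post, is to label each year-over-year change and cross-tabulate the labels. The sketch below is a hypothetical illustration in plain Python with made-up numbers, not a prescribed method. It computes each expected count as row total × column total / grand total, which guarantees that the expected row and column totals equal the observed ones, and it only keeps categories that actually occur, so no row or column is empty.

```python
def change_labels(series):
    """Label each year-over-year change as Increase, Decrease or No change."""
    labels = []
    for prev, curr in zip(series, series[1:]):
        if curr > prev:
            labels.append("Increase")
        elif curr < prev:
            labels.append("Decrease")
        else:
            labels.append("No change")
    return labels

def contingency_table(row_labels, col_labels):
    """Count how often each (row category, column category) pair occurs."""
    rows = sorted(set(row_labels))
    cols = sorted(set(col_labels))
    counts = {r: {c: 0 for c in cols} for r in rows}
    for r, c in zip(row_labels, col_labels):
        counts[r][c] += 1
    return rows, cols, counts

def chi_square(rows, cols, counts):
    """Return the expected counts, the chi-square statistic and its df."""
    row_tot = {r: sum(counts[r].values()) for r in rows}
    col_tot = {c: sum(counts[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    # Expected count under independence: row total * column total / n.
    expected = {r: {c: row_tot[r] * col_tot[c] / n for c in cols} for r in rows}
    stat = sum((counts[r][c] - expected[r][c]) ** 2 / expected[r][c]
               for r in rows for c in cols)
    df = (len(rows) - 1) * (len(cols) - 1)
    return expected, stat, df

# Hypothetical yearly values, for illustration only.
spending = [10, 12, 11, 15, 18, 17, 20, 22, 21, 25]
rapes = [50, 48, 52, 51, 47, 49, 46, 44, 45, 43]

# Rows: change in spending. Columns: change in number of rapes.
rows, cols, observed = contingency_table(change_labels(spending),
                                         change_labels(rapes))
expected, stat, df = chi_square(rows, cols, observed)
for r in rows:
    print(r, "observed:", observed[r], "expected:", expected[r])
print("chi-square =", stat, "df =", df)
```

Zeros among the observed counts are allowed; what matters is the size of the expected counts. A common rule of thumb is that the chi-square approximation becomes unreliable when expected counts fall below about 5, as they do in this tiny example, so with sparse cells it may be better to merge categories.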


#### parsec2011

##### Guest
Hi Kenshi,

My advice is to look for research papers or statistics books that examine a topic similar to the one you are studying, and see if any of their contingency tables give you some ideas. You can run a keyword search on Google (and Google Images), Google Scholar and Google Books, and you will surely find something.

Good luck

Ramon

#### Dason

Regression would probably be more interesting than just correlation. With a single variable they're looking at almost the same thing but you get some interesting information with regression. You would also probably want to account for population size as well. I would say an analysis that doesn't account for population size in some way would be very flawed.
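As a rough sketch of what accounting for population size could look like, the Python below (standard library only) converts raw counts into rates per 100,000 people and spending into spending per person, then fits a simple least-squares line. All numbers, variable names and the choice of a per-100,000 rate are assumptions made for illustration; they are not data from the thread.

```python
def per_100k(counts, population):
    """Convert yearly counts into rates per 100,000 people."""
    return [100_000 * c / p for c, p in zip(counts, population)]

def ols_fit(x, y):
    """Least-squares line y = intercept + slope * x, plus r-squared.

    Assumes x and y each contain at least two distinct values.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical yearly values, for illustration only.
population = [1_000_000, 1_100_000, 1_200_000, 1_300_000]
rapes = [100, 121, 120, 143]                              # raw counts
spending = [2_000_000, 2_750_000, 3_600_000, 4_550_000]   # dollars

rate = per_100k(rapes, population)                        # per 100,000 people
spend_per_person = [s / p for s, p in zip(spending, population)]

slope, intercept, r_squared = ols_fit(spend_per_person, rate)
print("slope:", slope, "intercept:", intercept, "r-squared:", r_squared)
```

In this made-up example both raw series rise simply because the population grows, which would inflate a correlation computed on the raw numbers; the per-person figures remove that shared trend. The slope is read as the expected change in the rate per 100,000 for each extra dollar of spending per person.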


#### parsec2011

##### Guest
I thought of sharing this link with you.

The book is really good for social science research, it has helped me a lot in the past three years.

It also includes some tables that might help you to get some ideas.

However, I agree with Dason: regression, and particularly ordinary least squares, would be more suitable, especially because you have time series. Correlation would work for a given point in time, but once you introduce time series it becomes futile.

http://books.google.nl/books?id=Riejj57HoRAC&pg=PA342&lpg=PA342&dq="What+other+factors+might+be+related+to+the+status+of+women?"&source=bl&ots=p9E92QUKzb&sig=jn2G546kzBqxfcjEzywmFuGrhVY&hl=nl&sa=X&ei=S_tIT52kM4qbOvXjqesN&sqi=2&ved=0CCAQ6AEwAA#v=onepage&q="What other factors might be related to the status of women%3F"&f=false

#### kenshi64

##### New Member
Okay, perhaps I will account for time in a graph, thanks for your advice! The problem is that regression is apparently a wasteful process if your r-square is a small value like mine, thus I had to shelve that idea. Could you help me with OLS? I take really basic maths at school, so I can't work out how to do it from the internet because I don't understand squat!

I'm going through the book right now, but can you help me with OLS? I'd really appreciate that!
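
For reference, ordinary least squares with a single predictor comes down to two formulas. Taking $x_i$ as police spending in year $i$ and $y_i$ as the number of rapes in that year (a labeling assumed here for illustration), the fitted line $\hat{y} = b_0 + b_1 x$ has

$$
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
$$

where $\bar{x}$ and $\bar{y}$ are the means of the two series. The slope is tied to Pearson's $r$ by $b_1 = r \, s_y / s_x$, with $s_x$ and $s_y$ the standard deviations of the two series, and for this one-predictor case the r-squared of the regression is simply $r^2$.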


#### parsec2011

##### Guest
Hi,

Please bear in mind that linear regression and OLS are not simple math, although they are common topics in secondary/high school statistics courses. If linear regression isn't among the topics covered in your course, it probably isn't how you are expected to solve the problem. You may want to check with your tutor/teacher.

There are online videos that show you how to carry out linear regression and OLS.
I recommend you look at khanacademy.org or iTunes U (if you are using iTunes), or go to YouTube and search for "ordinary least squares". There are many statistics videos that are really good for understanding this subject. My favorites are on iTunes (actually iTunes University, which is an applet included in iTunes).

When I started my postgraduate studies, I used Excel, SPSS (provided and sponsored by my university), and the R statistics software (free) to solve all sorts of statistical problems. I found the R statistics tutorials on YouTube very useful.

Good luck,

Ramon

#### CB

##### Super Moderator
The problem is that regression is apparently a wasteful process if your r-square is a small value like mine, thus I had to shelve that idea.
Are you sure about this? One of the big benefits of regression is that it can provide information that correlation doesn't. Which of these pieces of information seems more useful to you?

• Knowing how many fewer/more rapes one would expect as a result of (say) $10000 greater annual spending on police
• Knowing the percentage of variance in year-to-year total of rapes that can be explained by variation in annual spending on police

I'd pick option 1 [edited]! Regression can tell you both of these, but correlation and r-squared only tell you the latter. You might find that the r-squared is low, even though a moderate increase in spending does actually have a sizeable impact on the actual number of expected rapes (or maybe it doesn't - can't tell without looking).

Thom Baguley uses a similar scenario to argue against overuse of correlation coefficients in psychology: http://nottinghamtrent.academia.edu/ThomBaguley/Papers/138898/When_correlations_go_bad_

However, I agree with Dason: regression, and particularly ordinary least squares, would be more suitable, especially because you have time series.
Hmm. The fact that the OP is working with time series data is a reason to be cautious about using OLS estimation, if anything. Using time series data may mean that you end up violating the independence-of-errors assumption of regression using OLS estimation (same applies to correlation though). Time series analysis is a complex topic so it's hard to work out where to steer the OP without knowing more about the project (is it a class assignment or a proper independent research project?). There might be an appropriate alternative way to estimate the regression model.

The other issue here is that number of rapes in a given year is a count variable, not a continuous variable. The fact that you (OP) are aggregating by year is helpful though - with a relatively large number of rapes occurring in each year, the distribution of rapes by year might start to approximate a continuous distribution such as the normal reasonably well.

#### kenshi64

##### New Member
Wow, thanks for the help, I'll look into R statistics right now.
Say, does Microsoft Excel's PEARSON function use the normal formula? Because of my values being huge, working is a ***** and I just can't get the value they got. That's probably the error. I'll steer clear of OLS, it evades me.

#### kenshi64

##### New Member
Yet again, wow! I'd thank you twice if I could! Okay, first off, this is like a project but not an ordinary one: it is assessed for the board exams / final marks (12th grade), and I haven't performed enough mathematical calculations, only PCC! :'( Performing mathematical calculations carries 5 points, and at this rate I doubt I'll get them, because I'm stumped at chi-square and have got no help with it! :O
I quote the examiner's comments: 'It would have been wiser to have found the Correlation Coefficient before finding the equation of the regression line, as a weak correlation would make that data irrelevant.' So that's the spot I've been put in. Though I greatly appreciate the difference in opinion, I can't take a risk right now! : /
Could you help me with the chi-square contingency table? Hope to hear from you. Thanks!