# Thread: Complicated Research Question

1. ## Complicated Research Question

I'm looking to do some research in the social sciences, and I have to do a statistical analysis of a complicated set of self-report measures. I'd appreciate some feedback and some direction on how to do this. I will have some staff to work with later in the process, but at this point I need at least a rough, feasible concept.

In this questionnaire there are 8 overlapping categories, each evaluated by three related questions: the first two are Likert items and the third is categorical. There are also a number of measured demographic variables, plus 2 dependent (theorized outcome) variables, which come from validated surveys and yield interval-level data.

The first question asks, on a seven-point scale: how important is activity A to you?
The second question asks, on a seven-point scale: have you been successful participating in activity A?

The third question asks for the main reason for unfulfillment.

I think my question is: where do I start? I think regression is the way to go, but the connection between the three questions needs to be accounted for. That is, I would think the difference score between questions one and two is very important. It could also be that the difference score between one and two only matters if certain categories in question three are answered positively, and so on. Specifically, the difference could matter more if the activity is more important; that is, I might want to multiply the difference by the weight of the first question.
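One concrete way to see that weighting idea: for each category, take the gap between the importance rating (question one) and the success rating (question two), then scale that gap by the importance score. A minimal sketch of the arithmetic (the function name is invented for illustration):

```python
# Hypothetical sketch: raw and importance-weighted difference scores for
# one category. Ratings are assumed to be on the 1-7 scales described above.

def weighted_difference(importance, success):
    """Importance-weighted gap between desired and achieved participation."""
    diff = importance - success   # raw difference score (question 1 - question 2)
    return importance * diff      # weight the gap by how much the activity matters

# Example: an activity rated 7 for importance but only 3 for success
print(weighted_difference(7, 3))  # 7 * (7 - 3) = 28

# A fully satisfied category contributes nothing, regardless of importance
print(weighted_difference(5, 5))  # 5 * (5 - 5) = 0
```

Whether a raw or weighted difference score (or a proper interaction term in a regression) is appropriate is exactly the kind of question worth settling before looking at the data.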

Any advice or references would be greatly appreciated. Hopefully I didn't brainstorm too many problems for the forum, but I have seen some great answers here in the past, so I'm confident it will be alright.

2. ## Re: Complicated Research Question

Likert is categorical

You need to decide first, before you think about a method, what you want to know. What is the specific question you are interested in answering? For example, why do you want to know how the questions are connected, or why there are differences in scores? What does that tell you? That will drive the method.

Connecting multiple questions suggests structural equation models to me, although that is far from simple.

3. ## The Following User Says Thank You to noetsi For This Useful Post:

CowboyBear (12-28-2016)

4. ## Re: Complicated Research Question

noetsi is absolutely right. Before even looking at the data, you need to know what it is that you're trying to find out. If you end up just looking at lots of different relationships and then reporting the "interesting" (i.e., statistically significant) ones, that's p-hacking.

What I'd suggest is one of two strategies:
1. Confirmatory approach: Put the data away. Talk to your supervisor/s and see if you can figure out some specific hypotheses you'd like to test with the data. Pre-register a plan for how you will test these hypotheses. Only then open the data up and try to answer them.
2. Exploratory approach: (Do this especially if you don't have hypotheses to test.) Read up on exploratory data analysis strategies. Conduct multiple analyses to look for interesting patterns and relationships and report all these analyses. Don't use statistical significance tests as a filter to decide which analyses to report. Make sure you clearly identify your final report as exploratory.

5. ## The Following 2 Users Say Thank You to CowboyBear For This Useful Post:

noetsi (12-29-2016), rogojel (12-29-2016)

6. ## Re: Complicated Research Question

I suspect that in practice p-hacking is the norm in both academia and business analysis. Which is a bit scary when you think about it.

You can look at the existing literature on this topic (if there is any), which is what I try to do. It helps to have access to a good academic (e.g., college) library. Or, if it's a professional issue, others in your workplace may have information or theories.

7. ## Re: Complicated Research Question

I think that CB's point about NOT using a hypothesis test to decide what to report after the exploratory data analysis is more subtle than p-hacking. If you spot an unusual pattern in the data and then run a significance test on it, it will probably have a low p-value without any manipulation, but the finding will still be spurious. This is why announcing the hypothesis first and looking at the data second (or finding a pattern in the data and collecting new data to confirm the hypothesis) is so important.

regards

8. ## Re: Complicated Research Question

That assumes you have a theory to start with. Commonly there is little theory behind what I do. This is an example of what causes confusion when I hear about things like p-hacking. The following comes from a group of elite professors (the editor is at the Wharton School; it does not get much better than that). It's advice for time series (econometrics):

"But greater understanding and better forecasts will probably come from finding new data sets, not from using new techniques on tired old data sets. Consider the widest range of causal variables. Do not expect to use them all in the final forecasting model."

I think that is the norm, not the exception (although it's unknowable, of course, since it's not often reported). This may be especially true in time series forecasting, which is often atheoretical.

9. ## Re: Complicated Research Question

Originally Posted by noetsi
That assumes you have a theory to start with. Commonly there is little theory behind what I do.
That's fine, the point is just that if you're doing exploratory work then significance tests probably aren't a suitable tool to be using. (Specifically if you use significance tests to decide what to report or focus on, you'll end up reporting biased estimates). When people complain about p-hacking, they're not saying don't do exploratory work; they're saying don't hunt around for significant effects and then dress it up as a confirmatory (hypothesis testing) exercise.

This may be especially true in time series forecasting, which is often atheoretical.
Yep, but fortunately there are lots of tools other than significance tests to guide atheoretical model selection, many of which are used heavily in time series forecasting. E.g., AIC, BIC, Bayesian variable selection, cross-validation, lasso, etc etc etc. Time series forecasting is a neat simple case for exploratory work actually, since at least when you're doing forecasting you know what it is that you want to achieve (accurate forecasts of a specific variable). This is different to a case where a researcher is staring at a giant dataset of variables and going "hmm, what would I like to find out?"
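As a toy illustration of the point about information criteria: here is a minimal sketch of choosing among candidate models by AIC rather than by significance tests. The data and candidate models are invented for illustration; real forecasting work would use proper time-series models.

```python
import numpy as np

# Simulated data: the true relationship is linear with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.1, size=x.size)

def aic(y, yhat, k):
    """AIC for a Gaussian model with k estimated parameters."""
    n = y.size
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

# Compare nested polynomial fits: more flexible models always reduce the
# residual sum of squares, but AIC penalizes the extra parameters.
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)
    yhat = np.polyval(coeffs, x)
    print(degree, round(aic(y, yhat, degree + 1), 1))
```

The same pattern (fit candidates, score each by a penalized criterion, pick the best) applies to ARIMA order selection and variable selection generally.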

10. ## Re: Complicated Research Question

I used a quick example to show colleagues why using significance testing in exploratory analysis is a bad idea. I threw two dice for a while and marked the points on a 6x6 grid. I did this until I got three markings in one square, then asked the team to calculate the probability of getting 3 marks in that particular square (1/(6*6*6), definitely highly significant). Then I asked them whether they would bet that the next series of throws would result in the same pattern.
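The demonstration above is easy to simulate. A rough Monte Carlo sketch (assumed setup: keep throwing until *some* cell of the grid collects three marks):

```python
import random

# Monte Carlo sketch of the dice demonstration: throw two dice and mark
# cells of a 6x6 grid until some cell has been hit three times. Some cell
# always gets there fairly quickly, even though any single pre-specified
# cell is unlikely to be the one that does -- which is why the post-hoc
# "significance" of the winning cell is misleading.
def throws_until_triple(rng):
    counts = {}
    n = 0
    while True:
        cell = (rng.randint(1, 6), rng.randint(1, 6))
        counts[cell] = counts.get(cell, 0) + 1
        n += 1
        if counts[cell] == 3:
            return n, cell

rng = random.Random(42)
trials = [throws_until_triple(rng)[0] for _ in range(1000)]
print(sum(trials) / len(trials))  # average number of throws needed
```

The small average here is the multiple-comparisons problem in miniature: 36 cells each get a chance to look "significant".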

regards

11. ## Re: Complicated Research Question

Originally Posted by CowboyBear
That's fine, the point is just that if you're doing exploratory work then significance tests probably aren't a suitable tool to be using. (Specifically if you use significance tests to decide what to report or focus on, you'll end up reporting biased estimates). When people complain about p-hacking, they're not saying don't do exploratory work; they're saying don't hunt around for significant effects and then dress it up as a confirmatory (hypothesis testing) exercise.

Yep, but fortunately there are lots of tools other than significance tests to guide atheoretical model selection, many of which are used heavily in time series forecasting. E.g., AIC, BIC, Bayesian variable selection, cross-validation, lasso, etc etc etc. Time series forecasting is a neat simple case for exploratory work actually, since at least when you're doing forecasting you know what it is that you want to achieve (accurate forecasts of a specific variable). This is different to a case where a researcher is staring at a giant dataset of variables and going "hmm, what would I like to find out?"
Commonly I test against a hold-out data set. That is, I leave out the last year and predict it with various models, using MAPE to determine which model best predicts the hold-out data. There are a variety of problems with this (including structural breaks and losing data you could otherwise fit with), but it comes highly recommended. I use AIC in ARIMA, but have not in this type of model.
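The hold-out/MAPE comparison described above can be sketched in a few lines. The series and the two candidate forecasts here are invented purely for illustration:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

# Toy series: fit on all but the last 3 points, score on the held-out tail.
series = np.array([100, 104, 109, 115, 120, 127, 133, 140, 148, 155], float)
train, holdout = series[:-3], series[-3:]

# Candidate 1: naive forecast (repeat the last observed value)
naive = np.full(3, train[-1])

# Candidate 2: drift forecast (extend the average step size of the training data)
step = (train[-1] - train[0]) / (len(train) - 1)
drift = train[-1] + step * np.arange(1, 4)

print(round(mape(holdout, naive), 2), round(mape(holdout, drift), 2))
```

Whichever candidate scores the lower MAPE on the hold-out tail is the one you would carry forward, with the caveats noted above about structural breaks.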

Personally, I don't consider tests of statistical significance that important in what I do, because I commonly have whole populations with thousands of data points. Well, I report them, but it's the effect size that is really the most important. When you have thousands of cases, lots of things become significant, and it's arguable that statistical significance is not the way to go anyhow with whole populations. You know you have the effect size, because you have the population.

Then the real question becomes whether you should think of this as a sample from other, unknown populations. For the most part, I have not.

