Please help? Complete novice

#1
Please can someone help me with the analysis of data for my Master's dissertation?

I have conducted a quantitative self-completion questionnaire in my organisation, distributed to 12 branch managers and 18 sales managers, using an identical questionnaire to capture internal and external perceptions of competitive strategy. I now have 2 groups of data from the same questions with varying mean scores. How do I analyse the significance of the difference between the groups? Is this data paired or independent? I'm a complete novice at stats and find the info on the net very confusing.

Any help would be greatly appreciated, guys.

Thanks Lee
 

hlsmith

Omega Contributor
#2
So you have mean scores for two groups (branch and sales managers). You will likely perform either a t-test (for unpaired data) or a Wilcoxon rank-sum test (a nonparametric alternative). Read up about both and see which may be a better fit.
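If it helps, here is a minimal sketch in Python (scipy; all scores are made up, not Lee's real data) of how each test would be run on the two groups:

```python
from scipy import stats

# Hypothetical mean scores for the two groups -- invented for illustration.
branch = [3.8, 4.1, 3.5, 4.0, 3.9, 3.6, 4.2, 3.7, 3.4, 4.0, 3.8, 3.9]  # n = 12
sales = [3.1, 3.6, 2.9, 3.3, 3.5, 3.0, 3.4, 3.2, 3.7, 3.1, 2.8, 3.3,
         3.5, 3.0, 3.6, 3.2, 3.4, 2.9]  # n = 18

# Independent-samples t-test (Welch's variant, which does not assume
# the two groups have equal variances).
t_stat, p_t = stats.ttest_ind(branch, sales, equal_var=False)

# Nonparametric alternative: Wilcoxon rank-sum, a.k.a. Mann-Whitney U.
u_stat, p_u = stats.mannwhitneyu(branch, sales, alternative="two-sided")

print(f"t-test p = {p_t:.3f}, Mann-Whitney U p = {p_u:.3f}")
```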
 
#3
Thank you for your help and speedy response. My main problem was understanding whether my data is independent or not, considering it was gathered from the same organisation. I will look at both tests you suggested. Thanks again.
 
#4
I have decided to use a two-tailed t-test and the result is a p-value of 0.25, which I assume means I can now state that the null hypothesis is accepted and the result is not statistically significant. Does that sound right? Also, can anyone advise: if I break my questionnaire down by the 6 different sections in my findings, should I calculate the p-value for each section or leave it as an overall total value?
 
#5
So you have mean scores for two groups (branch and sales managers). You will likely perform either a t-test (for unpaired data) or a Wilcoxon rank-sum test (a nonparametric alternative). Read up about both and see which may be a better fit.
Shouldn't he use the "Mann-Whitney U" test instead of Wilcoxon? I think the Wilcoxon signed-rank test is for paired data.
 
#6
I have decided to use a two-tailed t-test and the result is a p-value of 0.25, which I assume means I can now state that the null hypothesis is accepted and the result is not statistically significant. Does that sound right? Also, can anyone advise: if I break my questionnaire down by the 6 different sections in my findings, should I calculate the p-value for each section or leave it as an overall total value?
Please note that before "accepting" a null hypothesis due to P > 0.05, you should first check your test's power. So you have to do power calculations in order to show that your non-significant P value is valid and reliable. Otherwise, your P value is questionable.
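A rough sketch of such a check in Python (statsmodels; the effect size of d = 0.5 is an assumed "medium" effect, not something taken from the study):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Achieved power for an independent t-test with n1 = 12, n2 = 18,
# assuming a medium true effect (Cohen's d = 0.5).
power = analysis.power(effect_size=0.5, nobs1=12, ratio=18 / 12, alpha=0.05)
print(f"power at d = 0.5: {power:.2f}")

# Sample size per group needed to reach the conventional 0.8 target.
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8, ratio=1.0)
print(f"n per group for 80% power: {n_needed:.0f}")
```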
 
#7
Now you've lost me, sorry! Maybe I'm trying to overcomplicate my study. I have a simple questionnaire, only 6 sections containing 5 questions each; the number of respondents is only 30 employees, but in 2 different groups. All I want to do is analyse the difference in opinions by mean score. Do I need to break it down by section and use a t-test and power analysis to demonstrate a difference in my findings between the 2 groups? Sorry if I'm being simple here, but this is all very new to me.
 

noetsi

Fortran must die
#8
A couple of points. First, if your p is significant you reject whatever null you created. What victorstc was pointing out is that if p is not significant (most commonly this means p is greater than .05) you don't automatically assume that not rejecting the null is correct. You can only do this if the power of your test is high enough, typically .8. You find out whether power is adequate by going to something like G*Power and running a power calculation.

The answer to your question depends on what you are trying to do. If you are trying to show the two groups are significantly different then you need to do a statistical test. Which test you do depends on your data. The first question to ask is whether the two groups are made up of the same people (tested at different points in time), in which case you do a paired test, or whether the two groups are made up of different people, in which case you do an independent t-test (or a non-parametric one; see below).

The second question is whether you can calculate a mean from your questions. If you can, you do either a paired t-test if your two samples are made up of the exact same people, or an independent t-test if they are different people. If you cannot calculate a mean from your data, or if your data is badly non-normal, you do a non-parametric test. The real problem here is that most survey questions are ordinal (say, "How satisfied are you?" with the response being a five-point scale from most satisfied to least satisfied). Whether it is legitimate to create a mean here is hotly debated, but it is commonly done; see the sketch below.
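If you do create means, here is a rough sketch (with made-up Likert responses) of building a per-respondent section mean and eyeballing normality before picking a test:

```python
import numpy as np
from scipy import stats

# One row per respondent, one column per question in a section (1-5 scale).
# These responses are invented for illustration.
responses = np.array([
    [4, 5, 3, 4, 4],
    [2, 3, 3, 2, 4],
    [5, 4, 4, 5, 3],
    [3, 3, 2, 4, 3],
    [4, 4, 5, 4, 5],
])

section_mean = responses.mean(axis=1)  # one score per respondent

# Shapiro-Wilk: a small p here suggests the scores are badly non-normal,
# which would point toward the nonparametric test instead.
w_stat, p_norm = stats.shapiro(section_mean)
print(f"section means: {section_mean}, Shapiro-Wilk p = {p_norm:.3f}")
```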

I will let wiser people tell you what to use if you can't calculate a mean.
 
#9
Thanks noetsi. I have established my groups are independent, so unpaired, and using a two-tailed t-test I now have a p-value of 0.25. I will look at G*Power, do a power calculation, and see what results I get.
How would I present my findings if I broke them down by questionnaire section? Would I calculate the mean score for that group of answers? The t-test will tell me overall if there is statistical significance, but how do I explain the variance by section or by each answer?
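(For anyone following along, here is a rough sketch with fabricated scores of what "per section" would look like in practice: one test per questionnaire section.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Fabricated per-respondent section scores: 6 sections x respondents.
branch_scores = rng.uniform(2.5, 4.5, size=(6, 12))  # 12 branch managers
sales_scores = rng.uniform(2.5, 4.5, size=(6, 18))   # 18 sales managers

for section in range(6):
    t_stat, p = stats.ttest_ind(branch_scores[section], sales_scores[section])
    print(f"Section {section + 1}: branch mean = {branch_scores[section].mean():.2f}, "
          f"sales mean = {sales_scores[section].mean():.2f}, p = {p:.3f}")
```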
 

noetsi

Fortran must die
#10
With 30 cases your power won't be high in all likelihood. And that is important, because while .25 would cause you not to reject the null, the result may be non-significant simply because you have too few cases and thus low power.
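To see how low, a quick sketch (statsmodels; the .05 alpha and .8 power target are the usual conventions, not from your study) solving for the smallest effect your group sizes could reliably detect:

```python
from statsmodels.stats.power import TTestIndPower

# Smallest Cohen's d detectable with 80% power given n1 = 12, n2 = 18.
d = TTestIndPower().solve_power(effect_size=None, nobs1=12, alpha=0.05,
                                power=0.8, ratio=18 / 12)
print(f"smallest detectable effect at 80% power: d = {d:.2f}")
```

With these group sizes the answer comes out well above Cohen's "large" benchmark of 0.8, which is the point here.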

The answer to your question depends on who your audience is, and on your own views of the data. If they are academics, then you want to ground it more in the pertinent theory, both statistical and whatever topic your research is on, and you want to address more of the limits we talked about above. If it's a manager, then you want to make it far simpler statistically and dwell less on the limits, as long as you are satisfied with the results. Let me know what audience you are presenting to and I will make more suggestions.
 
#11
I will work the power analysis out then and see. My audience is academics, as the study is for my master's dissertation. My study is testing our competitive strategy within the organisation I work for. I believe that there is a significant difference in opinions on strategy, and how effective it is, between managers based in the organisation and sales managers based in the field. My study had to be confined to one region, on instructions from the MD.
Thank you for your help, greatly appreciated.
 

noetsi

Fortran must die
#12
I would worry about only 30 cases. Your committee might let you slide since it's for a master's, but I would bring this up with your supervising professor. It will be less painful in the defense if he signs off on it beforehand.
 

noetsi

Fortran must die
#14
Then you are probably OK. If you don't think he will be annoyed (some professors can be pretty arrogant :p), you might carefully raise the question of whether other committee members will be concerned over this. I had the immensely painful experience of my PhD professor having no problems with my data, then being roasted by other professors in my defense who had different methodological assumptions. It was my fault for not checking beforehand.

But they may well be less harsh in a master's work.
 
#15
Quietly I say to Leehud74 that I think noetsi is making this too complicated. If you have a department presentation, I would suggest presenting the mean for each question for the two departments (12 and 18 persons). With the means, if you do a graphical presentation, I would include "error bars" for the standard error of the data.

I would not do an afterwards power calculation. It would be misleading and wrong. (I emphasise: doing it afterwards.) So skip that!

If the 30 persons are all there is, then you have investigated the whole population and there is no more. There is no sampling problem! Frankly, though, there will probably remain some measurement error in the answers. I believe that half of the observed standard deviation is measurement error, so I think that presenting the standard error is enough.
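Something like this matplotlib sketch (every number invented) is the kind of figure I mean:

```python
import numpy as np
import matplotlib.pyplot as plt

questions = np.arange(1, 6)  # five questions from one section
# Invented means and standard deviations for illustration only.
branch_mean = np.array([3.8, 3.5, 4.0, 3.2, 3.9])
branch_se = np.array([0.9, 1.1, 0.8, 1.0, 0.9]) / np.sqrt(12)  # sd / sqrt(n)
sales_mean = np.array([3.1, 3.4, 3.3, 2.9, 3.5])
sales_se = np.array([1.0, 0.9, 1.1, 1.0, 0.8]) / np.sqrt(18)

plt.errorbar(questions - 0.05, branch_mean, yerr=branch_se,
             fmt="o", capsize=4, label="branch managers (n = 12)")
plt.errorbar(questions + 0.05, sales_mean, yerr=sales_se,
             fmt="s", capsize=4, label="sales managers (n = 18)")
plt.xticks(questions)
plt.xlabel("question")
plt.ylabel("mean score (1-5)")
plt.legend()
plt.show()
```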

Instead, in a presentation and in the master's thesis, I would suggest presenting the subject-matter problems, not the statistics. Statistics is interesting but not the whole world.
 
#16
I would not do an afterwards power calculation. It would be misleading and wrong. (I emphasise: doing it afterwards.) So skip that!
I agree that power should be pre-determined. But I think that if it is determined after sampling, that might be better than not calculating it at all. Besides, if he sees that the power is low, he can use that power calculation to estimate a larger sample size. Don't you agree? :)
 
#17
No, I don’t agree.

To calculate power afterwards is meaningless. Since the test in this case was not significant, the afterwards-calculated power will be low. There is a one-to-one correspondence between afterwards power and significance tests.

Power needs to be calculated beforehand on new hypothetical data.

So skip the power calculation.
(And don't make it too complicated for Leehud74. There is no gain in that.)
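That one-to-one correspondence is easy to see numerically. A small sketch (scipy; borrowing the n = 12 and n = 18 from this thread) mapping p-values to "observed" post hoc power:

```python
from scipy import stats

n1, n2 = 12, 18
df = n1 + n2 - 2
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)

for p in [0.01, 0.05, 0.25, 0.50, 0.80]:
    # For fixed sample sizes the observed t statistic is fully
    # determined by the two-sided p-value...
    t_obs = stats.t.ppf(1 - p / 2, df)
    # ...and "observed" power plugs t_obs in as the noncentrality parameter.
    power = (1 - stats.nct.cdf(t_crit, df, t_obs)
             + stats.nct.cdf(-t_crit, df, t_obs))
    print(f"p = {p:.2f}  ->  observed power = {power:.2f}")
```

A nonsignificant p always maps to an observed power below about one half, which is exactly why computing it tells you nothing new.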
 

Jake

Cookie Scientist
#18
I read a statistician who had the following to say about power (paraphrased). First, has the study already been conducted? If yes, and you found a nonsignificant result, then power = 0. If yes, and you found a significant result, then power = 1.
 
#19
I read a statistician who had the following to say about power (paraphrased). First, has the study already been conducted? If yes, and you found a nonsignificant result, then power = 0. If yes, and you found a significant result, then power = 1.
I don’t know how to interpret the above statement: as agreement? irony? joy of formulation? mockery? or disagreement?


I can’t dig up the paper I read to back up the formulation I made. But from this link I copy the following formulation:

“Here are two very wrong things that people try to do with my software:
Retrospective power (a.k.a. observed power, post hoc power). You've got the data, did the analysis, and did not achieve "significance." So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn't powerful enough -- that's why the result isn't significant. Power calculations are useful for design, not analysis.”

To say it again: it was not significant; with this data, of course it had low power.

Here post hoc power was discussed. But suppose someone had done a power calculation beforehand and found that 26 interview persons would be enough, out of 30 in total. Does anybody believe they would have interviewed just 26 persons? Of course they would have gone for all 30. In such a case the question is more whether to do the investigation at all.

But is power very relevant here? No! Not in such a study. The “margin of error”, however, is relevant, and I would prefer to report the standard error (that is, the standard deviation divided by the square root of n).

Maybe the most important thing with such a study is that the staff get the chance to express themselves anonymously.

@Leehud74: I do hope that this thread – your thread – will not be hijacked for other discussions. I just did not want to make things unnecessarily complicated.



Yes, it was an interesting formulation, wasn’t it?
 
#20
I can’t dig up the paper I read to back up the formulation I made. But from this link I copy the following formulation:

“Here are two very wrong things that people try to do with my software:
Retrospective power (a.k.a. observed power, post hoc power). You've got the data, did the analysis, and did not achieve "significance." So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn't powerful enough -- that's why the result isn't significant. Power calculations are useful for design, not analysis.”

To say it again: it was not significant; with this data, of course it had low power.
Thanks dear Greta :)

However, the question might be: "low power, but to what extent?"

I think you agree that a power > 0.8 is accepted as appropriate? If so, I think you agree that there can be studies with post hoc powers higher than 0.8, or even 0.9, whose test results are still nonsignificant. I have personally done some of them, so I'm sure such a possibility exists. :)

OK, in such cases the power is not adequate to detect the difference, as you kindly quoted, but it is accepted as adequate and can't be considered "low" (because it is not less than 0.8).

So my point is that a test which gives a P value of 0.08 with a post hoc power of 0.15 is very unreliable, because that 0.08 P value looks like a false negative rather than anything meaningful. But a test which gives the same P value of 0.08, now at 97% post hoc power, is extremely likely to be valid and generalizable, because that P = 0.08 is very unlikely to be a false negative; it is very likely indicative of a lack of significant differences in the true population (if sampling was done correctly, of course). My example uses extremes, but it generalizes to milder cases too.

I think the case of Leehud74 is the former, in which too low a power accounted for the nonsignificance. And here by "too low power" I don't mean too low to detect a significant difference, which every nonsignificant test suffers from (as you kindly quoted). I agree that if its power were high enough to detect something, it would eventually give some significant results. Rather, by "too low power" I mean powers considerably below 0.8 (even if such powers are still too low to give a positive result, whether a true or a false positive).

So his P value of 0.25 looks like a false negative in the first place (the P is greater than 0.05 and the power is smaller than 0.8).