I'm new to this forum and have a very basic question.

How do we establish causality after establishing association?

Do we have any methods which are generally accepted across the field and how reliable are they?

If you could point me in the right direction towards either some kind of MOOC or a book it would be of great help to me.

Thanks

I'm quite concerned about the applicability of some non-parametric tests in the following example because it is adapted from a defended doctoral dissertation, yet there are (as I understand) technical flaws. My questions follow the example.

It is a within-subject design with binary count data. A sample of 30 children are observed in a task under both control and experimental conditions. Under each condition, whether a subject makes a correct response in each of the 10 trials is recorded, and the number of correct responses (hits) for each child is calculated.

The research questions are: Q1a. whether the children make more hits under the experimental condition than expected by chance, given that most of them make at least 6 hits, with a total of, say, 240/300 trials "hit"; Q1b. similarly, whether they make fewer hits under the control condition than expected by chance, with 120/300 trials "hit"; Q2. for each child, whether (s)he is more likely to "hit" under the experimental condition than under control, for example, Subject 1 making 8 hits/2 misses under the experimental condition and 5 hits/5 misses under control.

It seems that for Q1a and 1b, a parametric (and better) way is to calculate each one's accuracy and compare the mean accuracy to 50% with

And for Q2, though the instinctive choice is Fisher's exact test, a similar concern arises because the frequencies result from multiple observations on the same subject.
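For concreteness, here is a minimal sketch of what a one-sided Fisher's exact test computes for the Subject 1 counts in the example (8 hits/2 misses vs. 5 hits/5 misses), using only the Python standard library. The function name `fisher_one_sided` is made up for illustration, and the calculation treats the 20 trials as independent draws, which is exactly the assumption I am unsure about.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) with all margins of the 2x2 table fixed.

    Table layout (rows = condition, columns = outcome):
        a  b   -> experimental: hits, misses
        c  d   -> control:      hits, misses
    """
    row1 = a + b          # trials in the experimental condition
    col1 = a + c          # total hits across both conditions
    n = a + b + c + d     # all trials
    k_max = min(row1, col1)
    total = comb(n, row1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, k_max + 1)) / total

# Subject 1 from the example: 8/2 under experimental, 5/5 under control.
p_subject1 = fisher_one_sided(8, 2, 5, 5)
print(p_subject1)
```

For these counts the one-sided p-value works out to 32318/184756, roughly 0.175, so a single child's 10 + 10 trials carry little evidence on their own even before the independence issue is considered.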

The author not only did the tests underlined above, but also calculated the mean number of hits across all subjects under each condition (240/30=8 and 120/30=4) and did two additional binomial tests to see if "the average child makes more/fewer hits than at chance level under each condition". This does not seem to fit with what I have learned in my stats classes.
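To make the difference between the two kinds of binomial test concrete, here is a rough sketch with exact one-sided tail probabilities at a chance level of 0.5, again in plain Python. The helper names are hypothetical, and the "average child" lines reflect my reading of what the author did (8 and 4 hits out of 10 trials); both versions assume independent trials.

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def binom_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

# Pooled over all 30 children x 10 trials (ignores clustering within child).
pooled_exp = binom_upper_tail(240, 300)   # more hits than chance?
pooled_ctrl = binom_lower_tail(120, 300)  # fewer hits than chance?

# "Average child": mean hits (8 and 4) entered as if they were one child's 10 trials.
avg_exp = binom_upper_tail(8, 10)
avg_ctrl = binom_lower_tail(4, 10)

print(pooled_exp, pooled_ctrl, avg_exp, avg_ctrl)
```

The same proportions (80% and 40%) give very different p-values depending on whether n is taken to be 300 or 10, which is part of why I doubt that averaged counts can simply be plugged into the test.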

And my questions are as follows. The first two are about research questions 1a and 1b.

1. Using accuracy (%) as a dependent variable, usually how many trials do we need for each subject so we can treat accuracy as a continuous variable?

2. If there are really too few trials and non-parametric tests have to be used, which is the best way to answer RQs 1a and 1b in my case, with multiple observations for each subject? I did a little research and asked around, and it seems that Somers' D, Huber variance and generalized linear mixed modeling (GLMM) are appropriate, with the latter two said to be better. I did see a study analysing a dataset of similar structure with GLMM, so is there any occasion when Somers' D is preferred?

Then, 3. at the individual level, what's the best way to answer RQ2? If Fisher's exact test is not acceptable here, the only way seems to be to calculate Somers' D for each subject.

Finally, 4. when using chi-square tests for frequency data, is it definitely unreasonable to fill the cells with averaged frequencies rather than raw observations, as the author did for "the average child"?

Thanks in advance for your help.

Meng

Rather than taking the actual value in each column, I have replaced it with a rank. So, given there are 100 footballers in the data set, the player with the highest value in the Accurate Passes/90 mins column is assigned the number 1, the player with the second highest value is assigned 2, the third highest 3, and so on all the way to 100.

For each position, I have taken a subset of the variables. For example, for defenders, I'm only interested in how many tackles and clearances they've made and not how many shots they've had.

My question is to do with how to weight these variables, as some are more important than others when evaluating a player. For example, for strikers, although I am interested in how many passes they make, the number of goals they've scored is much more important and should be weighted accordingly.

For each position, I have a player that I know is the best in that position and I would like to assign appropriate weightings (w) to the relevant variables (V) to help achieve this. Each player will be assigned a number (let's call it x) which is the sum of the weighting times the variable rank. My aim is for x to be the lowest number for the best player in each position. So:

x = w1V1 + w2V2 + w3V3 + w4V4 + ....

where

0 < wi < 1

i = 1,2,3,4,...

It must also be noted that the weighting of each variable can change depending on which position is being examined. For example, when looking at strikers, the goals weighting will be much higher than when looking at midfielders. This is because, although goals can be used to evaluate how good both midfielders and strikers are, it is much more important for a striker to score goals than a midfielder. Therefore the goals variable will carry more weight when examining strikers compared to midfielders.
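To illustrate the scoring scheme, here is a small sketch in Python. The player names, ranks and weights are all made up for illustration, and `weighted_rank_score` is just a hypothetical helper that computes x = w1V1 + w2V2 + ... for one player.

```python
def weighted_rank_score(ranks, weights):
    """Sum of weight * rank over the variables relevant to a position.

    Lower is better, because rank 1 is the best value in each column.
    """
    return sum(weights[var] * ranks[var] for var in weights)

# Hypothetical ranks (1 = best in the data set) for two made-up players.
players = {
    "Player A": {"goals": 1, "passes": 10},
    "Player B": {"goals": 5, "passes": 2},
}

# Hypothetical position-specific weights: goals matter more for strikers.
striker_weights = {"goals": 0.7, "passes": 0.3}
midfielder_weights = {"goals": 0.3, "passes": 0.7}

def best_player(players, weights):
    """Return the player with the lowest weighted rank score."""
    scores = {name: weighted_rank_score(r, weights) for name, r in players.items()}
    return min(scores, key=scores.get)

print(best_player(players, striker_weights))     # Player A
print(best_player(players, midfielder_weights))  # Player B
```

With these made-up numbers the same two players swap places depending on which set of weights is used, which is the behaviour I want. What I don't know is how to choose the weights themselves.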

What would be the best way of working out the weightings? Help much appreciated!