Search results

  1. staassis

    Multiple scores per subject

    Try GEE and random effects models. Stata has powerful implementations of both.
  2. staassis

    Best way to present result from large dataframe (26x1522) in an easily understandable manner

    You can run PCA on rows and not columns. This would allow you to better understand how the diversity values of nucleotides move together. As your study conjectures, different genetic groups may have different pictures. Then you can create a heat map for [K principal components] * [26 populations].
  3. staassis

    Why is Kendall's Tau always so high?

    I do not quite understand the data you have shown. The data you have attached have only one variable, not two. What is W? Kendall's tau must take values in [-1,1]. In fact, the absolute value of Kendall's tau tends to be lower than that of Spearman's rho or Pearson's correlation. It is hard to...
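The bound mentioned in this reply can be checked directly. Below is a minimal sketch of Kendall's tau-a (the variant without a tie correction), written only to illustrate why the statistic cannot leave [-1, 1]. The function name and the toy vectors are illustrative assumptions, not data from the thread.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # tied pairs (sign == 0) count toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy vectors (made up for illustration)
same_order = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])      # every pair concordant
reverse_order = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])   # every pair discordant
one_swap = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])        # 5 concordant, 1 discordant
print(same_order, reverse_order, one_swap)
```

Because the numerator is a difference of two counts that together cannot exceed the number of pairs, the ratio is always between -1 and 1.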
  4. staassis

    Is my approach correct for this binomial distribution question?

    The answer is [(9 choose 3) * (6 choose 2) + (9 choose 4) * (6 choose 1) + (9 choose 5) * 1] / (15 choose 5). This is not a case of binomial distribution because the databases are sampled without replacement.
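The arithmetic in this answer can be sketched as follows. The original question is not shown here, so the reading is inferred from the formula alone: 5 items drawn without replacement from 15, of which 9 belong to the category being counted, with at least 3 of the draws coming from that category. The function and argument names are illustrative.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k_min, successes, failures, draws):
    """Hypergeometric tail: P(at least k_min successes in `draws` draws without replacement)."""
    total = comb(successes + failures, draws)
    favorable = sum(
        comb(successes, k) * comb(failures, draws - k)
        for k in range(k_min, min(draws, successes) + 1)
    )
    return Fraction(favorable, total)

# Assumed reading of the formula: 9 "successes", 6 "failures", 5 draws, at least 3 successes
p = prob_at_least(3, successes=9, failures=6, draws=5)
print(p, float(p))  # 102/143, roughly 0.713
```

Using `Fraction` keeps the result exact, which makes it easy to compare against a hand calculation.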
  5. staassis

    Analyse a survey?

    Try repeated measures ANOVA. It should work as long as each time point is represented by a sufficient number of people.
  6. staassis

    Assessing statistical significance of spread.

    Dan, much depends on the type and size of the data you end up collecting. Generically speaking, you may end up using a time series model which depends on the group ID. It is hard to say more at this point, unfortunately... Once you have collected the data, you can post them here and we will...
  7. staassis

    Sample size for creating a reference interval

    Without seeing the data, it is impossible to say what sample size would be sufficient for a prespecified accuracy. However, based on many data analyses I have performed over the last decade, 120 observations is unlikely to be sufficient for studying very low concentrations of some of the...
  8. staassis

    Is my approach correct for this binomial distribution question?

    The original question does not display.
  9. staassis

    Factor Analysis Question

    Seems like a programmatic issue, perhaps something related to in-memory variables. Try reverse scoring in a separate Excel file and then loading it into a fresh session of whatever software you are using.
  10. staassis

    Calculating a weighted mean/SD of x number of means/SDs?

    This statement is unclear: "Because of heterogeneity in patients, region X in one patient may have 40 separate data points, in another 90 points, in another 17 points." The whole thing may be a simple case for meta-analysis.
  11. staassis

    Is my hypothesis test correct?

    The hypotheses go the other way. "At least" means H0: ... >= ... And therefore H1: ... < ...
  12. staassis

    Is there a Mann Whitney test alternative when all variables are categorical?

    You can use a chi-square test for independence if the expected frequency in each cell (for each combination of the categories) is >= 5. Separately, you can always use a randomization test, which is a resampling method related to the bootstrap.
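A randomization (permutation) test of independence for two categorical variables can be sketched as below. This is a minimal illustration under stated assumptions, not the method used in the thread: the chi-square statistic is chosen as the test statistic, and the function names, the number of permutations, and the toy data are all made up for the example.

```python
import random
from collections import Counter

def chi_square_stat(x, y):
    """Pearson chi-square statistic for the contingency table of x against y."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    cells = Counter(zip(x, y))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            observed = cells.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def permutation_test(x, y, n_perm=5000, seed=0):
    """Shuffle y to break any association with x; return (statistic, p-value)."""
    rng = random.Random(seed)
    observed = chi_square_stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # small tolerance so floating-point noise does not break ties
        if chi_square_stat(x, y_perm) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimated p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): group label versus a yes/no answer
group = ["A"] * 10 + ["B"] * 10
answer = ["yes"] * 8 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 7
stat, p_value = permutation_test(group, answer)
print(stat, p_value)
```

Unlike the chi-square approximation, this procedure does not rely on the expected cell counts being large, which is why it remains usable with sparse tables.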
  13. staassis

    Need help evaluating a PCA

    You do not. Factor loadings (if using the traditional definition) tell you how to represent the original variables in terms of factors. They do not tell you the reverse: how to calculate factors in terms of the original variables. The easiest approach is saving the factor scores (in SPSS, R...
  14. staassis


    Yes, you can. Choose the optimal penalty coefficient (λ) using leave-one-out cross-validation. It is likely to be substantial.
  15. staassis

    What statistical test for my data?

    You have to build a generalized linear model (GLM) of the form: Var3 ~ Var1 + Var2. GLM types to consider: Poisson regression, negative binomial regression, Poisson regression + zero-inflated component, negative binomial regression + zero-inflated component. You can choose the "optimal" GLM...
  16. staassis

    Is "time analysis" possible here?

    Yes, you can study the time effects using a panel-data model. The following framework should work in your case: DV_ij = α_i + β_1 * Time_j + β_2 * Time_j^2 + γ_1 * X_i1 + ... + γ_p * X_ip + ε_ij, where DV_ij is the dependent variable for participant i and survey time j. You can consider two...
  17. staassis

    Historical research question

    Which photos? Nothing got attached... Aside from that, extensive attachments are bad style. You are asking for advice, not for somebody to look deeply into your work and perform formal consulting... Also, please make your textual description shorter and straight to the point. Thank you.
  18. staassis

    What package to install

    Fair enough. Pretty great people go a long way towards something great in life.
  19. staassis

    Covid-19 & presidential election analytics

    It makes me sad to see, time after time, people who think they have come up with an original and topical data analysis question: Covid-19.... There is too little data. Statistics is the science of what to do with data. One needs data.
  20. staassis

    What package to install

    @Dason, why is Iowa best? Any advantages over CA-1 or PA-1? Very curious.