  1. Miner

    Missing data satisfaction research

    While 75% is an excellent response rate, you still have the potential for non-response bias there as well. Regarding the other, it does make sense, but it is something that would vary depending on what you are studying. That example illustrates a worst case scenario, but there are other where...
  2. Miner

    Employee satisfaction database

    The other big problem is that there is no consistency in the satisfaction scale or in the questions asked across companies. This would have a different focus, but the American Customer Satisfaction Index is available by year and broken out by many sectors. You could match that up with public...
  3. Miner

    When is a difference between two percentages more likely to be statistically significant

    For a given absolute difference, the power of a proportions test is lowest when the proportions are near the middle (p ~ 0.5), where the variance p(1 - p) is largest, so you have to compensate by increasing the sample size.
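As a rough illustration of that point, here is a minimal sketch that approximates the power of a two-sided two-proportion z-test using the normal approximation. The function name `two_proportion_power`, the 5-point difference, and the sample size of 500 per group are assumptions chosen for the example, not values from the thread.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test, n per group.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)
    se_alt = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Same 5-point difference and same sample size, different location.
power_middle = two_proportion_power(0.50, 0.55, n=500)
power_extreme = two_proportion_power(0.05, 0.10, n=500)
print(f"middle:  {power_middle:.2f}")
print(f"extreme: {power_extreme:.2f}")
```

With these assumed inputs the test near 0.5 has noticeably less power than the one near the extreme, which is why the middle of the range needs a larger sample to detect the same absolute difference.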
  4. Miner

    When is a difference between two percentages more likely to be statistically significant

    I believe there are also differences depending on whether you are in the middle (e.g., p ~ 0.5) versus the extremes (e.g., p ~ 0.1 or 0.9).
  5. Miner

    Kaplan-Meier usage for forecasting

    I recommend using a different approach. You are essentially trying to predict the probability of an event occurring at a point in the future. This is the basis of reliability analysis. I would use a survival analysis for arbitrary censored data. With the number of policies involved, you...
  6. Miner

    Why do experts continue to use stepwise regression?

    This is a good, easily read article from Minitab on the perils of stepwise and best subsets regression. They approached it empirically using a known model.
  7. Miner

    Help with understanding correlation ...

    r is an indicator of how strong the correlation is. Specifically, how strong the signal is relative to the noise. The "cutoff" for how good an r value needs to be depends on your needs and the purpose of your study. I work in industrial statistics, and an r of 0.5 would be of little practical...
  8. Miner

    Best way to run regression

    I don't know whether the following is technically correct, but I can tell you that it worked for me. We do an annual survey on a wide variety of measures, which use an ordinal scale of 1 - 10. I used multiple linear regression to determine the relationship between possible IV and an important...
  9. Miner

    Type II error textbook question

    Are you confusing the concept of Power with the Type II error (Beta)? Power = 1 - Beta
  10. Miner

    Determination of optimal number of clusters (stopping rule) with similarity measure (Jaccard coefficient)

    I use the method described in this post. You always need to make sure that your number of clusters makes practical sense. You may find value in clustering at more than one level. For example, if you were to perform a cluster analysis using data on vehicles, you might cluster at a high level and...
  11. Miner

    Discrete normal distributions

    Technically, you should be using the Poisson distribution instead of the Normal distribution. If an approximation is good enough versus an exact fit, you could use a quincunx approach. Quality people have used this for decades to demonstrate the concept of variation and the effect of making...
  12. Miner

    Suggestions for analysis of my experiment

    If I understand your design, this sounds like a repeated measures design, so a repeated measures MANOVA should be the appropriate analysis.
  13. Miner

    Design of Experiments with existing data

    One-factor-at-a-time experiments are inefficient and often unable to detect interactions.
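To make the interaction point concrete, here is a small sketch with made-up responses for two factors, A and B, each at a low (-1) and high (+1) level. The numbers are hypothetical and chosen only to show how varying one factor at a time from a baseline can miss what happens when both factors change together.

```python
# Hypothetical responses for a 2x2 design (made-up numbers for illustration).
# Keys are (A, B) with levels coded -1 (low) and +1 (high).
response = {(-1, -1): 10.0, (1, -1): 12.0, (-1, 1): 11.0, (1, 1): 20.0}

# One factor at a time: start at (low, low), change A alone, then B alone.
baseline = response[(-1, -1)]
ofat_effect_a = response[(1, -1)] - baseline
ofat_effect_b = response[(-1, 1)] - baseline

# Without the (high, high) run, the natural guess is that effects add up.
ofat_prediction = baseline + ofat_effect_a + ofat_effect_b

# A full factorial includes the fourth run, so the interaction is estimable:
# half the difference between the effect of A at high B and at low B.
effect_a_at_high_b = response[(1, 1)] - response[(-1, 1)]
effect_a_at_low_b = response[(1, -1)] - response[(-1, -1)]
interaction = (effect_a_at_high_b - effect_a_at_low_b) / 2

print("additive prediction at (high, high):", ofat_prediction)
print("observed response at (high, high):  ", response[(1, 1)])
print("estimated A x B interaction:        ", interaction)
```

In this invented data set the additive guess falls well short of the observed (high, high) response, and only the factorial layout reveals that the gap comes from an interaction.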
  14. Miner

    Design of Experiments with existing data

    I recommend that you analyze this using regression.
  15. Miner

    About box plots...

    They are probably not providing what you think, but are actually providing something commonly used by survey companies where they typically will use 5 buckets and Top box/Bottom box scores.
  16. Miner

    p-value and alpha confusion

    Alpha is a threshold value that you set that defines the amount of risk that you are willing to accept that you will commit a Type I error. You then compare the p-value to this threshold to determine whether you will reject the null hypothesis. If the consequences of committing a Type I error...
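The comparison described above can be sketched as follows. This is only an illustration: it assumes a two-sided one-sample z-test with a known standard deviation, and the helper `z_test_p_value` and all input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, null_mean, sigma, n):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical study: the p-value is computed from the data...
p_value = z_test_p_value(sample_mean=10.4, null_mean=10.0, sigma=1.0, n=25)

# ...while alpha is chosen in advance from the cost of a Type I error.
reject_at_05 = p_value < 0.05  # moderate tolerance for a false alarm
reject_at_01 = p_value < 0.01  # stricter threshold when false alarms are costly

print(f"p-value = {p_value:.4f}")
print("reject at alpha = 0.05:", reject_at_05)
print("reject at alpha = 0.01:", reject_at_01)
```

The same p-value leads to different decisions under the two thresholds, which is the distinction between the two quantities: the data determine the p-value, and the analyst determines alpha.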
  17. Miner

    Comparing groups - using ANOVA but unsure on various points, help greatly appreciated!

    You will probably have to use a General Linear Model instead of the standard ANOVA to include the control in the analysis. A different approach that would allow you to use a 2-way ANOVA would be to take the mean of the control sample for subject 1 and subtract that from all of the subject 1...
  18. Miner

    Bias type / sorting

    I am not certain that this applies, but there are two types of bias that impact the use of multiple choice questions in surveys. They are:
  19. Miner

    Find the best 7 days in a row out of a set of data

    Calculate a moving average of length 7, then find the max moving average. These are the lot numbers: 14438438 14438441 14438445 14438458 14438466 14443462 14443486
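The moving-average approach can be sketched in a few lines. The function name `best_window` and the sample `daily` values are assumptions for illustration and are unrelated to the lot numbers above.

```python
def best_window(values, window=7):
    """Return (start_index, mean) of the consecutive window with the highest mean."""
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    # Maintain a running sum so each step costs O(1) instead of re-summing.
    window_sum = sum(values[:window])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(values) - window + 1):
        window_sum += values[start + window - 1] - values[start - 1]
        if window_sum > best_sum:  # ties keep the earliest window
            best_sum, best_start = window_sum, start
    return best_start, best_sum / window


# Hypothetical daily values.
daily = [3, 5, 2, 8, 9, 7, 6, 10, 4, 1, 2, 3]
start, avg = best_window(daily)
print(f"best 7-day run starts at index {start} with mean {avg:.2f}")
```

Because the window length is fixed, maximizing the moving average is the same as maximizing the moving sum, so the division is only needed once at the end.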