Statistics course and homework discussion. Elementary statistics.
Multiple Imputations in a Dataset
Sun, 27 Jul 2014 12:18:52 GMT
Dear all,

I'm a student working on a dataset from a cross-sectional survey of around 200 participants. The main analysis will be a linear regression with 3 predictors and a continuous outcome variable. Unfortunately, the data were collected over a long period of time and there was a systematic fault in data collection: the outcome variable was not collected for a subset of 40 participants. As such, my data aren't missing at random. Also, the missing subset is around 20% of my whole sample!

With such a large percentage of missing data, I was wondering whether multiple imputation can be done. Also, what would be a recommended number of imputations? Is there a specific reference or book that you recommend?

Thank you
Predictive model needed - bespoke or off-the-shelf software?
Sat, 26 Jul 2014 17:34:22 GMT
Good day all,

I am a complete novice with stats and have zero experience, but I do have a stats problem regarding data I have in Excel and I am seeking some expert opinion, so a big thanks for any replies. I have done enough reading to know that a predictive model based on time series is what I need, but finding a good (unbiased) answer has been surprisingly hard.
So, in general, would a bespoke model be better than one of the popular software packages? I keep seeing a 95% confidence value quoted in various places, but can it be done?
Again your opinions are appreciated.
Apologies, I soon realized I did not include anything about the data; that said, I do not know the correct statistical terminology anyway. It is not very complex, so please consider it an "in general" type of question.
Statistics, MrMarkB, http://www.talkstats.com/showthread.php/56947-Predictive-model-needed-bespoke-or-off-the-shelf-software
Comparing two large categorical data sets with low counts
Sat, 26 Jul 2014 16:53:52 GMT
I'm studying a biological feature that can be classified into 2^20 categories. These categories can be thought of as a dictionary of fixed-length words over an alphabet, say, 20-letter words over a binary alphabet. I also have two samples consisting of 3 million and 1 million entries. The samples are 1D arrays of words. In practice, only around 30 000 unique words were observed in one sample, and even fewer in the other. Thus, many possible words have zero frequency and many observed words have quite low frequencies (thousands of words appear only once in the larger sample).

What I would like to do is tell whether the samples are drawn from different populations. To put it another way, does word usage differ between the populations from which the samples were drawn? One obvious statistic is Pearson's chi^2, which is, to my knowledge, not applicable here because of the low counts.
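
To make the low-count concern concrete, here is a minimal Python sketch (not a recommended test) that builds the 2 x K table of word counts for two samples, computes Pearson's chi^2 statistic, and reports the share of cells whose expected count falls below 5, a common rule-of-thumb threshold. The function name `chi2_two_samples` and the toy data are made up for illustration; the toy words are 4 letters long rather than 20, and both samples are assumed to be non-empty.

```python
from collections import Counter


def chi2_two_samples(sample_a, sample_b):
    """Pearson chi-squared statistic for a 2 x K table of word counts.

    Returns (statistic, degrees_of_freedom, share_of_low_expected_cells).
    Words unseen in both samples are ignored, since their expected count
    is zero. Both samples are assumed to be non-empty.
    """
    counts_a, counts_b = Counter(sample_a), Counter(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    words = set(counts_a) | set(counts_b)

    stat = 0.0
    low_cells = 0
    for w in words:
        total = counts_a[w] + counts_b[w]
        for observed, n_row in ((counts_a[w], n_a), (counts_b[w], n_b)):
            # Expected count under the hypothesis of identical word usage.
            expected = total * n_row / n
            stat += (observed - expected) ** 2 / expected
            if expected < 5:  # rule-of-thumb threshold, an assumption here
                low_cells += 1

    dof = len(words) - 1  # (2 - 1) * (K - 1) for a 2 x K table
    return stat, dof, low_cells / (2 * len(words))


# Toy data (made up): a few frequent words plus words seen only once.
a = ["0000"] * 30 + ["0001"] * 10 + ["1111"] * 1
b = ["0000"] * 10 + ["0001"] * 3 + ["1010"] * 1
stat, dof, low_share = chi2_two_samples(a, b)
print(f"chi2 = {stat:.3f}, dof = {dof}, low-expected cells = {low_share:.1%}")
```

With many words observed only once, most cells end up with small expected counts, which is the situation described above.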

Any advice or literature references would be appreciated.