Undergraduate Journal Reading

Dason

Ambassador to the humans
#1
Hi,

So I'm on my "media fast" but as you may have noticed I slipped a few times today. Anyways, I'm allowing myself to go on today because I actually have a question for everyone here. There is an undergraduate statistics club here and I'm one of the graduate student liaisons to it. I was thinking about putting together a journal reading group for them. Essentially I think it would be good for them to read journal articles critically with a group of people before they're thrown into the fray, either out in the real world or in further studies. I talked to the faculty advisor for the club and she thinks it's a good idea, but since she is fairly busy it's up to me to get everything organized. I haven't emailed the undergraduates to see what they think of it yet, but I have a few ideas.

So first off: What do you think? Is this a good idea? Would you have participated in something like this? Have you participated in something like this and have any suggestions?

Second (and the part I'm most interested in): Any suggestions for good articles that are informative but at least fairly accessible to an undergraduate? I was thinking of starting with Benjamini and Hochberg's famous paper on controlling FDR. It's fairly readable, and I doubt they've thought much about controlling for error using anything other than a familywise correction (if they've even been introduced to that yet).

Any comments or suggestions are extremely welcome.
 

spunky

Can't make spagetti
#2
Hi,

So I'm on my "media fast" but as you may have noticed I slipped a few times today.
welcome back! i kind of assumed that your post on that little derivation thread i replied to was your official comeback to the internet so i'm not sure whether you're already like "here, here" or only "half here"...


What do you think? Is this a good idea? Would you have participated in something like this? Have you participated in something like this and have any suggestions?
i think it's a GREAT idea and, had i been able to, i would have certainly participated in a reading club like the one you're describing. unlike the journals in the social sciences/humanities (where one can pretty much read everything and get a pretty good sense of what the authors are trying to say with a combo of good common sense and wikipedia), i think the journals in math or stats are particularly inaccessible to anyone who hasn't done stuff beyond their regular 4yr undergrad degree. having someone like you (or more experienced students/faculty) guide them through the notation, the structure of how it's written, etc. would be rather good for them. now i'm kinda feeling jealous, lol. i once took a directed readings course on rings, groups and more advanced linear algebra topics which focused mostly on what you described: we struggled with the reading, did research around it, brought questions to class and the prof would guide us in trying to make sense of what was being presented.




Second (and the part I'm most interested in): Any suggestions for good articles that are informative but at least fairly accessible to an undergraduate? I was thinking of starting with Benjamini and Hochberg's famous paper on controlling FDR. It's fairly readable, and I doubt they've thought much about controlling for error using anything other than a familywise correction (if they've even been introduced to that yet).

Any comments or suggestions are extremely welcome.


these are kind of general suggestions but i'm not sure whether they apply to this situation or not. first, i'm very, very fond of The American Statistician as a journal because i think they make the extra effort to make things a little bit more accessible to people who may not have gone beyond a bachelor's degree in math/stats. although i subscribe to Biometrika and the Royal Statistical Society journals, i do have to say that most of the time things go right over my head, even if the abstract or the title sounds really appealing. second... i'm not sure if you'd be willing to consider journals with a heavy emphasis on statistics (both theory and applications) but which may not be directly associated with any statistical society or uni dept. for example, Psychometrika is the most quantitative-oriented journal i have found in psychology. to fully grasp its material one needs at the very, very least good foundations in linear algebra, multivariate calculus and some probability theory... of course, the more dense articles assume one is more familiar with these topics. perhaps you'd be able to find articles with the right amount of statistical theory in them but still written in somewhat more accessible language..
 

Dason

Ambassador to the humans
#3
welcome back! i kind of assumed that your post on that little derivation thread i replied to was your official comeback to the internet so i'm not sure whether you're already like "here, here" or only "half here"...
Only half here. The times I replied to threads earlier today were accidents. I was checking my email on the university's terminal servers and my bookmarks hadn't been updated to remove the Talk Stats link, so I just naturally clicked it and opened some threads. I couldn't resist replying. But I intend to resume the fast after tonight.

these are kind of general suggestions but i'm not sure whether they apply to this situation or not. first, i'm very, very fond of The American Statistician as a journal because i think they make the extra effort to make things a little bit more accessible to people who may not have gone beyond a bachelor's degree in math/stats. although i subscribe to Biometrika and the Royal Statistical Society journals, i do have to say that most of the time things go right over my head, even if the abstract or the title sounds really appealing. second... i'm not sure if you'd be willing to consider journals with a heavy emphasis on statistics (both theory and applications) but which may not be directly associated with any statistical society or uni dept. for example, Psychometrika is the most quantitative-oriented journal i have found in psychology. to fully grasp its material one needs at the very, very least good foundations in linear algebra, multivariate calculus and some probability theory... of course, the more dense articles assume one is more familiar with these topics. perhaps you'd be able to find articles with the right amount of statistical theory in them but still written in somewhat more accessible language..
I'm not opposed to using non-stats journals if they have interesting components to them. But I think it will be tailored mainly to whoever is interested in it so something from Psychometrika could slip into the reading list if there is interest in that material. I doubt there will be too many readings but I guess that's up to the undergrads. If they want to have a meeting every week that would be pretty intense but I'd be willing. My guess is that it'll probably be a once a month thing though.

It will be interesting to see how much interest there is and what they are interested in reading. I mentioned the Benjamini and Hochberg article because it's highly related to what I do so I feel very comfortable with that article in particular. I think it could bring up some good discussion and I think the general idea of controlling for a different type of error is simple enough but it is something that could be directly applied.
 

spunky

Can't make spagetti
#4
you're definitely right... false discovery rate is something i believe applies to anyone who does data analysis. i personally wasn't familiar (until today) with the Benjamini-Hochberg method because i guess in this province of knowledge you can usually get away with a bonferroni kind of correction... do the undergrad stats courses at your uni tend to follow a theme of sorts, or do you know if the undergrads are sort of trained in some special sub-areas of statistics? i'm just wondering because here at UBC (University of British Columbia) they make a lot of fuss about getting stats students to take biostatistics courses or mathematical finance courses, so maybe you can try and present articles that relate to areas of research or areas of interest the students have...

although, good move in planning to start with an article to which pretty much everyone can relate..
 

Dason

Ambassador to the humans
#5
you're definitely right... false discovery rate is something i believe applies to anyone who does data analysis. i personally wasn't familiar (until today) with the Benjamini-Hochberg method because i guess in this province of knowledge you can usually get away with a bonferroni kind of correction...
Yeah, a Bonferroni kind of correction isn't the best route to go when you're doing ~50,000 tests and are mainly looking for genes to explore further. But on top of just learning about FDR I think it could open discussion on controlling error rates in general.
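Just to put a rough number on that (my arithmetic, not a figure from any paper): with m = 50,000 tests, a Bonferroni correction at a familywise level of alpha = 0.05 requires each individual p-value to come in under 0.05/50,000 = 0.000001, i.e. 1e-6. Almost nothing real survives a cutoff that small, which is exactly why a different notion of error control starts to look attractive.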

do the undergrad stats courses at your uni tend to follow a theme of sorts, or do you know if the undergrads are sort of trained in some special sub-areas of statistics? i'm just wondering because here at UBC (University of British Columbia) they make a lot of fuss about getting stats students to take biostatistics courses or mathematical finance courses, so maybe you can try and present articles that relate to areas of research or areas of interest the students have...

although, good move in planning to start with an article to which pretty much everyone can relate..
It's sort of a mixed bag or at least the undergraduate program itself is. I don't know too much about the background of the students that make up the undergraduate club and what their interests are. Which makes it sort of interesting since I don't really know how much they already know.
 

Dason

Ambassador to the humans
#6
Thanks spunky for the feedback. I sent out the email and we'll see how much interest there is. Now I continue fasting. I might even extend my fast until Tuesday evening since I have an exam in my theory of linear models course Tuesday afternoon.
 

spunky

Can't make spagetti
#7
Yeah, a Bonferroni kind of correction isn't the best route to go when you're doing ~50,000 tests and are mainly looking for genes to explore further. But on top of just learning about FDR I think it could open discussion on controlling error rates in general.

i knooooow! and it's one of the reasons i've always wanted to work on something genetics-related. my advisor once told me that the way inference is done in the social sciences makes no sense whatsoever when you're looking at thousands and thousands of genes for location... which made me wonder something like what you're mentioning here... how does one go about doing 50,000 t-tests (i mean, i know it's not a t-test, i'm just using it as an example), controlling for type-1 error and not running into infinitesimal alpha-values? i'm sure looking at the Benjamini-Hochberg method may help me understand it...

Which makes it sort of interesting since I don't really know how much they already know.
well, then you made a great 1st step sending out that email.... so, do you have like access to all the undergrads' emails or something? you must be pretty high up there in your dept :p theory of linear models? sounds fun... do you do coursework or are you mostly involved in research now?

(oh, and btw, if it's not too much trouble, would you mind sharing with us which readings you're planning on going through in your journal reading group? i would definitely be interested in following along virtually...)
 

noetsi

No cake for spunky
#8
A key consideration is that statistical journals seem focused on methods and theory, and the substance of what they are being used for is often beside the point. Also, they pay little attention to key practical points - like how to clean up the data.

So if they are going to be using this for anything but academics you might look at journals that use statistics rather than ones focused on statistics. Like, say, JPART, the Journal of Applied Psychology, or AMJ.
 

Dason

Ambassador to the humans
#9
i knooooow! and it's one of the reasons i've always wanted to work on something genetics-related. my advisor once told me that the way inference is done in the social sciences makes no sense whatsoever when you're looking at thousands and thousands of genes for location... which made me wonder something like what you're mentioning here... how does one go about doing 50,000 t-tests (i mean, i know it's not a t-test, i'm just using it as an example), controlling for type-1 error and not running into infinitesimal alpha-values? i'm sure looking at the Benjamini-Hochberg method may help me understand it...
Looking into the procedure B-H propose will probably help you out there. But essentially the type-I error rate isn't the main thing of concern anymore. The interest lies in controlling the expected false discovery rate. So if I use a method controlling expected FDR at 5% and give you a list of 500 genes that seem to be significant, then on average only about 5% of the genes on my list will be genes that had a true null hypothesis. It can be shown that this controls the type-I error rate in a weak sense (if ALL the null hypotheses are true, then controlling for FDR will also control the familywise error rate).
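If it helps to see the mechanics, here's a minimal sketch of the step-up procedure in Python (my own sketch of my reading of the method, not code from the paper, and the p-values are made up for illustration):

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # Benjamini-Hochberg step-up: sort the p-values, find the largest
    # rank k with p_(k) <= (k/m) * q, and reject every hypothesis whose
    # p-value ranks at or below k.
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)                 # indices, smallest p first
    thresholds = np.arange(1, m + 1) / m * q  # (k/m) * q for k = 1..m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # last rank meeting the bound
        reject[order[:k + 1]] = True          # reject everything up to rank k
    return reject

# Hypothetical example: 10 p-values, expected FDR controlled at 5%.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))

On this made-up list only the two smallest p-values get flagged, but the thing to notice is that the cutoff adapts: the smallest p-value faces the Bonferroni-style bound q/m, while each later one gets a progressively more generous threshold.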

(oh, and btw, if it's not too much trouble, would you mind sharing with us which readings you're planning on going through in your journal reading group? i would definitely be interested in following along virtually...)
Sure I can keep everyone updated.

A key consideration is that statistical journals seem focused on methods and theory, and the substance of what they are being used for is often beside the point. Also, they pay little attention to key practical points - like how to clean up the data.

So if they are going to be using this for anything but academics you might look at journals that use statistics rather than ones focused on statistics. Like, say, JPART, the Journal of Applied Psychology, or AMJ.
I would argue that the methods being proposed are being proposed for a reason - somebody actually had a problem that caused them to investigate this and their proposed method is supposed to solve the problem either better than a previous method or in a new/unique way. So in my mind it's not purely academic. It can take a while for these methods to get into widespread use but that's not necessarily a reason not to read up on them. As for data cleaning - of course that's a part of almost every analysis, but it would be booooring and beside the point to include in any article, so I'm not exactly sure what point you're making there. Data cleaning is something that is partially covered in the classroom and I don't see it as an article's responsibility to discuss how to clean the data or to go into too much detail on how they cleaned the data. It's important to note if they modified the dataset in a major way or left out observations, but if the article is about a certain method or theory then spending time discussing data cleaning would be a waste of limited space.

I'm sure we'll get a few articles that are almost entirely applications but it will basically be up to them to say what kind of stuff they're interested in.
 

noetsi

No cake for spunky
#10
I would argue that the methods being proposed are being proposed for a reason - somebody actually had a problem that caused them to investigate this and their proposed method is supposed to solve the problem either better than a previous method or in a new/unique way.
We will have to respectfully disagree. I think they do it to push the frontiers of statistics, not for any practical reason. Most of the statistics they use will never be utilized by practitioners. You can go back and look at statistical journals in the seventies and see how many of their methods are used by practitioners (or even academics who are not statisticians). I am confident virtually none have been. Statisticians work at creating esoteric statistics with little practical value to practitioners, far beyond what most are interested in or need.

My point about data cleaning (etc.) is that if anyone is going to utilize statistics outside academics, "boring" things like that will be the most important thing they do. And it's rarely if ever covered in journals. I am never bored by such; when I see it I get out my notebook and write it down immediately because it's critical.

This may reflect the difference between doing statistics as a job and doing it for academics.
 

Dason

Ambassador to the humans
#11
We will have to respectfully disagree. I think they do it to push the frontiers of statistics, not for any practical reason.
I might call that a practical reason though. Just because you can't see an immediate use for something doesn't mean one doesn't exist. The field of Topology existed for a very long time as a purely abstract field with no practical uses at all. But now there are tons of practical uses and it's a very important field.

Most of the statistics they use will never be utilized by practitioners. You can go back and look at statistical journals in the seventies and see how many of their methods are used by practitioners (or even academics who are not statisticians). I am confident virtually none have been. Statisticians work at creating esoteric statistics with little practical value to practitioners, far beyond what most are interested in.
That's fine. Not all methods are going to be used by everybody. I still say a lot of them were at least prompted by an initial problem that they were trying to solve. Also - if the practitioners never read about the methods then how can we expect the methods to get into practice?

My point about data cleaning (etc.) is that if anyone is going to utilize statistics outside academics, "boring" things like that will be the most important thing they do. And it's rarely if ever covered in journals. I am never bored by such; when I see it I get out my notebook and write it down immediately because it's critical.

This may reflect the difference between doing statistics as a job and doing it for academics.
Data cleaning is very important. I agree. I have yet to meet a real-world data set that didn't need at least a little scrubbing before doing an analysis. But the reason you do the data cleaning is to get to the analysis itself. It's not the focus of the analysis. It's important to spend time on it, but unless you know some complex theories about data cleaning I'm not sure why we would spend time in a journal article explaining too much about the data cleaning process. Or are there advanced data cleaning techniques that I'm unaware of? I mean, there are advanced data cleaning/normalization processes and I use quite a few of them (RMA normalization for gene expression data is one that I use A LOT - but there are papers on RMA itself, so if we're using it in an analysis then we can just reference that paper in our paper), but if you aren't using anything revolutionary and didn't have to do anything too interesting to clean the data, why would we talk too much about it in an article?

Also it seems like you're under the impression that those of us in academics never actually do an analysis? Performing statistical analysis is mainly how I get supported through my RA. Last year I did consulting mainly for the Agriculture department so I worked with a lot of field studies (mainly corn... this is Iowa). So I do quite a bit of data cleaning and data quality checks because the data I work with is important (if it's not important to me it's at least important to the person that gave me the data). I need to do data management and be able to work with the data fluidly. And yet I wouldn't say this is the interesting part of my job. The interesting part is when I come to the analysis itself. Because sometimes the data I'm working with just doesn't fit well into any existing method. Or maybe we can jam our square shaped data into that circular shaped method but it's just not very satisfying. We could do better if we had an appropriate method. This is where I see methods research being done - it stems mainly from these kinds of situations. And trust me - your employer might not care about an elaborate statistical analysis but the people I work with - the people that give me real data that matters in the real world to real people - they do care and they want the best analysis for the job because it cost them a lot of money to get that data in the first place so they want the best analysis they can get.


So back to journal articles. Yeah I think we'll mainly be looking at articles that are relatively directly applicable or applications papers themselves. But who knows - those tend to be a little bit easier to read (but they still raise good questions) so the undergrads might want more difficult papers to read since they'll have somebody helping them read these things.

Edit: I should add that any methods/theory papers we read I would want to be directly usable by anybody that decided to just get the BS and go to work in the private sector. I think Benjamini-Hochberg's paper is nice for thinking about FDR and it will most likely raise good discussion about controlling for error in different ways - something any practitioner should be worrying about. If we were to dive into theory-type papers maybe we'd look at something that discusses the theory behind some common test that they might use. Knowing the theory behind a test can help you use it in practice and help you know when a certain thing is or isn't appropriate - which is why I would be fine with looking at certain theory articles. But once again it's up to them to see what they would want to read (if there is any interest at all).
 

noetsi

No cake for spunky
#12
Data clean-up to me actually means diagnostics (multicollinearity, multivariate normality, etc.), ways of addressing these (which can be very complex, as with dealing with missing data, for example) and careful consideration of research design so that you actually measure accurately what you think you do. Not only because I think this is a critical issue for practitioners, but because it's commonly a badly neglected area in statistics. Thus one sees very complex methods, but little thought is given to whether the data is reliable or to operationalization. You see a lot of concern with these issues in non-statistical academic journals but not nearly as much in statistical ones, which focus on the methods.

I think another badly neglected point is translating (and reporting) what you find into something laymen understand. I spend more time generating summaries that management finds useful than in running the data. And my data analysis is much less complex than what you do.

It all depends on where they are going to work. Most organizations (there are exceptions, such as statistical units in Agriculture or Education) are not going to be reporting to individuals with any reasonable grasp of statistics or even data analysis.
 

Dason

Ambassador to the humans
#13
Data clean-up to me actually means diagnostics (multicollinearity, multivariate normality, etc.), ways of addressing these (which can be very complex, as with dealing with missing data, for example) and careful consideration of research design so that you actually measure accurately what you think you do.
Then we're talking about data cleaning in different ways. I don't see that as data cleaning at all (except maybe dealing with missing data). Diagnostics to me are more a model assessment tool and count as a method in my book. Unless of course you're just willy-nilly transforming everything to normality - in which case you could claim you're cleaning the data, but I would say that you're doing it wrong.

Not only because I think this is a critical issue for practitioners, but because it's commonly a badly neglected area in statistics. Thus one sees very complex methods, but little thought is given to whether the data is reliable or to operationalization. You see a lot of concern with these issues in non-statistical academic journals but not nearly as much in statistical ones, which focus on the methods.
Once again, I think checking diagnostics and model assumptions is very important. But it isn't necessarily the interesting part of dealing with the methods unless you find a new way to do the assessment, so there shouldn't necessarily be much time spent talking about it. In an application article that should be discussed, because it's important to note that the method you're using works. If all you're doing is introducing the method, then why waste valuable space in your article on stuff like checking normality? Note that I'm not saying you shouldn't spend the time making sure the data is appropriate for the analysis. That is very important and is something I make sure to always do - especially for a complex model. I'm just saying that, in terms of journal articles, you seem to be saying they should spend more time on this and I'm saying: what's the point?

I think another badly neglected point is translating (and reporting) what you find into something laymen understand. I spend more time generating summaries that management finds useful than in running the data. And my data analysis is much less complex than what you do.

It all depends on where they are going to work. Most organizations (there are exceptions, such as statistical units in Agriculture or Education) are not going to be reporting to individuals with any reasonable grasp of statistics or even data analysis.
I did consulting. I worked with quite a few people that had little to no grasp of what statistics is or what you do with it. They just knew that they were told to do an analysis. Trust me - I've had to deal with explaining results to the layman. It's an important skill, but once again - why spend time discussing that in an article dealing with methods?
 

bryangoodrich

Probably A Mammal
#14
I wish I had something like this, but alas, my school barely even has a statistics faculty (and does not have its own department). I would love to know what you get organized, because I might want to jump on that reading myself!

On that note, what you cover can be quite diverse. Is there an overall agenda or curriculum to be followed? Otherwise, I would focus on things that would prepare them for the topics they'll face in grad school, if not some of the articles they might read in grad school. I also find solid examples of statistics in use to be beneficial. I just finished reading Navidi's Statistics for Engineers and Scientists, and he uses a lot of examples from journals. Of course, those examples have the relevant statistical information and data already stripped out and presented. For the advanced undergraduate, it would be a great exercise to look at an example of even basic things (t-tests, confidence intervals, etc.) as published in the real world. The substance of the articles won't be important there, but building that statistical intuition, comprehending the provided information, and reviewing its statistical aptness are very good lessons to learn. Even better would be to have examples that show good statistical practices and some that are not so good. A couple of these a semester, alongside more theoretical reading, would make for a good curriculum of reading for the club. Like I said, I would also love to benefit from this assigned reading once you compile it!
 

noetsi

No cake for spunky
#15
But it isn't necessarily the interesting part of dealing with the methods unless you find a new way to do the assessment
Interesting to whom? As a former academic I agree that it would not be interesting to professors and many graduate students. If your undergraduates want to work in a non-academic venue I suspect it would be quite interesting :) I have lost track of the number of times I have heard "my program did not teach me how to do anything useful." The reason is that, to many instructors, the practical nuts and bolts you need to do your job are not important.
 

spunky

Can't make spagetti
#16
Edit: I should add that any methods/theory papers we read I would want to be directly usable by anybody that decided to just get the BS and go to work in the private sector. I think Benjamini-Hochberg's paper is nice for thinking about FDR and it will most likely raise good discussion about controlling for error in different ways - something any practitioner should be worrying about. If we were to dive into theory-type papers maybe we'd look at something that discusses the theory behind some common test that they might use. Knowing the theory behind a test can help you use it in practice and help you know when a certain thing is or isn't appropriate - which is why I would be fine with looking at certain theory articles. But once again it's up to them to see what they would want to read (if there is any interest at all).
so... i kept on thinking about examples of articles that are well-written, relevant and within the grasp of an advanced undergrad, and i think i came up with a good example:

Hotelling, H. (1933) Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441.

it's a great article for quite a few reasons. first of all, because it was written by one of the greatest figures in the world of statistics, harold hotelling. second, i've found hotelling's articles to be clear and well-explained, and he usually dwells on both the theory of how things work and why they work, alongside an example or two. now, i know anyone with some idea about the history of statistics will tell me that principal components was created by pearson and not hotelling; however, pearson discovered PCA by working out some obscure problem in multi-dimensional geometry, whereas hotelling developed the derivation that, if i'm correct, we all learn in any standard multivariate analysis class (i.e. the lagrange solution to the optimisation problem for the variance)
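in case it helps anyone following along, the derivation goes roughly like this (standard modern notation, my own sketch rather than hotelling's original presentation): you look for the weight vector w that maximises the variance of the linear combination,

\max_{w} \; w^{\top}\Sigma\, w \quad \text{subject to} \quad w^{\top}w = 1,

so the lagrangian is

\mathcal{L}(w, \lambda) = w^{\top}\Sigma\, w - \lambda\,(w^{\top}w - 1),

and setting the gradient to zero gives

\frac{\partial \mathcal{L}}{\partial w} = 2\Sigma w - 2\lambda w = 0 \;\Longrightarrow\; \Sigma w = \lambda w,

i.e. w has to be an eigenvector of the covariance matrix \Sigma, and since the variance attained is exactly \lambda, the first principal component is the eigenvector with the largest eigenvalue.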

i know whatever is chosen for the reading group is contingent on how the undergrads reply to the email you sent out but, if you like it, i think this particular article has a great combination of statistical theory, the construction and exposition of an intelligible mathematical argument, and little examples here and there to make sure we can all follow along...
 

noetsi

No cake for spunky
#17
I can't recommend any specific articles, but if you want journals that apply data analysis (including statistics) to the world of business you can look at JPART, AMJ/AMR (one of those two is the empirical journal; I forget which), and Administrative Science Quarterly.

These are journals where the focus is on applying statistics to real-world problems (well, "real world" in the sense that academics understand it). :) They are the best empirical journals in business/public administration in the view of academics in that area.