Hi,
My name is ngungo, which means innocent or ignorant, and I am new here. I am a retired Computer Process Control Engineer. I had a few statistics courses in school and applied some of them at work a long time ago, but I have now forgotten all about it. I guess I can hit the books again to refresh, but I need some hints. Be warned that I don't even know how or what to ask.
Here is my problem. I have been given a huge set of data tabulated in rows and columns. I need to understand it, read it, comprehend it. I need it to tell me something. What can I do? What procedures or formulas? I heard there are statistics tools that work like a black box: you just feed in the data and it tells you something. Does such a thing exist? By the way, I am pretty good at numerical programming.
I don't know what else to ask, so maybe you can ask me questions and guide me in the right direction, please. Thanks in advance.
--ngungo
I spent all day today learning more about something I have no clue whatsoever about. SAS looks impressive, but the tool I finally chose was the R language, because I am a programmer and it's cheap. I'd appreciate it if someone said something.
I downloaded R-2.15.0 and the R-intro book. The R language is very interesting. I just finished Appendix A: A sample session. It's a productive Sunday.
I found something that relates to my subject today: descriptive and inferential statistics.
R is awesome and I'm glad to hear you got it all installed and running. I'm not really sure what you meant by your last post though. If you have specific R questions feel free to post a new thread asking about them.
I don't have emotions and sometimes that makes me very sad.
Before you use a method, or statistical software, to answer this question, you have to decide why you are interested in the data. What is your research question? It is very rare, and probably impossible, to just look at data and decide something with it unless you have some purpose in mind before you start.
As someone pointed out to me, if you have actual variables in your rows or columns, one easy place to start is exploratory factor analysis (EFA).
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
@Dason: I sure will. Thanks.
@noetsi:
A few weeks ago I met an old colleague and we had some beer. I complained to her that my life was dull and that I missed the good ol' days at the company. A week ago I heard from her again, offering a small commission. She threw a good-sized table of numbers at me and asked me to make sense of it. She did not know that I don't know jack about statistics, though she knew I was in the process control field, and I did not volunteer to tell her.
At the moment, I am trying to get acquainted with R and to relearn introductory statistics. I am also going to find out what exploratory factor analysis is about. Thanks for the hints. I got my life back.
“I have been given a huge set of data tabulated in rows and columns.”
How many observations (rows) and how many variables (columns) do you have?
As a first check of the data, I normally compute, for every variable, the number of observations (n), the mean, the standard deviation, and the minimum and maximum.
Is the “n” reasonable? And the mean and standard deviation? Looking at the min and max sometimes shows you whether there are any incorrect values. You might know that some variables must be within a certain range; larger than zero, for example. Sometimes missing values are coded as “999” or something similar. Sometimes (from Excel, I believe) missing values are coded as zero (0). There are all kinds of cleaning and checking that need to be done.
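A minimal sketch in R of this first check, assuming the table is already loaded as a data frame. The data frame `dat`, its column names and values, and the choice of 999 as a missing-value code are all made up for illustration, and `describe` is a hypothetical helper, not a built-in function.

```r
# Hypothetical stand-in for the real table; names and values are made up.
dat <- data.frame(
  temp  = c(20.1, 21.3, 999, 19.8, 22.0, 20.7),  # 999 looks like a missing code
  flow  = c(5.2, 0, 4.9, 5.5, 5.1, 5.0),         # 0 might be a disguised missing
  press = c(101, 99, 102, NA, 100, 98)
)

# Hypothetical helper: one row per variable with n, mean, sd, min and max.
describe <- function(d) {
  t(sapply(d, function(x) c(
    n    = sum(!is.na(x)),          # number of non-missing observations
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE)
  )))
}

print(round(describe(dat), 2))      # the max of temp stands out

# Assumption: 999 in temp is a missing-value code, so recode it to NA.
dat_clean <- dat
dat_clean$temp[which(dat_clean$temp == 999)] <- NA
print(round(describe(dat_clean), 2))

# Count exact zeros per variable; each one needs a look before it is trusted.
zero_counts <- colSums(dat_clean == 0, na.rm = TRUE)
print(zero_counts)
```

Whether a 999 or a 0 really is a missing value is a question for whoever supplied the data; the code only flags candidates.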
This (data quality checking) is often forgotten. I believe that many (bad) decisions in companies are based on simply incorrect data.
(If you suspect that a few variables are incorrect you need to go back to the client and ask for corrections.)
Once you have a rough idea of the level (the means) of your data, it is good to look at the distributions of the variables. Boxplots and histograms are useful.
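As a rough sketch, histograms and boxplots for every variable can be drawn with base R graphics. The data frame `dat` below is simulated purely for illustration; its names and distributions are assumptions, not the real data.

```r
# Simulated stand-in data: one roughly symmetric and one skewed variable.
set.seed(1)
dat <- data.frame(
  temp = rnorm(200, mean = 20, sd = 2),
  flow = rexp(200, rate = 0.2)
)

# One histogram and one boxplot per variable, laid out on a 2 x 2 grid.
op <- par(mfrow = c(2, 2))
for (v in names(dat)) {
  hist(dat[[v]], main = paste("Histogram of", v), xlab = v)
  boxplot(dat[[v]], main = paste("Boxplot of", v), ylab = v)
}
par(op)  # restore the previous graphics settings

# The points a boxplot draws individually can also be listed as numbers.
outliers <- lapply(dat, function(x) boxplot.stats(x)$out)
print(outliers)
```

With a couple dozen variables it is probably easier to loop in smaller batches, or to send the plots to a file, than to put everything on one page.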
You will soon be overwhelmed by the amount of results (if there are more than five variables). You will need to save your results and document which code created which output. (Save the date of creation.)
Then you can look at relations: is one variable related to another? Scatter plots are good. You can also calculate correlation matrices between variables. (Most of the time I use the Pearson product-moment correlation, the “usual one”.) Don’t try to print too many at a time, but maybe you have space on the page for 10 by 10 variables.
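A small sketch of this step in base R, again on simulated data. The variables `x`, `y` and `z` are hypothetical; `y` is built from `x` so that there is a relation to find.

```r
# Simulated stand-in data: y depends on x, z is unrelated noise.
set.seed(2)
n <- 100
x <- rnorm(n)
dat <- data.frame(
  x = x,
  y = 2 * x + rnorm(n, sd = 0.5),
  z = rnorm(n)
)

# Scatter plot matrix: every variable plotted against every other one.
pairs(dat)

# Pearson correlation matrix; "pairwise.complete.obs" skips missing values
# pair by pair instead of returning NA for a whole row and column.
r <- cor(dat, method = "pearson", use = "pairwise.complete.obs")
print(round(r, 2))
```

For a wide table, subsetting the columns (for example `dat[, 1:10]`) before calling `cor` keeps each printed matrix to a readable size.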
Later on you can look at whether the means differ between groups (females and males, for example).
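Comparing means between groups could look like the following in base R. The grouping variable `sex` and the measurement `height` are invented for the example.

```r
# Hypothetical data: one grouping variable and one measurement.
dat <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  height = c(165, 180, 170, 178, 160, 182)
)

# Mean of height within each group.
group_means <- aggregate(height ~ sex, data = dat, FUN = mean)
print(group_means)

# Side-by-side boxplots show the spread as well as the level of each group.
boxplot(height ~ sex, data = dat, ylab = "height")
```

A formal comparison, for example `t.test(height ~ sex, data = dat)`, is one possible next step, but whether that test suits the real data is a separate question.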
Your study is exploratory. You want to explore what kind of information might be hidden in the data material. Almost all investigations are exploratory to some extent. Very few investigations just have a single “hypothesis” to test, of the “What is your research question?” type.
I would suggest using exploratory factor analysis only as a last option. In fact, I suggest not using it at all. One reason is that if you don’t know what it is, your listeners probably don’t know it either. Another reason is that it is statistically controversial. Some statisticians recommend not using it at all.
There are many information pages about how to learn R. On this site, under R/Splus, look at: Info for R users.
“I got my life back.”
Welcome back! And come back here and tell us about your progress!
@GretaGarbo: What a treasure! Thanks so much.
Your response is exactly what I was looking for. Procedure, procedure, procedure. As I said, my discipline is engineering and computer science, hence procedure. My understanding is much clearer now. Last night I made a list of tasks, and I will proceed accordingly. To answer your question, the table consists of no fewer than a couple dozen variables and thousands of observations. It will be months of work. It's fantastic.
Reading your advice, unless you tell me otherwise, it seems I just need to be proficient in R at reading and writing files; the functions for min, max, mean, and standard deviation; boxplots and histograms; and later on the Pearson product-moment correlation. Except for the Pearson thing, on which I hope you will give me some more hints later, I think I just need to install R, an R editor, and some R graphics package. I have set aside 10 days to do this and to read a statistics book. That will be due on the weekend of Father's Day. What a fantastic gift.
Thanks so much!
1. The Schaum's book came
2. Re-installed R per G. Jay Kerns
3. Installed NppToR:
+ I've been a fan and user of Notepad++ since forever
+ Kicked the tires and liked it. Nice integration.
Last edited by ngungo; 06-06-2012 at 08:27 PM.
NppToR is pretty nice. I prefer RStudio.
I looked into RStudio but it is for Unix, isn't it?