“I have given a huge set of data that tabulated in rows and columns.”

How many observations (rows) and how many variables (columns) do you have?

As a first check of the data, I normally compute, for every variable, the number of observations (n), the mean, the standard deviation, the minimum and the maximum.
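A minimal sketch of this first check, using only Python's standard library. The variable names and numbers are made up for illustration, and the data are assumed to sit in one list per variable:

```python
import statistics

# Hypothetical toy data: each key is a variable (column),
# each list holds that variable's observed values.
data = {
    "age": [23, 35, 41, 29, 52],
    "income": [31000, 42000, 58000, 39000, 61000],
}

def describe(values):
    """Return n, mean, standard deviation, min and max for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }

summary = {name: describe(values) for name, values in data.items()}

for name, stats in summary.items():
    print(name, stats)
```

Reading the printed table row by row is usually enough to spot a variable whose n, minimum or maximum looks wrong.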

Is the “n” reasonable? And the mean and standard deviation? Looking at the min and max sometimes shows you whether there are any incorrect values. You might know that some variables must lie within a certain range; larger than zero, for example. Sometimes missing values are coded as “999” or something similar. Sometimes (from Excel, I believe) missing values are coded as zero (0). All kinds of cleaning and checking need to be done.
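These checks could be sketched roughly as follows. The missing-value code 999, the variable `weights_kg` and the rule that a weight must be larger than zero are all assumptions chosen for illustration:

```python
# Assumption: 999 is the code used for "missing" in this hypothetical variable.
MISSING_CODES = {999}

# Hypothetical body weights in kg; must be larger than zero to be plausible.
weights_kg = [72.5, 999, 64.0, -3.0, 81.2, 0]

def clean(values, missing_codes, must_exceed=None):
    """Recode missing-value codes as None and collect implausible values.

    A value is flagged as suspicious when it is not larger than `must_exceed`.
    Flagged values are kept in the cleaned list so they can be checked by hand.
    """
    cleaned, suspicious = [], []
    for v in values:
        if v in missing_codes:
            cleaned.append(None)  # recode the sentinel as a true missing value
            continue
        if must_exceed is not None and v <= must_exceed:
            suspicious.append(v)
        cleaned.append(v)
    return cleaned, suspicious

cleaned, suspicious = clean(weights_kg, MISSING_CODES, must_exceed=0)
n_missing = cleaned.count(None)

print("missing:", n_missing)
print("suspicious:", suspicious)
```

The zero is only flagged, not recoded, because whether it means "missing" or is a genuine error is something to ask the data supplier about.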

This (data quality checking) is often forgotten. I believe that many (bad) decisions in companies are based on simply incorrect data.

(If you suspect that a few variables are incorrect you need to go back to the client and ask for corrections.)

Once you have a rough idea of the level (the means) of your data, it is good to look at the distributions of the variables. Boxplots and histograms are useful.
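As a rough, text-only stand-in for a boxplot and a histogram, the sketch below computes the numbers a boxplot is drawn from and prints a crude histogram. The data are made up, the values are assumed to be integers, and the 1.5 × IQR rule for flagging outliers is the usual boxplot convention rather than anything specific to your data:

```python
import statistics
from collections import Counter

# Hypothetical integer-valued variable with one unusually large value.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 15]

# Quartiles: the three cut points that split the data into four parts.
q1, median, q3 = statistics.quantiles(values, n=4)
five_number = (min(values), q1, median, q3, max(values))

# Usual boxplot convention: points beyond 1.5 * IQR from the box are outliers.
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

def text_histogram(values, bin_width):
    """Return one text line per bin, including empty bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        lines.append(f"{start:>3} to {end:<3} {'#' * counts[start]}")
    return lines

histogram = text_histogram(values, bin_width=5)

print("five-number summary:", five_number)
print("possible outliers:", outliers)
print("\n".join(histogram))
```

In practice you would let your statistics package draw the actual plots; the point here is only what they summarise.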

You will soon be overwhelmed by the amount of output (if there are more than five variables). You will need to save your results and document which code created which output. (Save the date of creation as well.)
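One possible way to keep track, sketched under assumptions: every result is appended to a log file as one JSON line, together with the name of the script that produced it and the creation date. The file name `results_log.jsonl`, the script name and the helper functions are hypothetical, and the example writes to a temporary directory only so that it leaves nothing behind:

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def make_record(script_name, description, result):
    """Bundle a result with the script that created it and the creation time."""
    return {
        "script": script_name,
        "description": description,
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "result": result,
    }

def append_record(path, record):
    """Append one record as a single JSON line to the log file."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "results_log.jsonl")  # hypothetical file name
    record = make_record(
        "01_summary_stats.py",  # hypothetical script name
        "n, mean, sd, min, max for all variables",
        {"age": {"n": 5, "mean": 36}},
    )
    append_record(log_path, record)

    # Read the log back to confirm what was stored.
    with open(log_path, encoding="utf-8") as log:
        loaded = [json.loads(line) for line in log]

print(loaded[0]["script"], loaded[0]["created"])
```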

Then you can look at relations: is one variable related to another? Scatter plots are good. You can also calculate correlation matrices between variables. (Most of the time I use the Pearson product-moment correlation, the “usual one”.) Don’t try to print too many at a time, but you may have space on the page for 10 by 10 variables.
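For reference, the Pearson correlation between two variables is their covariance divided by the product of their standard deviations. A small sketch of a correlation matrix built from that definition, with made-up variables (one of them constructed to be an exact decreasing linear function of another, so its correlation should come out as -1):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data; "remaining" is exactly 100 - 2 * used.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 77, 90],
    "used": [5, 10, 20, 25, 40],
    "remaining": [90, 80, 60, 50, 20],
}

names = list(data)
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Print the matrix with two decimals, one row per variable.
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for a in names:
    print(f"{a:>10}" + "".join(f"{matrix[a][b]:>10.2f}" for b in names))
```

Rounding to two decimals is what makes a 10 by 10 matrix fit on a page and still be readable.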

Later on you can look at whether the means differ between groups (females and males, for example).
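A minimal sketch of such a group comparison, assuming each row carries a group label and a measurement (the labels and numbers are invented):

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (group label, measured value).
rows = [
    ("female", 60.0),
    ("male", 78.0),
    ("female", 64.0),
    ("male", 82.0),
    ("female", 62.0),
]

# Collect the values belonging to each group.
groups = defaultdict(list)
for label, value in rows:
    groups[label].append(value)

# Mean and group size per group; always report n next to the mean.
group_summary = {
    label: {"n": len(values), "mean": statistics.mean(values)}
    for label, values in groups.items()
}

for label, stats in group_summary.items():
    print(label, stats)
```

Whether an observed difference is larger than chance variation is a separate question; this only shows the descriptive step.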

Your study is exploratory: you want to explore what kind of information might be hidden in the data. Almost all investigations are exploratory to some extent. Very few have just a single “hypothesis” to test, of the “What is your research question?” type.

I would suggest using exploratory factor analysis only as a last option. In fact, I suggest not using it at all. One reason is that if you don’t know what it is, your listeners probably don’t know either. Another reason is that it is statistically controversial: some statisticians recommend not using it at all.

There are many information pages about how to learn R. On this site, under R/Splus, look at “Info for R users”.

Welcome back!

And do come back here and tell us about your progress!