My hope is that people can chip away at the questions raised in this post. All responses are welcome!

I have been spending a bit of time trying to teach myself various univariate and multivariate statistical techniques. I would really appreciate a "one-stop shop" with a general overview of various statistical techniques and their applications.

I am hoping to use statistics in continuous manufacturing processes, where on-line gauges constantly provide temperature, pressure, yield, pH, and other readings. In general, these readings can be input or output variables (of a reactor or other unit operation), and they could be recorded every second, every 10 seconds, every minute, hour, day, month, and so on (I may mistakenly refer to this as time-series data). Input (independent) variables will typically be controlled with a set-point, but other uncontrolled variables are possible. Hence, when readings are taken every second at a particular set-point, there will be repeat measures.
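Whether consecutive one-second readings can be treated as independent is something the data can partly answer. The following is a minimal sketch, assuming NumPy and roughly AR(1) (first-order autoregressive) behaviour. It estimates the lag-1 autocorrelation and the approximate number of effectively independent readings. The function names and the simulated series are illustrative, not taken from any particular process:

```python
import numpy as np

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def effective_sample_size(x):
    """Approximate number of independent readings, assuming AR(1) behaviour.

    Uses n_eff = n * (1 - r) / (1 + r), where r is the lag-1 autocorrelation.
    """
    r = lag1_autocorrelation(x)
    return len(x) * (1 - r) / (1 + r)

# Simulated example: one hour of 1-second readings at a fixed set-point,
# where each reading carries over 90% of the previous deviation (phi = 0.9).
rng = np.random.default_rng(0)
n, phi = 3600, 0.9
noise = rng.normal(size=n)
reading = np.empty(n)
reading[0] = noise[0]
for t in range(1, n):
    reading[t] = phi * reading[t - 1] + noise[t]

print(lag1_autocorrelation(reading))   # near phi = 0.9
print(effective_sample_size(reading))  # far fewer than 3600
```

If the effective sample size is much smaller than the raw count, the readings behave more like repeat measures of one condition than like independent replicates. Common responses are to average over a run or to model the autocorrelation explicitly.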

I am currently reading a book on multivariate statistics for quality management, but so far I feel its applications to repeat-measures problems are lacking. Ultimately, it would be nice to have advice on a starting point (a method), so that I can learn how and why a technique works and then apply it, without rummaging through endless possibilities before finding the right approach.

So, here is what I understand so far; please correct any erroneous information. My questions are the items that begin with a capital letter, but the other statements could also be faulty:

__One-way ANOVA__

Assumptions:

-normally distributed residuals (the response is normal within each level)

-standard deviations among levels are "equal"

-observations are independent of one another

Qualities:

-tests whether two or more population means are significantly different (with exactly two levels, it is equivalent to a two-sample t-test)

-one response variable (Y)

-tests using one "factor" or "treatment" (X) with varying levels

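-repeat measures on the same unit call for the repeated-measures form of ANOVA, because the standard form assumes independent observations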

-can use replicates

-if the overall F-test is significant, followed by pairwise comparisons (e.g., Tukey's HSD) to see which populations differ

-Can replicates and repeat measures both be used in a single experiment?
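The one-way ANOVA F-test can be written out in a few lines. This is a minimal sketch assuming NumPy and SciPy. `one_way_anova` is a hypothetical helper. It assumes independent observations, so it is the replicate form rather than the repeated-measures form:

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """F-test of H0: all group means are equal (independent observations assumed)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    k, n = len(groups), all_obs.size

    # Between-level variation: how far each level mean is from the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-level variation: scatter of the replicates around their own level mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value

# Three set-point levels with three replicate yields each (illustrative numbers).
f_stat, p_value = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat)  # 27.0
```

`scipy.stats.f_oneway` computes the same statistic directly. Recent SciPy releases also offer `scipy.stats.tukey_hsd` for the pairwise follow-up.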

__N-way ANOVA__

Assumptions:

-normally distributed residuals

-standard deviations among cells (combinations of factor levels) are "equal"

-observations are independent of one another

Qualities:

-tests whether the mean response differs across the levels of each factor and, when replicates are available, whether the factors interact

-one response variable (Y)

-tests using N "factors" or "treatments" (X(i=1:N)) with varying levels

-followed by pairwise comparisons to see which populations differ

-Do replicates assume you DO NOT have time-series data? In the process industry (assuming you are not collecting time-series data), would collecting multiple data points at the same set-point for each factor (X) be considered a replicate, even though you are collecting these data on the same process?

-Can this handle repeat measures (time-series data)?
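On the replicate question, a replicate is usually an independent repetition of a run (for example, returning to the same set-point on a different day). Consecutive readings within one run at a fixed set-point are repeat measures instead. The following is a minimal sketch of a balanced two-way ANOVA with replicates, assuming NumPy and SciPy, and it shows where that distinction matters. The error term is built entirely from replicate-to-replicate scatter, so feeding it autocorrelated one-second readings would typically understate the error and overstate significance. `two_way_anova` is a hypothetical helper and the yields are illustrative:

```python
import numpy as np
from scipy import stats

def two_way_anova(y):
    """Balanced two-way ANOVA with interaction.

    y has shape (a, b, r): a levels of factor A, b levels of factor B and
    r >= 2 independent replicates per cell. Returns {source: (SS, df, F, p)}.
    """
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = y.mean(axis=(0, 2))   # marginal means of factor B
    mean_ab = y.mean(axis=2)       # cell means

    ss_a = b * r * ((mean_a - grand) ** 2).sum()
    ss_b = a * r * ((mean_b - grand) ** 2).sum()
    interaction = mean_ab - mean_a[:, None] - mean_b[None, :] + grand
    ss_ab = r * (interaction ** 2).sum()
    ss_e = ((y - mean_ab[:, :, None]) ** 2).sum()   # replicate-to-replicate scatter

    df_e = a * b * (r - 1)
    ms_e = ss_e / df_e
    table = {}
    for name, ss, df in (("A", ss_a, a - 1), ("B", ss_b, b - 1),
                         ("AxB", ss_ab, (a - 1) * (b - 1))):
        f_stat = (ss / df) / ms_e
        table[name] = (ss, df, f_stat, stats.f.sf(f_stat, df, df_e))
    table["Error"] = (ss_e, df_e, None, None)
    return table

# 2 temperatures x 2 pressures, 2 independent runs per combination (illustrative).
yields = [[[1, 3], [1, 3]],
          [[5, 7], [5, 7]]]
for source, row in two_way_anova(yields).items():
    print(source, row)
```

A common, if crude, way to use on-line data here is to average the readings of each run and treat the run averages as replicates.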

__ANCOVA__

My Understanding:

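-similar to (one- or N-way) ANOVA, but adds one or more continuous covariates (for example, an uncontrolled feed temperature) to the model, so that differences between factor levels are tested after adjusting for the covariate; it is not a remedy for factors that are correlated with each other (for example, temperature affecting pressure or vice versa), a situation usually called multicollinearity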

-Can this handle repeat measures or replicates?
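ANCOVA is a linear model with a categorical factor and a continuous covariate. The following is a minimal sketch, assuming NumPy and SciPy and a common covariate slope in every group (no factor-by-covariate interaction). It tests the factor by comparing residual sums of squares with and without the factor's indicator columns. `ancova_factor_test`, `residual_ss` and the simulated data are hypothetical:

```python
import numpy as np
from scipy import stats

def residual_ss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def ancova_factor_test(y, group, covariate):
    """F-test for the factor after adjusting for one continuous covariate.

    Assumes the covariate has the same slope in every group.
    """
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    covariate = np.asarray(covariate, dtype=float)
    levels = np.unique(group)
    n, k = y.size, levels.size
    ones = np.ones(n)
    # One indicator column per level except the first (reference) level.
    dummies = [(group == lev).astype(float) for lev in levels[1:]]

    X_reduced = np.column_stack([ones, covariate])           # covariate only
    X_full = np.column_stack([ones, covariate, *dummies])    # covariate + factor
    rss_reduced, rss_full = residual_ss(X_reduced, y), residual_ss(X_full, y)

    df_num, df_den = k - 1, n - (k + 1)
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, stats.f.sf(f_stat, df_num, df_den)

# Simulated runs: three set-points, with an uncontrolled ambient temperature as covariate.
rng = np.random.default_rng(0)
setpoint = np.repeat(["low", "mid", "high"], 20)
ambient = rng.normal(25.0, 5.0, size=setpoint.size)
offset = {"low": 0.0, "mid": 3.0, "high": 6.0}
yield_ = (10 + 0.5 * ambient + np.array([offset[s] for s in setpoint])
          + rng.normal(size=setpoint.size))
print(ancova_factor_test(yield_, setpoint, ambient))
```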

__MANOVA__

Assumptions:

-multivariate normal distribution

-covariance matrices of the dependent variables are equal across levels (homogeneity of covariance matrices)

Qualities:

-tests whether the population mean vectors (one mean per response variable) are equal across two or more levels

-p response variables (Y(i=1: p))

-tests using one "factor" or "treatment" (X) with varying levels

-accounts for the covariance (correlation) among the response variables

-Can this be followed by pairwise comparisons? Would it make sense to do so?

-Can this handle repeat measures or replicates?
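For a one-way MANOVA, the ANOVA sums of squares become sum-of-squares-and-cross-products (SSCP) matrices, and Wilks' lambda compares their determinants. The following is a minimal sketch, assuming NumPy and SciPy and using Bartlett's chi-square approximation for the p-value. `one_way_manova` is a hypothetical helper and the data are simulated:

```python
import numpy as np
from scipy import stats

def one_way_manova(*groups):
    """Wilks' lambda test that all group mean vectors are equal.

    Each group is an (n_i, p) array of n_i observations on p response variables.
    Returns (wilks_lambda, chi2, p_value) using Bartlett's chi-square approximation.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.vstack(groups)
    n, p = all_obs.shape
    k = len(groups)
    grand = all_obs.mean(axis=0)

    H = np.zeros((p, p))   # between-level SSCP matrix
    E = np.zeros((p, p))   # within-level (replicate) SSCP matrix
    for g in groups:
        d = g.mean(axis=0) - grand
        H += g.shape[0] * np.outer(d, d)
        centered = g - g.mean(axis=0)
        E += centered.T @ centered

    wilks = np.linalg.det(E) / np.linalg.det(E + H)
    chi2 = -(n - 1 - (p + k) / 2) * np.log(wilks)
    df = p * (k - 1)
    return wilks, chi2, stats.chi2.sf(chi2, df)

# Two correlated responses (e.g. yield and purity) at three set-points, 30 runs each.
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
levels = [rng.multivariate_normal(mean, cov, size=30)
          for mean in ([0, 0], [1, 0.5], [2, 1])]
print(one_way_manova(*levels))
```

When the overall test is significant, one common follow-up is to compare levels in pairs (for example, with a two-sample Hotelling's T² test) and correct for multiple testing.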

__N-way MANOVA__

My Understanding:

-same as N-way ANOVA, but has multiple factors (X's) AND multiple dependent variables (Y's)

-Can this handle repeat measures or replicates?
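The same SSCP idea extends to several factors: each effect gets its own hypothesis matrix, and all effects are tested against the within-cell error matrix. The following is a minimal sketch for a balanced two-factor design with replicates, assuming NumPy and SciPy. `two_way_manova` is a hypothetical helper, the data are simulated, and the p-values use Bartlett's chi-square approximation to Wilks' lambda:

```python
import numpy as np
from scipy import stats

def two_way_manova(y):
    """Balanced two-way MANOVA with interaction.

    y has shape (a, b, r, p): a levels of factor A, b levels of factor B,
    r >= 2 independent replicates per cell and p response variables.
    Returns {effect: (wilks_lambda, chi2, df, p_value)}.
    """
    y = np.asarray(y, dtype=float)
    a, b, r, p = y.shape
    grand = y.mean(axis=(0, 1, 2))   # (p,)
    mean_a = y.mean(axis=(1, 2))     # (a, p)
    mean_b = y.mean(axis=(0, 2))     # (b, p)
    mean_ab = y.mean(axis=2)         # (a, b, p) cell mean vectors

    def sscp(dev, weight):
        """Weighted sum-of-squares-and-cross-products matrix of deviation vectors."""
        d = dev.reshape(-1, p)
        return weight * (d.T @ d)

    effects = {
        "A": (sscp(mean_a - grand, b * r), a - 1),
        "B": (sscp(mean_b - grand, a * r), b - 1),
        "AxB": (sscp(mean_ab - mean_a[:, None] - mean_b[None, :] + grand, r),
                (a - 1) * (b - 1)),
    }
    E = sscp(y - mean_ab[:, :, None, :], 1)   # within-cell (replicate) SSCP
    df_e = a * b * (r - 1)

    results = {}
    for name, (H, df_h) in effects.items():
        wilks = np.linalg.det(E) / np.linalg.det(E + H)
        # Bartlett's chi-square approximation to the distribution of Wilks' lambda.
        chi2 = -(df_e - (p - df_h + 1) / 2) * np.log(wilks)
        df = p * df_h
        results[name] = (wilks, chi2, df, stats.chi2.sf(chi2, df))
    return results

rng = np.random.default_rng(0)
y = rng.normal(size=(2, 3, 4, 2))   # 2 x 3 levels, 4 runs per cell, 2 responses
y[1] += [3.0, 2.0]                  # shift both responses at the second level of A
for effect, row in two_way_manova(y).items():
    print(effect, row)
```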

Here is where I am a bit more shaky:

__PCA__

Assumptions:

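-no distributional assumption is needed to compute the principal components; multivariate normality matters only for formal inference, such as control limits based on the component scores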

-Feel free to list the others

Qualities:

-does not distinguish dependent from independent variables; all p variables in the data set are treated symmetrically

-Would it make sense to include independent variables in this analysis if these variables are controlled with a set-point? How could one relate the input variables back to the response variables?

-determines a few principal components that account for the majority of variation in a data set

-the loadings of each principal component show which variables contribute most to it, so each component can be read as a "mode" of variation, and one can see which of the p variables vary together in each mode

-with this information and knowledge of the process, one can narrow down the likely root causes of variation; the components describe correlation, not causation, so process knowledge is still needed to identify what directly affects the response variables

-Can this handle repeat measures or replicates?
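Computing principal components takes only a singular value decomposition. The following is a minimal sketch, assuming NumPy. Variables are autoscaled first because process readings come in different units (°C, bar, pH). `pca` is a hypothetical helper, not a library function, and the data are simulated:

```python
import numpy as np

def pca(X):
    """PCA of an (n_samples, p) array via SVD of the autoscaled data.

    Returns (explained_variance_ratio, loadings, scores). Column j of
    `loadings` gives the weight of each original variable in component j.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale: mean 0 and standard deviation 1 per variable, so units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2 / (Z.shape[0] - 1)   # eigenvalues of the correlation matrix
    ratio = eigenvalues / eigenvalues.sum()
    loadings = Vt.T
    scores = Z @ loadings
    return ratio, loadings, scores

# Simulated data: temperature and pressure driven by one hidden disturbance, pH independent.
rng = np.random.default_rng(0)
disturbance = rng.normal(size=500)
temperature = 80 + 2.0 * disturbance + 0.2 * rng.normal(size=500)
pressure = 5 + 0.3 * disturbance + 0.03 * rng.normal(size=500)
ph = 7 + 0.1 * rng.normal(size=500)
ratio, loadings, scores = pca(np.column_stack([temperature, pressure, ph]))
print(ratio)            # first component carries about two thirds of the variance
print(loadings[:, 0])   # large magnitude on temperature and pressure, small on pH
```

Computing the components does not itself require independent samples. With autocorrelated data, however, the sample count generally overstates the information available, and that matters for any limits derived from the scores.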

__CFA (Common Factor Analysis)__

My Understanding:

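-related to PCA, but models each variable as a combination of a few shared (latent) factors plus a variable-specific unique term, whereas PCA re-expresses the total variance without a separate error term

-does not distinguish dependent from independent variables; all p observed variables are treated symmetrically

-used to find groups of variables whose correlations can be explained by a shared underlying factor, rather than to summarize the total variation ("modes") of the data set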

-Can this handle repeat measures or replicates?
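In much of the statistics literature, "CFA" stands for confirmatory factor analysis. What is described here corresponds to exploratory common factor analysis. The following is a minimal sketch of one classic estimation method, iterated principal axis factoring, assuming NumPy. `principal_axis_factoring` is a hypothetical helper, the loadings are unrotated, and the simulated data are illustrative. Unlike PCA, this method places only the shared (common) variance on the diagonal of the correlation matrix before the eigendecomposition:

```python
import numpy as np

def principal_axis_factoring(X, n_factors, max_iter=500, tol=1e-6):
    """Unrotated common-factor loadings by iterated principal axis factoring.

    Returns (loadings, communalities, uniquenesses) for an (n_samples, p) array.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # Initial communalities: squared multiple correlation of each variable with the rest.
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)               # shared variance only on the diagonal
        eigval, eigvec = np.linalg.eigh(R_reduced)    # eigenvalues in ascending order
        top = np.argsort(eigval)[::-1][:n_factors]
        loadings = eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2, 1 - h2

# Simulated data: six gauges, three driven by latent factor f1 and three by f2.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 2000))
X = np.column_stack([f + 0.5 * rng.normal(size=2000) for f in (f1, f1, f1, f2, f2, f2)])
loadings, communalities, uniquenesses = principal_axis_factoring(X, n_factors=2)
print(communalities)   # each near 0.8 (= 1 / 1.25) for this simulation
```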

________________________

I understand the dangers of simply "trusting" measurement systems and large logbooks of data. I understand the need to verify measurement systems and that not all answers can be discovered through any of these analyses. I understand that brainstorming root causes and solutions with a team is highly effective. However, it would be great to add more techniques to my statistical toolbox with confidence.

Thank you!

-Mike