Can AMOS work with data composed of probabilities rather than correlations?

#1
Hello everyone,

My question is a bit unusual, since it is not about the program itself, but regarding the data.
I have a matrix representing (reverse) distance between each pair of objects. In SEM, we usually work with correlations. A higher correlation represents variables that are closer to each other.
In my experiment, people were asked to classify 45 objects into groups of similar objects. For example, if I have a sample of 1000 people, and 500 put "Banana" and "Apple" in the same group, the (reverse) distance or "closeness" between them will be 500 (in frequency terms) or 0.5 (in probability/proportion).
Eventually, I am left with a 45×45 matrix, where each entry is the proportion of participants who placed that pair of objects in the same group.
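To make the structure of this matrix concrete, here is a minimal sketch of how such co-occurrence proportions could be computed. The function name, the toy objects, and the four made-up participants are purely illustrative (not my real data), and it assumes every participant places each object in exactly one group.

```python
from itertools import combinations


def cooccurrence_proportions(sorts, objects):
    """Proportion of participants who put each pair of objects in the same group.

    sorts   -- one entry per participant; each entry is a list of groups,
               and each group is a list of object names (hypothetical layout)
    objects -- ordered list of all object names (defines row/column order)
    """
    index = {name: i for i, name in enumerate(objects)}
    n = len(objects)
    counts = [[0] * n for _ in range(n)]

    for groups in sorts:                      # one participant's sorting
        for group in groups:                  # one pile of objects
            for a, b in combinations(group, 2):
                i, j = index[a], index[b]
                counts[i][j] += 1             # keep the matrix symmetric
                counts[j][i] += 1

    total = len(sorts)
    props = [[counts[i][j] / total for j in range(n)] for i in range(n)]
    for i in range(n):
        props[i][i] = 1.0                     # an object is always with itself
    return props


# Toy data: 3 objects, 4 participants (illustrative only)
objects = ["Banana", "Apple", "Hammer"]
sorts = [
    [["Banana", "Apple"], ["Hammer"]],
    [["Banana"], ["Apple", "Hammer"]],
    [["Banana", "Apple"], ["Hammer"]],
    [["Banana", "Apple", "Hammer"]],
]
matrix = cooccurrence_proportions(sorts, objects)
for name, row in zip(objects, matrix):
    print(name, row)
```

In this toy case, 3 of the 4 participants grouped "Banana" with "Apple", so that entry is 0.75.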
In general, I do not see a lot of difference between correlations and proportions (assuming the appropriate transformations), and that's why I thought it can be possible to use this matrix in AMOS. However, I have not found information about this online or in books.
I would very much appreciate it if someone can refer me to any material that could be of some assistance.

Thank you very much!
Gabriel
 

spunky

Doesn't actually exist
#2
In SEM, we usually work with correlations.

uhm... nope, that's not quite right. we work with covariances. and although they can look and behave a lot like correlations, they most certainly are not correlations. we only analyse the correlation matrix when it's absolutely impossible to get the covariance matrix.

A higher correlation represents variables that are closer to each other.

that is, of course, assuming that a linear relationship describes this "closeness". variables can be very "close" in terms that they exhibit a well-defined relationship and still have a correlation of 0.
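a quick sketch of that last point, using made-up numbers: y is completely determined by x (y = x squared), yet the Pearson correlation is zero because the relationship is not linear. the `pearson` helper is just written out by hand for illustration.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# x is symmetric around zero, y depends on x exactly (but not linearly)
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson(x, y)
print(r)
```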

In general, I do not see a lot of difference between correlations and proportions (assuming the appropriate transformations), and that's why I thought it can be possible to use this matrix in AMOS.
this is a little bit of a tricky question, at least in the context of SEM. to begin with, a correlation (or covariance) matrix exhibits a series of properties that may or may not be satisfied by your matrix of proportions. the most important one being, of course, that covariance matrices are positive (semi) definite, so you may want to check for that. now, on to the second issue... the classic estimation method in SEM (maximum likelihood) uses the likelihood of the Wishart Distribution, which assumes that your variables were sampled from a multivariate normal distribution. this may or may not work in your case so you may want to try an alternative estimation method. and even if you changed the estimation method and you were able to show that this matrix of proportions is indeed a covariance matrix... it is a covariance matrix of what? like what does your SEM model look like?
 
#3
Dear Spunky,

Thank you for answering.
Thanks for the correction: for a non-statistician like me, it is easier to think of a correlation matrix and not the covariance matrix. Although I am not entirely sure why correlations shouldn't do the trick (I think of it as regression vs. standardized regression).
I have no idea how to check if the matrix is positive (semi) definite... I believe normality can be assumed if N*p>5, but it will certainly be different from a covariance matrix.
So, if I understand correctly, you don't think this is allowed?
Is there another way to perform something like Confirmatory Factor Analysis with this matrix? I used a Hierarchical Clustering algorithm in my first sample to reach a structure, and I thought it could be validated this way...

Thank you!
 

spunky

Doesn't actually exist
#4
Although I am not entirely sure why correlations shouldn't do the trick (I think of it as regression vs. standardized regression).
well, if you like the analogy with regression then i'm sure you know that in the case of OLS regression with multiple predictors, it is in general *NOT* true that the correlation coefficient equals the standardized regression coefficient. also, the significance tests of said coefficients rely on the standard errors of the UNstandardized coefficients. the same is true about SEM, standard errors for standardized loadings can be... well... funky. the likelihood equation used in SEM comes from the Wishart Distribution because we know the exact sampling distribution of the covariance matrix, assuming it comes from multivariate-normally-distributed variables. we don't know the exact sampling distribution of the correlation matrix (we can approximate it, but i haven't seen it exactly derived), so which likelihood equation could we choose? dealing with correlations in SEM is possible, but you need to do some mathematical trickery around it so we tend to work with covariance matrices instead.
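to illustrate the regression point with a small sketch: with two correlated predictors, the standardized OLS coefficients are obtained by solving R_xx * beta = r_xy, and they do not match the simple correlations with the outcome. the correlation values below are invented for the example, and numpy is assumed to be available.

```python
import numpy as np

# Hypothetical correlations (illustrative numbers only)
r_xx = np.array([[1.0, 0.5],
                 [0.5, 1.0]])      # correlation between the two predictors
r_xy = np.array([0.6, 0.5])        # correlation of each predictor with y

# Standardized regression coefficients solve R_xx @ beta = r_xy
beta = np.linalg.solve(r_xx, r_xy)

print("correlations with y:      ", r_xy)
print("standardized coefficients:", beta)
```

if the predictors were uncorrelated (r_xx equal to the identity matrix), the two sets of numbers would coincide, which is why the analogy only works in the single-predictor or uncorrelated-predictor case.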


I have no idea how to check if the matrix is positive (semi) definite
that's relatively straightforward. just perform an eigenvalue decomposition on it (like Principal Components Analysis) and check the eigenvalues. if they are all positive, then you have a positive-definite matrix. if they are all non-negative (with some equal to zero), it is positive semi-definite. any negative eigenvalue means it is neither.
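a minimal sketch of that check, assuming numpy is available. the function name, the tolerance, and both example matrices are made up for illustration; `np.linalg.eigvalsh` is used because the matrix is assumed to be symmetric.

```python
import numpy as np


def check_definiteness(matrix, tol=1e-10):
    """Classify a symmetric matrix by the sign of its smallest eigenvalue.

    tol is an arbitrary numerical tolerance for treating tiny values as zero.
    """
    a = np.asarray(matrix, dtype=float)
    if not np.allclose(a, a.T):
        raise ValueError("matrix must be symmetric")

    eigenvalues = np.linalg.eigvalsh(a)   # returned in ascending order
    smallest = eigenvalues[0]

    if smallest > tol:
        return "positive definite", eigenvalues
    if smallest >= -tol:
        return "positive semi-definite", eigenvalues
    return "not positive semi-definite", eigenvalues


# A small matrix of proportions (invented values)
toy = [[1.00, 0.75, 0.25],
       [0.75, 1.00, 0.50],
       [0.25, 0.50, 1.00]]

# Looks like a correlation matrix, but its entries are mutually inconsistent
bad = [[1.0, 0.9, -0.9],
       [0.9, 1.0, 0.9],
       [-0.9, 0.9, 1.0]]

for name, m in [("toy", toy), ("bad", bad)]:
    label, values = check_definiteness(m)
    print(name, label, values)
```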


I believe normality can be assumed if N*p>5
univariate normality is necessary BUT NOT sufficient to perform SEM. you need multivariate normality so you'd need to check that with something like Mardia's test of multivariate skewness/kurtosis.
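for reference, a rough sketch of Mardia's two statistics computed from raw data (cases in rows, variables in columns), assuming numpy is available. the function name and the simulated data are illustrative only. under multivariate normality the skewness statistic is approximately chi-square with p(p+1)(p+2)/6 degrees of freedom and the kurtosis statistic is approximately standard normal. note that this needs the raw data, not just a summary matrix.

```python
import numpy as np


def mardia_statistics(data):
    """Mardia's multivariate skewness and kurtosis statistics."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    s = centered.T @ centered / n                   # ML covariance (divide by n)
    d = centered @ np.linalg.inv(s) @ centered.T    # n x n matrix of cross products

    b1p = (d ** 3).sum() / n ** 2                   # multivariate skewness
    b2p = (np.diag(d) ** 2).mean()                  # multivariate kurtosis

    return {
        "skewness_stat": n * b1p / 6,               # approx. chi-square under normality
        "skewness_df": p * (p + 1) * (p + 2) / 6,
        "kurtosis_z": (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n),
    }


# Simulated data for illustration: one normal sample, one clearly skewed sample
rng = np.random.default_rng(0)
normal_data = rng.normal(size=(500, 3))
skewed_data = rng.exponential(size=(500, 3))

print(mardia_statistics(normal_data))
print(mardia_statistics(skewed_data))
```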


So, if I understand correctly, you don't think this is allowed?
uhm... let's say i'm in a grey area here. for a while i've been wondering myself whether SEM can be used under a latent semantic analysis framework, since it is apparently OK for them to perform Singular Value Decomposition (and other types of eigenvalue decompositions) on the incidence matrix of text and content. the problem is that the incidence matrix is not a covariance matrix and even if it were... it is the covariance matrix of what?


Is there another way to perform something like Confirmatory Factor Analysis with this matrix?
multiple correspondence analysis or some other type of sorting algorithm sounds much more appropriate for this task. at least in my mind.