Hi,
I'd highly appreciate your recommendations for a research project I'm doing. More specifically, I'd like to know the advantages, disadvantages, and justifications for choosing one method over another, so that I can decide for myself.
A brief idea of the research: I have 25 items that I think contribute to technological proficiency. Response options range from Likert scales to continuous values (though some responses are restricted to a specific range).
I'm planning to use correlation, multiple regression, and factor analysis, and then to build an instrument based on the results. There are 3 important measures that I'm going to use:
1. An overall perceived technological proficiency; this is a self-rating, ranging from 0 to 100.
2. An overall perceived technological proficiency; this is a composite score based on the 25 items of interest, also ranging from 0 to 100. (I have not created this measure yet.)
3. A "true" technological proficiency score; this is a total score based on actual computer tasks that participants have done. For each task the outcome is binary (1 = success, 0 = fail). Since there are 8 tasks, this score ranges from 0 to 8.
Problem: Since I'm going to run many statistical tests, I first need to select a good measure of proficiency to use in subsequent analyses (e.g. as the dependent variable in multiple regression). Looking at initial results, I'm starting to think that measure No. 1 above may not be highly correlated with most of the items (i.e. participants either overrate or underrate their proficiency). Therefore, I'd like to create measure 2 above (a composite score based on the 25 items) and then see which of these two measures is better. To compare them against something, I thought measure 3 above would be the most suitable, as I think actual computer tasks are a more accurate measure of proficiency than self-perception. Thus, whichever of measures 1 and 2 correlates more highly with measure 3 will be used.
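A minimal sketch of the selection rule just described, using only Python's standard library. The function names (pearson, pick_measure) and the five participants' scores are made up for illustration. Note that r1 and r2 are computed on the same participants and share measure 3, so they are dependent correlations: picking the larger one does not by itself show that the difference exceeds sampling noise, which is what tests for dependent correlations (e.g. Steiger's test) address. Because measure 3 only takes the values 0 to 8, a rank-based (Spearman) correlation may also be worth looking at.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pick_measure(self_rating, composite, task_total):
    """Return whichever candidate (measure 1 or 2) correlates more with measure 3."""
    r1 = pearson(self_rating, task_total)   # measure 1 vs measure 3
    r2 = pearson(composite, task_total)     # measure 2 vs measure 3
    best = "self_rating" if r1 >= r2 else "composite"
    return best, r1, r2


# Made-up scores for five participants (illustration only).
tasks = [2, 5, 3, 8, 6]                       # measure 3: 0-8
self_rating = [60, 40, 70, 90, 50]            # measure 1: 0-100
composite = [25.0, 62.5, 37.5, 100.0, 75.0]   # measure 2: 0-100
best, r1, r2 = pick_measure(self_rating, composite, tasks)
```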
Options available: Based on my reading on this website and others, I've found 3 ways of creating a composite score (if you know of other ways, I'd be happy to hear them):
1. Using factor analysis (scores are weighted automatically)
2. Using z-scores (scores are weighted manually)
3. Using other manual computations (e.g. converting each score to a percentage by dividing each value by the maximum)
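A rough sketch of options 2 and 3 side by side, assuming each item is stored as a list of responses (one per participant) and that each item's possible minimum and maximum are known. Here pomp rescales using both ends of the scale (sometimes called POMP, "percent of maximum possible", scoring), while percent_of_max is the divide-by-the-maximum version from option 3, which maps a 1-5 Likert item onto 20-100 rather than 0-100. All names and data are hypothetical.

```python
from statistics import mean, stdev


def percent_of_max(x, scale_max):
    """Option 3 as described: divide by the maximum (reaches 0 only if the scale starts at 0)."""
    return 100 * x / scale_max


def pomp(x, scale_min, scale_max):
    """Range-based rescaling: lowest possible response -> 0, highest -> 100."""
    return 100 * (x - scale_min) / (scale_max - scale_min)


def zscores(xs):
    """Option 2: standardize one item using the sample mean and SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def average_composite(items):
    """Unweighted mean across items for each participant.

    items: list of per-item lists, each holding one score per participant.
    """
    return [mean(scores) for scores in zip(*items)]


# Hypothetical responses from three participants to two items.
likert = [1, 3, 5]     # possible range 1-5
hours = [10, 20, 40]   # possible range 0-40

option3 = average_composite([
    [pomp(x, 1, 5) for x in likert],
    [pomp(x, 0, 40) for x in hours],
])  # 0-100 by construction
option2 = average_composite([zscores(likert), zscores(hours)])  # centred on 0
```

The range-based composite stays within 0-100 by construction; the z-score composite is centred on 0 and would need a further transformation to reach 0-100.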
One of the assumptions I've made is that each item used in the computation of the composite score has equal importance. Therefore, I don't think option 1 above is suitable in my case (if you disagree, I'd like to hear your reasoning). Thus, I'm left with options 2 and 3.
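If equal importance is the goal, a plain average of raw item scores does not deliver it: items with a wider spread dominate the total. The sketch below uses two made-up items (a 1-5 Likert item and a 0-100 item) and gauges how strongly each one drives the composite by its correlation with the composite. After z-scoring, both items drive the sum equally. Rescaling every item to 0-100 instead equalizes ranges rather than standard deviations, so effective weights can still differ somewhat.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def zscores(xs):
    """Standardize one item using the sample mean and SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


# Hypothetical items for five participants.
item_a = [1, 2, 3, 4, 5]        # 1-5 Likert item (small spread)
item_b = [50, 10, 90, 30, 70]   # 0-100 item (large spread)

raw_sum = [a + b for a, b in zip(item_a, item_b)]
z_sum = [a + b for a, b in zip(zscores(item_a), zscores(item_b))]

# Correlation of each item with the composite: how much each item drives it.
raw_share = (pearson(raw_sum, item_a), pearson(raw_sum, item_b))  # item_b dominates
z_share = (pearson(z_sum, item_a), pearson(z_sum, item_b))        # equal
```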
Questions:
1. Since I ultimately need a measure of proficiency that ranges from 0 to 100, which of options 2 and 3 above is more suitable in my case? In other words, what advantages does option 2 give that option 3 does not, and vice versa? (I'd appreciate it if the answer includes conceptual justifications.)
2. I've noticed that for some of the 25 items, the minimum response value is not zero. I've been wondering whether I should shift the values of such items so that they start from zero. As I understand it, someone suggested that this would not matter at all. However, I suspect that if I shifted them, responses at the minimum value would effectively be excluded from the average of the composite score, whereas leaving the values as they are would not result in any of them being excluded. Am I right?
3. If I decide to use the z-score option:
3a. To avoid negative and positive values cancelling each other out, could I simply shift each z-score, rescale it to run from 0 to 100, and then take the average of these scores? (If this does not matter, I'd like to know why not.)
3b. I've read that I could also create a z-score from the z-scores and use that. Why would this ever be done? Is it a good idea to use it in my research?
4. I've been told that when calculating z-scores, I should use the population mean and standard deviation rather than the sample ones, unless they are not available to me, in which case I could use the sample mean and standard deviation but should mention this as a limitation. Is this true, and why?
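On question 1, one conceptual difference between the two options is what the scores are relative to. A z-score locates a participant relative to the current sample, so it changes when participants are added or removed; a range-based 0-100 score is anchored to the possible limits of the scale and does not. The hypothetical responses below illustrate this for a participant who answered 2 on a 1-5 item.

```python
from statistics import mean, stdev


def zscores(xs):
    """Standardize using the sample mean and SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def pomp(x, scale_min, scale_max):
    """Rescale to 0-100 using the possible limits of the scale."""
    return 100 * (x - scale_min) / (scale_max - scale_min)


sample = [2, 3, 4]          # hypothetical answers to a 1-5 item
larger = sample + [5, 5]    # same people plus two new participants

z_before = zscores(sample)[0]        # first participant, original sample
z_after = zscores(larger)[0]         # same answer, different z-score
pomp_before = pomp(sample[0], 1, 5)
pomp_after = pomp(larger[0], 1, 5)   # unchanged: anchored to the scale
```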
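On question 2, a sketch with one hypothetical 1-5 item suggests that shifting does not exclude anyone: a response of 0 is still a value, and the mean still divides by the number of participants. What shifting does change is the divide-by-the-maximum score from option 3. Z-scores (and therefore correlations) are unaffected by adding a constant to an item.

```python
from math import isclose
from statistics import mean, stdev


def zscores(xs):
    """Standardize using the sample mean and SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def percent_of_max(xs, scale_max):
    """Option 3: divide each response by the scale maximum."""
    return [100 * x / scale_max for x in xs]


raw = [1, 3, 5]                  # hypothetical 1-5 item
shifted = [x - 1 for x in raw]   # same item shifted to 0-4

pct_shifted = percent_of_max(shifted, 4)   # the lowest response becomes 0 ...
avg_shifted = mean(pct_shifted)            # ... but is still counted: (0 + 50 + 100) / 3
avg_raw = mean(percent_of_max(raw, 5))     # (20 + 60 + 100) / 3: shifting changed this score

# Shifting by a constant leaves the z-scores untouched.
z_unchanged = all(isclose(a, b) for a, b in zip(zscores(raw), zscores(shifted)))
```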
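On question 3a: a z-score is a linear transformation of the raw score, so rescaling an item's z-scores to 0-100 using the sample minimum and maximum gives exactly the same numbers as min-max rescaling the raw item directly; the z-score step adds nothing there. Adding a constant to every z-score also moves every participant's average by the same amount, so it leaves rankings and correlations unchanged. The "cancelling" of positive and negative z-scores is simply what averaging does: a participant one SD above average on one item and one SD below on another is average overall. Rescaling each item by its own range before averaging does, however, weight items by range rather than by standard deviation, and sample-based min-max scaling makes 0 and 100 the lowest and highest observed values, which is sensitive to outliers. Data and names below are hypothetical.

```python
from math import isclose
from statistics import mean, stdev


def zscores(xs):
    """Standardize using the sample mean and SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def minmax_0_100(xs):
    """Rescale so the lowest observed value -> 0 and the highest -> 100."""
    lo, hi = min(xs), max(xs)
    return [100 * (x - lo) / (hi - lo) for x in xs]


item = [2, 7, 4, 9, 5]   # hypothetical raw responses to one item

from_z = minmax_0_100(zscores(item))   # shift and rescale the z-scores
from_raw = minmax_0_100(item)          # rescale the raw responses directly

# z-scoring is linear, so both routes give the same 0-100 scores.
same = all(isclose(a, b, abs_tol=1e-9) for a, b in zip(from_z, from_raw))
```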
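On question 3b: the mean of several z-scores has a standard deviation below 1 unless the items are perfectly correlated, so a "z-score of z-scores" re-standardizes the composite to mean 0 and SD 1, for example to express it in the same units as other standardized variables. It is a linear transformation, so the ranking of participants and correlations with other variables stay the same; if the final target is a 0-100 scale anyway, it mainly changes the units. A sketch with two made-up items whose correlation is 0.3:

```python
from math import isclose, sqrt
from statistics import mean, stdev


def zscores(xs):
    """Standardize using the sample mean and SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def rank_order(xs):
    """Indices of participants sorted from lowest to highest score."""
    return sorted(range(len(xs)), key=xs.__getitem__)


# Hypothetical items for five participants (correlation between them: 0.3).
item_a = [1, 2, 3, 4, 5]
item_b = [50, 10, 90, 30, 70]

composite = [(a + b) / 2 for a, b in zip(zscores(item_a), zscores(item_b))]
composite_z = zscores(composite)   # the "z-score of z-scores"

sd_before = stdev(composite)       # sqrt((1 + r) / 2), below 1 because r < 1
sd_after = stdev(composite_z)      # 1 by construction

# Re-standardizing is linear, so the ranking of participants is unchanged.
same_order = rank_order(composite) == rank_order(composite_z)
```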
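On the question about population versus sample mean and standard deviation, a small sketch of what changes between the two standard deviations. Python's statistics.stdev uses the n - 1 denominator and statistics.pstdev uses n, but both are computed from your own data; neither is the true population SD, which would have to come from external norms. Switching between them multiplies every z-score by the same factor, sqrt(n / (n - 1)), so correlations based on the z-scores are unchanged. What genuine population values would change is the interpretation: position relative to the population rather than relative to your sample. The scores are hypothetical.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, stdev

scores = [4, 8, 6, 5, 7]   # hypothetical scores for five participants
n = len(scores)
m = mean(scores)
s_sample = stdev(scores)   # n - 1 denominator
s_pop = pstdev(scores)     # n denominator (still computed from the sample)

z_sample = [(x - m) / s_sample for x in scores]
z_pop = [(x - m) / s_pop for x in scores]

# Every non-zero z-score is stretched by the same factor, sqrt(n / (n - 1)).
ratios = [zp / zs for zp, zs in zip(z_pop, z_sample) if zs != 0]
```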
Thank you very much indeed.