2X2 design ANOVA SAS

I am a beginner in statistics and I need some help with my data analysis. In my research I am trying to find the impact of website color and type of webstore on customer trust, purchasing intention, preference, etc. For example, a website selling insurance services will be trusted more if its color is blue rather than red.

The design of my research is 2x2: color (red vs. blue) × type of website (service vs. outlet). I ran my experiment as a survey with mock-up pictures of websites, which I based on real websites. I only made some changes, such as a more visible background color... One group of respondents was shown only the blue versions of the websites (pictures): first the blue service store and then the blue outlet webstore. The second group got the survey with only the red pictures, again the service webstore and the outlet webstore.

I am not sure whether I should use one-way ANOVA or two-way ANOVA, and what the model should look like. Or is there a better solution?

Thank you
Thank you for your idea. OK, I can try a chi-square test, but the problem is I've never done one before. I have finished only one statistics course, and we only learned how to use SAS for ANOVA and regression. Do you think you could write me a model for a chi-square test in SAS? The rest I will try to find on the net... I only need to be sure the model is right...
For example

proc freq data = mydata;  /* mydata is a placeholder: use the name of your own SAS data set */
tables color*typeofwebsite / chisq;
run;

I don't know what your variables are named, so just change color to whatever name you've given this variable, and do the same with type of website. Good luck.
Thank you, but with this model I am still not able to find the effect of website type and color on customer preference, trust or purchasing intention... or am I?
Maybe, I misunderstood you.
The model/code I wrote earlier was only for investigating the relationship between color and type of website.

Tell me which of your variables is the dependent variable and which are independent. How is your dependent variable measured? I need to know the scale of the variable.
Yes, it seems there was a little misunderstanding :) My independent variables are color (red, blue) and type of website (service, outlet), and my dependent variables are preference, trust, purchasing intention and price fairness. I measured them on a Likert scale (1-7), for example: very much prefer (7) vs. very much don't prefer (1).

You can also check my survey to see how I conducted my research here: https://docs.google.com/forms/d/1xSuvmEom3B2sf0kNRJpADrBrVrNDNz0Ei3JAb6Pt5Q8/viewform
And I did the same survey for the red color, with red pictures...

Later I will delete it from here...
The "right" model is the one given by a theory/hypothesis that you want to confirm or reject. Of course you can always generate a lot of models and choose the "best" one, but that is an exploratory approach rather than a confirmatory data analysis.
Yes, that makes sense... Based on my hypothesis, I have an idea of what I am trying to prove and what is impossible to prove at this stage of the research... anyway, thank you so much for your help ;)
Here our dependent and independent variables are discrete, therefore we can use a chi-square test to check the goodness of fit.

This test shows the difference between the actual observed ratings and the ratings we would expect on the same aspects, and through the p-value we can judge whether the deviation between expected and actual is plausibly just due to chance and not due to some additional influence.

A chi-square test only tests whether the variables in a table of counts are independent. It will NOT tell you any details about the relationship between them.
It is also important that you have enough data to perform a viable chi-square test. If the expected count in any given cell is below 5, the chi-square approximation becomes unreliable. In a case like this, you should look into techniques for smaller data sets: for example, there is a correction to the chi-square test for 2x2 tables, called the Yates correction. There are also tests designed specifically for smaller data sets, like Fisher's exact test.
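
To make the small-sample point concrete, here is a rough SAS sketch. The data set name counts, the variable names and the four cell counts are all made up for illustration (they are not from anyone's survey). With these counts the expected cell counts are 5.5 and 4.5, so two cells fall below 5. The expected option prints the expected counts, and for a 2x2 table the chisq option also reports the continuity-adjusted (Yates) chi-square and Fisher's exact test.

```sas
/* Hypothetical 2x2 table of counts: color by accept/reject */
data counts;
  input color $ response $ count;
  datalines;
red accept 3
red reject 7
blue accept 8
blue reject 2
;
run;

proc freq data=counts;
  weight count;                              /* each row is a cell count, not one respondent */
  tables color*response / chisq expected;    /* expected = print expected cell counts */
run;
```

If your data has one row per respondent instead of one row per cell, leave out the weight statement.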

Just for clarity
You have 2 factors with 2 levels each, and responses that are numbers from 1 to 7.

Factor one = color, with two levels: red, blue
Factor two = type of website, with two levels: service, outlet

The factors given above are your independent variables (x).

Your 4 dependent variables are
(1) preference,
(2) trust,
(3) purchasing_intention and
(4) price_fairness

All four have responses that are ratings from 1 to 7.
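
The design summarized above could be sketched in SAS roughly as follows. This is only a sketch under some assumptions: the data set name survey, the variable names id, color, sitetype and trust, and the handful of ratings are all made up for illustration. Because each respondent rated both the service and the outlet site in one color, the sketch uses proc mixed with a random intercept per respondent rather than a plain two-way ANOVA.

```sas
/* Hypothetical data in long format: one row per respondent per website type */
data survey;
  input id color $ sitetype $ trust;
  datalines;
1 blue service 6
1 blue outlet 4
2 blue service 5
2 blue outlet 5
3 blue service 7
3 blue outlet 3
4 red service 3
4 red outlet 5
5 red service 4
5 red outlet 4
6 red service 2
6 red outlet 6
;
run;

proc mixed data=survey;
  class id color sitetype;
  /* main effects of color and website type plus their interaction */
  model trust = color sitetype color*sitetype;
  /* the two ratings from the same respondent share a random intercept */
  random intercept / subject=id;
run;
```

The same model can be rerun with preference, purchasing_intention or price_fairness in place of trust. If every rating came from a different person, proc glm with class color sitetype and the same model statement would be the ordinary two-way ANOVA.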

The 1 to 7 ratings can also be reduced to a simpler result for the analysis.
Say, for example, that the decision to be taken is whether we will EITHER use one of the colors for some kind of preference OR not use that color.
Then we have only two outcomes: accept or reject.

For clarity I have taken fictitious data [please see the attachments] and analyzed them according to your stated requirements.

Say 204 people responded with ratings on RED and 183 people responded with ratings on BLUE.

In part A I will deal with yes-or-no feedback; in part B I will also show the results on the 1 to 7 rating scale. You may use your data either way.

I wanted to convert the 1-7 scale ratings into a two-point rating, i.e. accept/reject. Therefore:
For the count of raters who accepted, I added up all the ratings between 1 and 3 (3 columns).
For the count of raters who rejected, I added up all the ratings between 5 and 7 (3 columns).

A rating of 4 on the 1 to 7 scale indicates neither acceptance nor rejection of the type of webpage, so I would like to omit that column, BUT the total count of ratings should stay the same within each color.
Remember that you asked for ratings by color and from different sets of respondents.

Also, for the chi-square results to be reliable you cannot drop ratings, so column 4 cannot be omitted.
To keep the total count of ratings unchanged, I divided the count of ratings in column 4 in two and added half of it to the accepted column and half to the rejected column.
Now the total count of raters remains the same.

Please see the attached data for clarity: in the Excel file, each color scheme combination is given with its corresponding number.
For the calculation I used Minitab. The output is given in the text file. For each scheme number it lists the calculated ACTUAL and EXPECTED counts and the DEVIATION.
For example, scheme number 3 stands for red::service::purchasing_intention

The data in the Excel file were randomly generated in Excel, therefore the counts are similar, with small deviations.
With real data you may also get large deviations.
Please see the attachments

Let's take as an example one of the outputs in the file Part A - CHI SQUARE for all ratings clasified in accepted rejected

We take the scheme no. 1.

The observed (actual) count for accepted = 107, while the expected count is 102.33.
The observed (actual) count for rejected = 97, while the expected count is 101.67.
The chi-square contribution of the accepted cell, i.e. (deviation squared) / expected count, is (107 - 102.33)^2 / 102.33 = 0.213.
This is < 0.6, thus not reliable for making any decision on whether color scheme 1 will attract any attention etc.
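
The arithmetic for scheme no. 1 can be checked with a short SAS data step. This is just a sketch: the variable names are my own, and the observed and expected counts are the ones quoted from the example output.

```sas
data _null_;
  /* observed and expected counts quoted above for scheme no. 1 */
  obs_accept = 107;  exp_accept = 102.33;
  obs_reject = 97;   exp_reject = 101.67;

  /* each cell contributes (observed - expected)^2 / expected */
  contrib_accept = (obs_accept - exp_accept)**2 / exp_accept;   /* about 0.213 */
  contrib_reject = (obs_reject - exp_reject)**2 / exp_reject;

  put contrib_accept= contrib_reject=;   /* values are written to the SAS log */
run;
```

The chi-square statistic for the whole table is the sum of these contributions over all of its cells, not the value of a single cell.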

Contributions like this 0.213 are added up over all cells to give the chi-square statistic, which is then used to find the p-value in a chi-square table.
If the p-value comes out below 0.05, a deviation this large between observed and expected (ideal) would occur less than 5% of the time by chance alone.
In that case we conclude that some external influence is likely acting, so we cannot rely on the expected value, i.e. 102.33 (count of votes of acceptance), to estimate the viewers' preference in the future: we should not expect acceptance from 102.33 people out of 204 voters.
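
Instead of a printed chi-square table, SAS can do the lookup. A small sketch, with two assumptions that are mine and not from the attached output: 1 degree of freedom (as for a 2x2 table) and a made-up statistic of 1.87.

```sas
data _null_;
  df = 1;                        /* assumption: a 2x2 table has (2-1)*(2-1) = 1 df */
  critical = cinv(0.95, df);     /* 5% critical value, about 3.841 for 1 df */

  chisq = 1.87;                  /* hypothetical total chi-square statistic */
  p = 1 - probchi(chisq, df);    /* upper-tail p-value */

  put critical= p=;              /* values are written to the SAS log */
run;
```

A statistic above the critical value corresponds to p < 0.05. When you run proc freq with the chisq option, it prints this p-value for you.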

State your conclusion in terms of your hypothesis.

In the file Part A - CHI SQUARE for all ratings clasified in accepted rejected, see the final p-value. This also states that you cannot rely much on the complete set of feedback. But remember that this observation is due to the non-randomness of the system-generated data and not to actual feedback.

a. If the calculated p-value is p > 0.05, accept your hypothesis: the deviation is small enough that chance alone can account for it. In our example, P-Value = 0.172 means that, if only chance were acting and no external factor, a deviation from expected at least this large would occur about 17.2% of the time. This is within the range of acceptable deviation. Thus you can rely on the choice of color and scheme.

This was just example data; with real data you may well get a p-value < 0.05.

b. If the calculated p-value is p < 0.05, reject your hypothesis and conclude that some factor other than chance is operating for the deviation to be so great. For example, a p-value of 0.01 means that a deviation this large would occur only 1% of the time by chance alone. Therefore other factors must be influencing the deviation, and thus you cannot rely on it.

PART B [just for understanding]

Calculated for the complete ratings 1 to 7

Please see the file Part B - CHI SQUARE for all 7 ratings for the calculation output

You can use this site to calculate your values and statistical results: http://graphpad.com/quickcalcs/chisquared1.cfm