# Which test should be used with 1 dependent variable and 7 independent variables?

#### Yama Karimi

##### New Member
Hi everyone
I'm carrying out a research project that analyses the factors affecting coffee consumption behavior. There are 7 independent variables and 1 dependent variable, which is coffee consumption behavior.
The data were collected with a survey form handed to 267 respondents (coffee consumers). The 7 variables, shown in the image below, correspond to 7 sections of the survey form; each section has its own 3-4 statements, rated on a 1-5 scale of agreement.
It should be mentioned that the independent variables are also related to one another.
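For readers following along, here is a minimal sketch of how the survey layout described above could be turned into 7 variable scores and checked for relations among them. It assumes that each section's score is the mean of its statements; that is a common convention, not something stated in this thread. The responses, the per-section item counts, and all names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_respondents = 267  # sample size given in the post

# Hypothetical number of statements per section (the post only says 3-4 each)
items_per_section = [3, 4, 3, 4, 3, 4, 3]

# Fake Likert responses in 1..5, one block of columns per section (NOT real data)
responses = [rng.integers(1, 6, size=(n_respondents, k)) for k in items_per_section]

# Assumption: a section's score is the mean of its statements
scores = np.column_stack([block.mean(axis=1) for block in responses])

# Pairwise correlations between the 7 section scores
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))
```

With real data, large off-diagonal entries in `corr` would indicate the kind of relation among independent variables mentioned above.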

Given this information and the goal (finding which of the 7 factors have a positive effect on consumption behavior):

Which of the two types of hypotheses do you think should be used, the first or the second?
If the second, which conceptual model should be used (Model 1 or Model 2)?
And most importantly, which statistical analysis should be used to find this out?

Note: I suggested the first type of hypotheses and the Model 1 conceptual framework to my adviser. He said to use regression analysis, but he also suggested that the first type of hypotheses and model would fail, so I prepared the second type of hypotheses and drew the Model 2 conceptual framework, which he accepted. But I am still unsure and a bit confused about which type of hypotheses and model best conveys what is intended (analyzing which of the 7 factors has a positive effect on consumption behavior).

Sincerely
Yama Karimi

Below are the hypotheses with the given conceptual models, and part of the survey form.

#### GretaGarbo

##### Human
Can you write down the equations for these models?

And then, is there anything that prevents you from estimating them?

#### Yama Karimi

##### New Member
> Can you write down the equations for these models?
>
> And then, is there anything that prevents you from estimating them?

I don't have any equations for the models and don't know much about equations. I would appreciate it if you could shed some light on them.
The issue is that I'm not sure which statistical analysis I should use to begin with.

#### GretaGarbo

##### Human
Well, if y is coffee consumption, then model 1 is:

y = a + b1*H1 + b2*H2 + ... + b7*H7 + residual

Model 2 is a little more complicated.
There are several equations in the model:

H5 = a + b1*H1 + b2*H2 + residual5

H7 = c0 + c1*H3 + c2*H4 + residual7

y = d0 + d1*H5 + d2*H6 + d3*H7 + residual6

For model 3 I leave it to you to write it down.

Then the job is to estimate the different parameters (a, b1, b2, ..., c0, c1, ..., d0, ..., d3). That can easily be done with statistical software.
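As a rough illustration of what that estimation could look like, here is a minimal sketch that fits model 1 and the three equations of model 2 by ordinary least squares with NumPy. The data are simulated stand-ins with made-up coefficients, not the survey data, and the helper `ols` is a hypothetical name introduced here. Fitting model 2 one equation at a time is one simple option; dedicated SEM software would be another.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 267  # sample size mentioned in the thread

# Simulated stand-ins for the variable scores (NOT the real survey data);
# all coefficients below are made up for illustration.
H1, H2, H3, H4, H6 = rng.normal(size=(5, n))
H5 = 0.5 + 0.6 * H1 + 0.3 * H2 + rng.normal(scale=0.5, size=n)
H7 = -0.2 + 0.4 * H3 + 0.7 * H4 + rng.normal(scale=0.5, size=n)
y = 1.0 + 0.8 * H5 + 0.2 * H6 + 0.5 * H7 + rng.normal(scale=0.5, size=n)

def ols(target, *predictors):
    """Return [intercept, slope_1, ..., slope_k] from ordinary least squares."""
    X = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

# Model 1: one equation with all seven predictors
model1 = ols(y, H1, H2, H3, H4, H5, H6, H7)   # a, b1..b7

# Model 2: three equations, each fitted separately
eq_H5 = ols(H5, H1, H2)                       # a, b1, b2
eq_H7 = ols(H7, H3, H4)                       # c0, c1, c2
eq_y = ols(y, H5, H6, H7)                     # d0, d1, d2, d3

print("model 1:", np.round(model1, 2))
print("model 2, H5 equation:", np.round(eq_H5, 2))
print("model 2, H7 equation:", np.round(eq_H7, 2))
print("model 2, y equation:", np.round(eq_y, 2))
```

A regression routine in any statistics package would also report standard errors and p-values, which this sketch leaves out.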

#### obh

##### Active Member
Hi @GretaGarbo,
Under the assumption that no IV is calculated directly from other IVs (there are also separate questions for H5 and H7),
and the assumption that Model 2 is correct and the goal is Y:

Can the following be used?
y = a + b1*H1 + b2*H2 + ... + b7*H7 + residual

#### GretaGarbo

##### Human
> Under the assumption that no IV is calculated directly from other IVs (there are also separate questions for H5 and H7),
> and the assumption that Model 2 is correct and the goal is Y:
>
> Can the following be used?
> y = a + b1*H1 + b2*H2 + ... + b7*H7 + residual

I don't understand the question.

I just tried to write down the equations for each graphical model

Model 2 is a more restrictive model than model 1. Model 2 says how all variables affect the dependent variable y, directly or indirectly.

One can say that model 2 is a "structural equation model" (SEM). (The model says that there is a specific structure in how the variables influence each other.) Then model 1 is a so-called "reduced form" of model 2. (Just insert the components into the model. Note that the residuals for H5 and H7 will be included.)
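The "just insert the components" step can be written out explicitly. In the derivation below, e5, e7 and e6 are shorthand (introduced here) for residual5, residual7 and residual6; everything else uses the symbols from the model 2 equations given earlier in the thread.

$$
\begin{aligned}
y &= d_0 + d_1 H_5 + d_2 H_6 + d_3 H_7 + e_6 \\
  &= d_0 + d_1 (a + b_1 H_1 + b_2 H_2 + e_5) + d_2 H_6 + d_3 (c_0 + c_1 H_3 + c_2 H_4 + e_7) + e_6 \\
  &= (d_0 + d_1 a + d_3 c_0) + d_1 b_1 H_1 + d_1 b_2 H_2 + d_3 c_1 H_3 + d_3 c_2 H_4 + d_2 H_6 + (d_1 e_5 + d_3 e_7 + e_6)
\end{aligned}
$$

Written this way, the substituted equation contains H1 to H4 and H6 but no longer H5 and H7, and its error term is the composite d1*e5 + d3*e7 + e6, which is where the residuals for H5 and H7 enter.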

#### obh

##### Active Member
Let's assume that model2 is correct. Any reason not to run the following regression: y = a+ b1*H1 + b2*H2 +....b7*H7 + residual?
I think this was the original question ...
(besides the potential multicollinearity)

#### GretaGarbo

##### Human
> Let's assume that model2 is correct. Any reason not to run the following regression: y = a+ b1*H1 + b2*H2 +....b7*H7 + residual?
> I think this was the original question ...

Well if you "know" that model 2 is the correct model, then you would like to know the parameter values from that one. That is, you want to uncover the true relations among the variables.

Model 1 and model 2 are relatively simple models. y is influenced by H1 to H7, but, say, H1 is not influenced by y. There is no simultaneity.

You can (in this relatively simple case) uncover the structural model from the reduced model. All of this belongs to the topic of simultaneous equations models in econometrics.

But for the original poster I believe that it is easiest to just estimate model 2 with a regression program.

#### obh

##### Active Member
Thanks Greta.

Using the following is good for understanding the model:
H5 = a + b1*H1 + b2*H2 + residual5
H7 = c0 + c1*H3 + c2*H4 + residual7

But if (and only if) the goal is to estimate Y, won't the reduced model (y = a + b1*H1 + b2*H2 + ... + b7*H7 + residual)
give better results than y = d0 + d1*H5 + d2*H6 + d3*H7 + residual6?
At least it has the potential for better results if some parts of H1, H2, H3, H4 that are not captured by H5 and H7 help to explain Y.

#### GretaGarbo

##### Human
> At least it has the potential for better results if some parts of H1, H2, H3, H4 that are not captured by H5 and H7 help to explain Y.

You forget about the residuals that influence H5 and H7. They have an influence on H5 and H7 and thereby on y. If you omit these residuals then you have a few omitted variables.

> give better results than:

What do you mean by "better"? You will not recover the structural relations. Do you mean variance in prediction? (Or maybe mean squared error of prediction, MSEP?)

The reduced form can be used to make predictions. But I don't remember if that leads to smaller prediction errors.
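One way to probe this is a small simulation. The sketch below assumes that model 2 holds exactly, with made-up coefficients and standard normal residuals; it does not use the survey data, and the names `simulate`, `design`, `fit` and `msep` are hypothetical helpers. It compares out-of-sample MSEP for three regressions of y: on the observed H5, H6, H7; on H1 to H4 plus H6 (the equation obtained by substituting H5 and H7 away); and on all seven variables.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    """Draw data from model 2 with made-up coefficients (illustration only)."""
    H1, H2, H3, H4, H6 = rng.normal(size=(5, n))
    H5 = 0.6 * H1 + 0.3 * H2 + rng.normal(size=n)   # residual5
    H7 = 0.4 * H3 + 0.7 * H4 + rng.normal(size=n)   # residual7
    y = 0.8 * H5 + 0.2 * H6 + 0.5 * H7 + rng.normal(size=n)  # residual6
    return {"H1": H1, "H2": H2, "H3": H3, "H4": H4,
            "H5": H5, "H6": H6, "H7": H7, "y": y}

def design(data, names):
    """Design matrix with an intercept column followed by the named predictors."""
    return np.column_stack([np.ones(len(data["y"]))] + [data[k] for k in names])

def fit(data, names):
    coef, *_ = np.linalg.lstsq(design(data, names), data["y"], rcond=None)
    return coef

def msep(coef, data, names):
    """Mean squared error of prediction on the given data."""
    return float(np.mean((data["y"] - design(data, names) @ coef) ** 2))

train, test = simulate(267), simulate(5000)

specs = {
    "structural": ["H5", "H6", "H7"],
    "reduced": ["H1", "H2", "H3", "H4", "H6"],
    "all_seven": ["H1", "H2", "H3", "H4", "H5", "H6", "H7"],
}
results = {name: msep(fit(train, cols), test, cols) for name, cols in specs.items()}
for name, value in results.items():
    print(f"{name}: MSEP = {value:.3f}")
```

In this setup the regression on H1 to H4 and H6 carries the residuals from the H5 and H7 equations in its error term, so its prediction error should come out larger than that of the regression using the observed H5 and H7. This only speaks to the simulated case where model 2 is true by construction; it says nothing about which model fits the real survey.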

#### obh

##### Active Member
Thanks, Greta, for an interesting answer as always.

> You forget about the residuals that influence H5 and H7. They have an influence on H5 and H7 and thereby on y. If you omit these residuals then you have a few omitted variables.

Do you mean you use the observed H5/H7 or the predicted H5/H7 (as input to calculate y = d0 + d1*H5 + d2*H6 + d3*H7 + residual6)?
If you mean the observed, shouldn't it be like the reduced model?

> What do you mean by "better"? You will not recover the structural relations. Do you mean variance in prediction? (Or maybe mean squared error of prediction, MSEP?)

Correct, if the goal is the structural relations, you shouldn't use the reduced model. I was thinking about prediction and didn't focus on a specific measure.

> The reduced form can be used to make predictions. But I don't remember if that leads to smaller prediction errors.

This may be interesting to know.

#### obh

##### Active Member
Thank you @GretaGarbo, it looks interesting.

#### noetsi

##### Fortran must die
LASSO is better than stepwise although I don't see why either is used here.