I want to fit an MLR (mixed model) to a data set (details below) that has both numeric and categorical explanatory variables. The data are as follows:

Sample size: 85

Response variable: Y (numeric)

Explanatory variables:

Sr. No.  Variable                              Levels
X1       Principal component 1 for a dataset   numeric
X2       Principal component 2 for a dataset   numeric
X3       Region                                4
X4       Class                                 3
X5       Block                                 9
X6       Year                                  5
X7       Age                                   numeric
X8       Age                                   3
X9       Duration                              8
X10      Depth                                 5
X11      Option                                2
X12      Seeds weight                          numeric
X13      Seeds proportion                      numeric

This will remain a mixed model, although some of the 13 variables may not be used in the final analysis.

Aim: Investigate the effect of the 13 variables on Y. However, the focus needs to be on X1 and X2, which are the variables of interest.

My present understanding is that I should use stepwise regression (direction=both) to identify which of the variables are significant, and then fit an MLR that also includes the interaction effects.

However, I consulted online resources, books and articles to find out what the best approach is and how to interpret the results. I got confused reading about stepwise regression (direction), treating categorical variables as random effects, and lm versus glm, and I am not sure which approach is best.

Can you please suggest which approach is the most appropriate for this data and aim?

If my current approach is right, how do I decide the order of the categorical variables in stepwise regression? How should categorical variables be handled when formulating the prediction equation? I get 'NA' coefficients for some levels/variables (possibly because of the high predictor-to-sample ratio). Can these 'NA's be ignored, or is there a way to fix this?
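To make the 'NA' question concrete, here is a minimal sketch of one possible cause: a rank-deficient design matrix. It is written in plain Python only for illustration, and it is an assumption about the cause, not a diagnosis of this data set. The level names for X3 (Region) and the helper functions `dummy_code` and `matrix_rank` are hypothetical. It shows that dropping a reference level gives a design matrix of full column rank, while keeping an indicator for every level next to an intercept makes one column redundant, which is the kind of situation in which a coefficient cannot be estimated.

```python
from fractions import Fraction

def dummy_code(levels, values, drop_first=True):
    """Return one row of 0/1 indicators per observation (treatment coding)."""
    kept = levels[1:] if drop_first else levels
    return [[1 if v == lev else 0 for lev in kept] for v in values]

def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry here
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical names for the 4 levels of X3 (Region)
regions = ["north", "south", "east", "west"]
observed = ["north", "south", "east", "west", "south", "east"]

# Intercept + 3 indicators (reference level "north" dropped)
X_ok = [[1] + row for row in dummy_code(regions, observed)]
# Intercept + all 4 indicators: the indicators sum to the intercept column
X_bad = [[1] + row for row in dummy_code(regions, observed, drop_first=False)]

rank_ok = matrix_rank(X_ok)
rank_bad = matrix_rank(X_bad)
print("columns:", len(X_ok[0]), "rank:", rank_ok)
print("columns:", len(X_bad[0]), "rank:", rank_bad)
```

In the prediction equation, the reference level then has no term of its own: its effect is absorbed into the intercept, and each remaining level adds its own coefficient when its indicator is 1.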

I would be grateful if you could post some links to online resources that are clear and easy to understand.

Thanks again!

I am working with a panel data set with more than 2000 observations and only two years. My topic is "Full Risk Insurance" in rural households within a village. My regression is as follows:

log_Individual_Consumption = log_Individual_Income + log_Village_average_Consumption + errorterm

command: xtreg log_Individual_Consumption log_Individual_Income log_Village_average_Consumption, fe

What I would now like to add is an interaction term with an "Ethnolinguistic Fractionalisation Index" (EFL), which I have built and which essentially shows the composition of the village in terms of tribes. This index (EFL) only takes values between 0 and 1.

My intention is to analyse the influence of this index on "Full Risk Insurance". To do so, I need to interact village average consumption with the index, so that I can see how the coefficient of Village_average_consumption changes when the index takes a certain value.

I would like to ask what commands I could use to carry out this regression. I have tried the one shown below, and Stata returned that it does not have any more "room to add more variables". This is plausible because I have more than 650 villages, but is there any way to run a regression that is applicable across villages?

Here is my current strategy: xi: reg log_Individual_Consumption log_Individual_Income log_Village_average_Consumption EFL i.log_Village_average_Consumption*EFL
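As a rough illustration of what such an interaction does (in plain Python rather than Stata, with hypothetical data and hypothetical coefficient values, not estimates), the interaction regressor is simply the product of the two continuous variables, and the implied coefficient on village consumption at a given index value is the main coefficient plus the interaction coefficient times that value. The function names below are made up for this sketch. In Stata's factor-variable notation, a continuous-by-continuous interaction is usually written with the c. prefix, as in c.log_Village_average_Consumption#c.EFL, whereas the i. prefix treats a variable as categorical and creates one indicator per distinct value. That may be why the command above ran out of room, though this is an assumption about the cause.

```python
def interaction_column(village_cons, efl):
    """Element-wise product of two continuous regressors."""
    return [c * e for c, e in zip(village_cons, efl)]

def village_cons_effect(b_village, b_interaction, efl_value):
    """Implied coefficient on village consumption at a given EFL value."""
    return b_village + b_interaction * efl_value

# Hypothetical data: log village average consumption and the EFL index
log_village_cons = [2.0, 2.5, 3.0]
efl = [0.0, 0.5, 1.0]
interaction = interaction_column(log_village_cons, efl)

# Hypothetical coefficients, chosen only to show the arithmetic
b_village, b_interaction = 0.6, -0.4
effect_at_zero = village_cons_effect(b_village, b_interaction, 0.0)
effect_at_half = village_cons_effect(b_village, b_interaction, 0.5)
print(interaction, effect_at_zero, effect_at_half)
```

With these made-up numbers, a more fractionalised village (higher EFL) would show a smaller coefficient on village average consumption; the sign and size in real data are of course an empirical matter.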

I know that this is possibly wrong, but my Stata knowledge is very limited with regard to interaction variables and panel data, and I would like to ask if anyone could help me with this.

Thank you

I have a random group of participants who answered survey questions about their general opinions on specific health activities. Each participant answered questions about 4 randomly selected health activities (swimming, brushing teeth, etc.).

My dependent variable is days per month spent doing a particular health-related activity.

I want to estimate the impact of opinions (questions asking opinions are the same for each health activity) on days per month engaging in the activity.

I already know that certain groups of activities are performed more frequently than others, simply due to their nature. I have grouped the 40 different health activities into about 6 groups. Three of them are listed below as an example:

Fitness: Running, Biking, Swimming, etc.

Hygiene: Brush Teeth, Shower, etc.

Diet: Eat Meat, Eat Vegetables, etc.

I want to know how to properly estimate the impact of opinions on the frequency with which a health activity is performed. I think this would be a simple fixed effects model, with dummy variables representing each health group (6 groups total), if each participant had answered questions about every health activity. But because participants were given 4 randomly selected health activities, I believe that the make-up of the participants answering questions for each health group could skew the average of the dependent variable, because some people are simply more health engaged than others. I want to control for the make-up of the participants answering questions for each activity, which I believe is called random effects?
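To illustrate the concern about some participants being more health engaged than others, here is a minimal sketch in plain Python with made-up data. It is not a random-effects estimator: it uses participant-mean centering, a simpler swapped-in technique, only to show what "removing each participant's overall engagement level" means. The function name and the rows are hypothetical.

```python
from collections import defaultdict

def center_within_participant(rows):
    """rows: list of (participant_id, activity_group, days_per_month).

    Subtracts each participant's own mean, so that only deviations
    from that participant's usual engagement level remain.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for pid, _, days in rows:
        totals[pid][0] += days
        totals[pid][1] += 1
    means = {pid: s / n for pid, (s, n) in totals.items()}
    return [(pid, group, days - means[pid]) for pid, group, days in rows]

# Hypothetical answers: p1 is far more health engaged overall than p2
rows = [
    ("p1", "Fitness", 20.0), ("p1", "Hygiene", 30.0),
    ("p2", "Fitness", 2.0), ("p2", "Hygiene", 12.0),
]
centered = center_within_participant(rows)
for row in centered:
    print(row)
```

After centering, both hypothetical participants show the same pattern across groups even though their raw levels differ a lot, which is the participant-level difference a random intercept per participant is meant to account for.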

Can anyone advise? I am currently using SPSS for modeling. Thank you.