I’m preparing some data with a view to carrying out a multivariable logistic regression.

The dependent variable is LANDSLIDES, i.e. occurrence, and is dichotomous (YES/NO).

The independent variables are a collection of environmental factors, such as VEGETATION, SLOPE, ELEVATION, and ROCK TYPE, to name a few. Some of these are continuous (e.g. SLOPE) whilst some are categorical (e.g. VEGETATION).

Concentrating on the categorical variables, and VEGETATION in particular, I have already carried out a Chi-square test, which indicated that it influences the occurrence of LANDSLIDES. From reading around, I understand that in multivariable logistic regression it is advisable to convert categorical independent variables to dichotomous variables, and my question is...

**how should I do this?**

After looking at the data and running a few exploratory LR analyses, I thought of a possible way of proceeding and wanted to ask for people's opinions about it. My "methodology" is as follows.

Firstly, by examining the contingency table for LANDSLIDES vs. VEGETATION (please see attachment), I've noticed that for several classes of vegetation there are no positive observations, i.e. in these classes landslides do not occur. My first step would be to combine these categories and give them a new value (*OTHERS*).
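To make this first step concrete, here is a rough sketch in Python (not SPSS, which is what I'm actually using). The vegetation class names and counts are invented purely for illustration, and `collapse_zero_event_classes` is a hypothetical helper name, not something from a library.

```python
from collections import Counter

# Hypothetical (vegetation class, landslide outcome) observations.
# Class names and counts are invented for illustration only.
observations = (
    [("forest", "NO")] * 40 + [("forest", "YES")] * 5
    + [("grassland", "NO")] * 30 + [("grassland", "YES")] * 12
    + [("scrub", "NO")] * 20
    + [("wetland", "NO")] * 8
)


def collapse_zero_event_classes(rows, positive="YES", merged_label="OTHERS"):
    """Relabel every class with no positive observation as merged_label."""
    # Count positive observations per class; missing classes count as 0.
    positives = Counter(cls for cls, outcome in rows if outcome == positive)
    return [
        (cls if positives[cls] > 0 else merged_label, outcome)
        for cls, outcome in rows
    ]


reclassed = collapse_zero_event_classes(observations)

# Contingency table keyed by (class, outcome) after reclassing.
table = Counter(reclassed)
for key in sorted(table):
    print(key, table[key])
```

In this toy data, "scrub" and "wetland" have no landslides, so both end up in the OTHERS class.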

Secondly, using the reclassed VEGETATION variable, I would run a bivariate logistic regression of LANDSLIDES vs. VEGETATION, using the *OTHERS* class as the reference class. In the resulting "Variables in the Equation" table, the coefficients (B) are either negative or positive. The negative values correspond to a reduced risk of LANDSLIDES relative to the reference class, whilst the positive values imply an increased risk. My idea was (1) to group the categories with negative coefficient values with the OTHERS category, and (2) to group the categories with positive values together, thereby creating a dichotomous variable. This variable would effectively have one class of vegetation types that "don't" cause landslides and one that "does".
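The grouping rule in this second step could be sketched as follows, again in Python rather than SPSS. The B values below are made up and simply stand in for whatever the regression output reports; `dichotomise_by_sign` is a hypothetical helper name.

```python
# Hypothetical B coefficients, each relative to the OTHERS reference class.
# These values are invented; in practice they would be read off the
# regression output.
coefficients = {
    "forest": -0.8,
    "grassland": 1.2,
    "scrub": 0.4,
    "bare_soil": -0.1,
}


def dichotomise_by_sign(coefs, reference="OTHERS"):
    """Map each class to 1 if its B is positive, otherwise 0.

    The reference class has no coefficient of its own and is placed in
    group 0 along with the classes whose B is negative (or zero).
    """
    mapping = {reference: 0}
    for cls, b in coefs.items():
        mapping[cls] = 1 if b > 0 else 0
    return mapping


vegetation_binary = dichotomise_by_sign(coefficients)
print(vegetation_binary)
```

Note that this sketch looks only at the sign of each B and takes no account of its standard error or significance.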

I'm unsure whether this is sound and reliable, statistically speaking, and that's why I'm writing. If this is a good way of proceeding, I could then use the same method to reclass my other categorical variables before proceeding.

Many thanks in advance.

Matt

PS. I'm using SPSS.