I have a dataset of 1812 observations on packages. The variables are error (yes = 1, no = 0) and country of origin (five countries). Error is the dependent variable and country is the independent variable. I've created dummy variables for each of the five countries.

My research question is: what is the likelihood of an error from each country compared to the entire population?

Code:

```
> logreg_3
# A tibble: 1,812 x 7
   error country    cn    nl    ee    hk    my
   <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1 cn          1     0     0     0     0
 2     1 nl          0     1     0     0     0
 3     1 nl          0     1     0     0     0
 4     1 my          0     0     0     0     1
 5     1 my          0     0     0     0     1
 6     1 nl          0     1     0     0     0
 7     1 hk          0     0     0     1     0
 8     1 hk          0     0     0     1     0
 9     1 hk          0     0     0     1     0
10     1 hk          0     0     0     1     0
# ... with 1,802 more rows
```

Code:

```
Call:
glm(formula = error ~ cn + nl + ee + hk + my,
    family = binomial, data = logreg_3)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0933  -1.0893  -0.6987   1.2640   1.7492

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.25276    0.40089  -3.125  0.00178 **
cn            -0.03292    0.40989  -0.080  0.93598
nl             1.04184    0.40993   2.542  0.01104 *
ee           -14.31331  280.09167  -0.051  0.95924
hk             1.05157    0.41364   2.542  0.01102 *
my                  NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2308.1  on 1811  degrees of freedom
Residual deviance: 2177.1  on 1807  degrees of freedom
AIC: 2187.1

Number of Fisher Scoring iterations: 14
```

Code:

```
round(exp(coef(logit)),3)
(Intercept)          cn          nl          ee          hk          my
      0.286       0.968       2.834       0.000       2.862          NA
```
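To make my current reading of these numbers concrete, here is a minimal sketch that plugs the reported estimates back in. It assumes that `my` acts as the reference category because its dummy was the one dropped (the NA row), so the intercept would be the log-odds of an error for `my`. I may well be wrong about that. I have left `ee` out because its very large standard error makes me suspect there are no errors at all among the `ee` packages. The names `est`, `log_odds`, `odds` and `probs` are just mine for illustration.

```r
# Estimates copied from the summary output above.
# Assumption: `my` (the NA row) is the reference category.
est <- c(intercept = -1.25276, cn = -0.03292, nl = 1.04184, hk = 1.05157)

# Log-odds of an error per country: intercept plus that country's coefficient.
log_odds <- c(
  my = est[["intercept"]],
  cn = est[["intercept"]] + est[["cn"]],
  nl = est[["intercept"]] + est[["nl"]],
  hk = est[["intercept"]] + est[["hk"]]
)

odds  <- exp(log_odds)     # odds of an error in each country
probs <- plogis(log_odds)  # implied probability of an error

round(cbind(log_odds, odds, probs), 3)
```

If this reading is right, the values from `exp(coef(logit))` for `cn`, `nl` and `hk` are odds ratios relative to `my`, not relative to the entire population, which is part of why I doubt the model answers my research question.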

I am having difficulty interpreting the results, and there are some specific issues I'd like to address.

My questions are:

1) How do I overcome the dummy variable trap in R and so avoid the NA for the last predictor? Using +0 to remove the intercept does not seem to work, as the results change in a way that makes no sense to me. I wish to calculate the OR for all countries to determine/forecast the risk of error for each country.

2) Is this even the right model for answering my research question?

3) If it is the correct model: is it correct to interpret the positive estimates as a sign of increased risk of error and the negative estimates as a sign of decreased risk? I understand that the relationship is non-linear, so the size of an estimate means little on its own.

4) How should I interpret the odds ratios in this case, with multiple predictors and a single outcome?

5) Any ideas for further modelling/analysis?
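To make question 1 concrete, here is a minimal sketch of the two specifications I am comparing, run on simulated data (not my real packages). The data frame `sim`, the error rate of 0.3 and the choice of `my` as the reference level are all assumptions made purely for illustration. As far as I can tell, the version without an intercept returns one coefficient per country that is a log-odds and not a log odds ratio, which may be why my +0 results looked so different.

```r
set.seed(1)  # simulated data only, not my real packages

# Hypothetical stand-in for logreg_3 with the same column names.
countries <- c("cn", "nl", "ee", "hk", "my")
sim <- data.frame(country = sample(countries, 500, replace = TRUE))
sim$error <- rbinom(nrow(sim), 1, 0.3)

# Treatment coding: R builds the dummies from the factor and drops the
# reference level itself, so no coefficient comes back as NA.
sim$country <- relevel(factor(sim$country), ref = "my")
fit_ref <- glm(error ~ country, family = binomial, data = sim)

# No intercept: one coefficient per country, each the log-odds of an error
# in that country, so exp() gives odds here, not odds ratios.
fit_noint <- glm(error ~ 0 + country, family = binomial, data = sim)

exp(coef(fit_ref))       # odds for `my`, then odds ratios versus `my`
plogis(coef(fit_noint))  # matches the observed error rate per country
```

Neither version compares a country with the entire population, which is what my research question asks for, so I would welcome suggestions on that as well.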

Thanks in advance