I am analyzing a sample of about 6000 actions carried out by about 500 multinational companies in about 80 countries over a 6-year period. Actions occur at random and are not repeated (longitudinal) measurements of the same multinational over time. Each action is one of two types (A or B). My goal is to find associations between a set of predictors and the occurrence of A. As the outcome is binary, I am using logistic regression.

Each action is carried out by one multinational in one country in one year. The predictor variables are measured either per multinational and year (e.g. number of employees) or per country and year (e.g. GDP).
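In symbols, a minimal sketch of the single-level model I have in mind is given below. The indices (i for action, m(i), c(i) and t(i) for the multinational, country and year of action i) and the coefficient names are my own notation. Only the two example predictors are shown, and I am assuming they enter linearly on the logit scale.

```latex
\[
\mathrm{logit}\,\Pr(\mathrm{Choice}_i = A)
  = \beta_0
  + \beta_1\,\mathrm{Employees}_{m(i),\,t(i)}
  + \beta_2\,\mathrm{GDP}_{c(i),\,t(i)}
\]
```

Note that neither predictor carries the action index i directly: each varies only across multinational-year or country-year cells.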

Because the predictors are defined at these higher levels (multinational-year and country-year) rather than at the level of the individual action, many observations in the sample share identical predictor values.

For example, all actions in the same country and year (even if carried out by different multinationals) have the same GDP value. Similarly, all actions by the same multinational in a given year (even if in different countries) have the same number of employees.


Multinational   Country   Year  SizeAsNumberOfEmployees  GDPOfCountry  Choice
MultinationalA  CountryA  2008  45                       55            A
MultinationalB  CountryA  2008  23                       55            A
MultinationalC  CountryA  2008  99                       55            B
MultinationalA  CountryB  2008  45                       77            B
MultinationalA  CountryB  2010  48                       83            A
MultinationalB  CountryB  2010  28                       83            B

To summarize: the predictors are measured at a higher level (multinational-year or country-year) than the observations (individual actions), so many observations share the same predictor values within each group.
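To make the sharing concrete, here is a small sketch in plain Python that counts how many actions fall into each country-year and multinational-year cell, using only the six sample rows above. The names `Action`, `country_year_sizes` and `firm_year_sizes` are just illustrative, not part of my actual analysis code.

```python
from collections import Counter
from typing import NamedTuple


class Action(NamedTuple):
    multinational: str
    country: str
    year: int
    employees: int
    gdp: int
    choice: str


# The six sample rows from the table above.
rows = [
    Action("MultinationalA", "CountryA", 2008, 45, 55, "A"),
    Action("MultinationalB", "CountryA", 2008, 23, 55, "A"),
    Action("MultinationalC", "CountryA", 2008, 99, 55, "B"),
    Action("MultinationalA", "CountryB", 2008, 45, 77, "B"),
    Action("MultinationalA", "CountryB", 2010, 48, 83, "A"),
    Action("MultinationalB", "CountryB", 2010, 28, 83, "B"),
]

# Actions per country-year cell: every action in a cell shares one GDP value.
country_year_sizes = Counter((r.country, r.year) for r in rows)

# Actions per multinational-year cell: every action in a cell shares one
# employee count.
firm_year_sizes = Counter((r.multinational, r.year) for r in rows)

for cell, n in country_year_sizes.items():
    print("country-year", cell, "actions:", n)
for cell, n in firm_year_sizes.items():
    print("multinational-year", cell, "actions:", n)
```

In the full data set (about 6000 actions) these cells would typically be much larger than in this toy sample, which is what prompts my question below.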

Is this a concern, and if so, how should it be modelled? Do I need a multilevel model here, or can I use ordinary (single-level) logistic regression?