Thread: Analysis of elections data?

  1. #1
    Points: 2,041, Level: 27
    Level completed: 28%, Points required for next Level: 109

    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Analysis of elections data?




    Dear All,

    I am currently writing an article on two parliamentary elections that took place in one country during 2009. The data consist of two exit-poll samples. It is an interesting case and could provide some useful insights into the political framework.

    What I would like to do is test whether ethnicity can or cannot be used to explain voters' choice of political party (this being taken for granted in many of the older Pol Sci works). The first thing I would like to do is check whether ethnicity can be used to predict which party a respondent would likely prefer. The second thing I would like to investigate is whether other variables may have a greater influence on voters' choice.

    The dependent variable would thus be "choice of political party" and the independent variables "ethnicity", "gender", "profession", "locality (city/town/rural)", "age group", and "education group".

    I do have some earlier experience with statistics, but it has been some years, and although I am reading through my old stats books I am a bit afraid that I will start in the wrong direction or choose a type of analysis that either does not do me any good or is simply wrong...

    Any help that could start me in the right direction would be appreciated!

    Thank you.

  2. #2
    TS Contributor
    Points: 6,789, Level: 54
    Level completed: 20%, Points required for next Level: 161
    terzi's Avatar
    Location
    Mexico City, Mexico
    Posts
    420
    Thanks
    10
    Thanked 34 Times in 33 Posts
    Hi Saj,

    Based on your brief explanation, I would suggest some modeling techniques for your study. This could give you excellent results, especially if you are interested in predictions. My recommendation is a Logistic Regression Model. I don't know how many parties you have in your country, but you can use either a binary model or a multinomial model. The sad part is that you may need a complex model to study this situation, since you talk about a survey that is likely based on cluster sampling and/or stratification. So, you must include the sample design information in the model, either using DEFF-corrected standard errors or using a Hierarchical Model, which in my opinion is the best option but also the hardest. So, for modeling, you may need a strong statistical background. Still, the results would be awesome (I'd love to do something like that sometime).
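
    To make the "DEFF-corrected standard errors" idea concrete, here is a minimal sketch. It assumes equal-sized clusters and uses Kish's approximation deff = 1 + (m - 1) * rho; the cluster size, intraclass correlation, vote share and sample size are made-up illustration values, not numbers from the exit polls.

```python
import math

def kish_design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for a cluster sample with equal cluster sizes."""
    return 1.0 + (cluster_size - 1.0) * icc

def deff_corrected_se(p: float, n: int, deff: float) -> float:
    """Standard error of a proportion, inflated by the design effect."""
    srs_se = math.sqrt(p * (1.0 - p) / n)   # SE under simple random sampling
    return srs_se * math.sqrt(deff)         # SE scales with sqrt(deff)

# Hypothetical values, for illustration only.
deff = kish_design_effect(cluster_size=20, icc=0.05)
se = deff_corrected_se(p=0.40, n=1000, deff=deff)
print(f"deff = {deff:.2f}, corrected SE = {se:.4f}")
```

    Survey software normally estimates the design effect from the actual strata and cluster identifiers; the formula above is only a rough planning approximation.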

    There are other options available, although you won't get results as interesting as those obtained from a model. One possible analysis is a Correspondence Analysis to study those relationships. This one is a bit easier and is also a valid scientific and statistical approach.

    Good luck with your study, I'm sure you'll do some amazing work with that data
    Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge

  3. #3
    Points: 2,041, Level: 27
    Level completed: 28%, Points required for next Level: 109

    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re. Correspondence Analysis

    Thanks terzi, your ideas were excellent and much appreciated!

    The Logistic Regression model does sound like the best choice and I spent some time investigating it. However, I think it will have to wait for now. With my present knowledge of statistics it would take me too long to understand ;( (I will, however, try to dig a bit deeper into it for the analysis I need to do for my dissertation).

    Correspondence Analysis (I was actually looking at Factor Analysis but it did not seem right) thus seems to be the best choice, and I hope that you might be able to help me a bit further. I will write my questions to the best of my ability and hope it will not take too much of your time.

    In SPSS I'm using the function for Multiple Correspondence Analysis. I will introduce the variables gender, educational group, age group, nationality, occupation, "for which party did you vote?", type of locality (urban/rural).

    1. The first thing I would like to do is to reduce the parties to those that actually ended up winning seats in parliament (6 of them) and move these into a new variable.

    2. May I place all variables within "analysis variables" when running the Multiple Correspondence analysis?

    3. Do I need to change any of the options in "discretize", "missing", "options", "output", "save"?

    4. Is there something in specific I should check in the output?

    5. As I understand it, the "Correlations Transformed Index" lets you see how variables relate to one another, on a scale from 0 to 1. E.g. a 0.387 for "ethnicity" in relation to "for which party did you vote" is not especially strong but, in this case, might be interesting when compared with the other variables.

    Thanks once more. With the help you've provided this far I am positive that the analysis of the exit-poll data will actually be quite interesting.

  4. #4
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    Hi!!
    I was thinking about your problem and the use of CA...

    Why are you considering using Multiple CA? Maybe I am missing the point, but if you have built a contingency table with, say, your parties in rows and "ethnicity", "gender", etc. in columns, and you want to investigate the data structure and the relation between the choice of a party and the other "variables", I think you should use Simple CA.

    If this is the case, I suggest taking a first look at the results with a program like PAST (freeware; just search on Google), which asks for far fewer details of any kind.

    I hope this helps and, if you have more questions, do not hesitate to reply.

    Best Regards,
    Gm

    p.s.
    I have written down a brief primer to CA (with references): see http://xoomer.virgilio.it/gianmarco....le/Page714.htm

  5. #5
    Points: 2,041, Level: 27
    Level completed: 28%, Points required for next Level: 109

    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks gianmarco,

    I'm too much of a novice at this, so my answer would simply be: setting it up as simple CA might well work. As I understand it, this implies running a separate simple CA for each independent variable and seeing how it relates to the dependent variable?

    I would love to have a look at that primer you are referring to. Unfortunately the link you provided seems to be broken. I had the same result when trying to google it.

    Thanks also for the PAST tip; I will check it out later this week. Could you briefly tell me the advantage of using PAST over SPSS?

    Best regards,

    Andreas

  6. #6
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    Hi!!

    Maybe I did not understand your problem and/or the dataset you have: if you want, you can send it so I can take a look at it and give you better advice.

    As for Past: I have never used SPSS for CA so far, but I have used Minitab, Past, Statistica, and Systat. I find that Past is very simple to use: just select the columns and run the analysis with a couple of clicks.

    Please, could you tell me what kind of problem you found with the link to my primer? I clicked on it and the browser opened the page containing the primer, so I cannot figure out what the problem might be.

    regards,
    Gm

  7. #7
    Points: 2,041, Level: 27
    Level completed: 28%, Points required for next Level: 109

    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Datasets

    Hi again,

    I will try to see if I can get to your homepage from home. Might be that the page, somehow, is not possible to reach from work.

    Sure, I attach the datasets from the Moldovan 2009 elections. Thanks for offering to take a look at them. Files are zipped (in order to be able to post here) and in .sav format. Text is in Romanian, but judging from your name, my guess would be that you read it easily enough
    Attached Files

  8. #8
    TS Contributor
    Points: 6,789, Level: 54
    Level completed: 20%, Points required for next Level: 161
    terzi's Avatar
    Location
    Mexico City, Mexico
    Posts
    420
    Thanks
    10
    Thanked 34 Times in 33 Posts
    Good night in Mexico,

    I would agree with gianmarco. Using simple correspondence analysis would be easier and it would lead to pretty interesting results. Multiple Correspondence Analysis is a little bit trickier since it is designed to explore relationships between homogeneous variables that are somehow related. You would also need deeper knowledge of the subject, since there are two main approaches to MCA and you need to select one, so the interpretation may change a little... I would suggest you skip that.

    Now, on another subject, Simple Correspondence Analysis is used to study relationships in a single contingency table, BUT there are some tricks that allow us to have more than two variables in a contingency table. For instance, using interactions will create contingency tables with more than two variables. You can interact gender (2 categories) and age (let's say 5 categories) to create a new variable that will have 10 categories and cross it with voting preference. Voilà! You have a two-way contingency table with three variables. Of course, the more variables you introduce, the larger the sample you require, but you can add as many as you want.
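
    The interaction trick can be sketched in a few lines. The records, category labels and party names below are invented for illustration; the point is only that gender and age are fused into one combined variable before crossing it with party choice.

```python
from collections import Counter

# Hypothetical respondent records: (gender, age_group, party).
respondents = [
    ("F", "18-29", "Party A"),
    ("F", "18-29", "Party B"),
    ("M", "18-29", "Party A"),
    ("M", "30-44", "Party B"),
    ("F", "30-44", "Party A"),
    ("M", "30-44", "Party B"),
]

def interaction_crosstab(records):
    """Cross a combined gender-by-age variable with party choice."""
    counts = Counter()
    for gender, age_group, party in records:
        combined = f"{gender}:{age_group}"   # one category per gender/age pair
        counts[(combined, party)] += 1
    rows = sorted({combined for combined, _ in counts})
    cols = sorted({party for _, party in counts})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = interaction_crosstab(respondents)
for label, row in zip(rows, table):
    print(label, row)
```

    With 2 genders and 5 age groups the combined variable has up to 10 rows, so sparse or empty cells become likely unless the sample is large.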

    This way, you can make an interesting analysis that won't be so complicated. STILL, Simple Correspondence Analysis is not so simple: interpreting the biplots may be complicated, especially the symmetric biplots, where distances between row and column points do not measure association. So, it will also require some research.

  9. #9
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    Hi SAJ,

    sorry for the delay in replying, but I have been out for work and I had no web connection.

    First, I agree with Terzi about MCA vs CA, and I also agree with him that CA results are not so easy to interpret, at least at first glance. I found M. Greenacre, "Correspondence Analysis in Practice", 2008, very interesting and useful.

    Secondly, I will take a look at your data and let you know. Could you please tell me what format .sav is? Which program opens this format?

    By the way, I am from Italy.

    Regards,
    Gm

  10. #10
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    Hi Saj,

    I took a look at your data. I got a general idea of your variables, but I have some doubts about some of them (by the way, where are the counts? Are they under the label CODUNIC?).

    Could you give me more information about your dataset?

    If you want, you can write a private message here in this forum.


    Regards,
    Gm

  11. #11
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    Hi SAJ!!

    I propose a fictional example hoping to help you a bit with your issue.

    Please take a look at the attached JPG picture.

    Let us suppose that you can build a contingency table that cross-tabulates the parties you are interested in with some variables like gender, nationality, etc.

    I limit the example to 5 parties.

    Each cell of the table contains a number that stands for the number of votes that each party received from each gender category, each nationality category, etc.

    By means of Correspondence Analysis (as stressed in previous posts) you can explore the relationship between the parties and the various variables. You can inspect the table in search of groupings and (if they exist) get a general idea of which parties are similar to one another and which variables drive the similarity.
    See scatterplot in the attached pict.
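
    For readers without the attachment, the computation behind such a scatterplot can be sketched as follows. This is a minimal simple CA via singular value decomposition, assuming NumPy is available; the counts are fictional (3 parties by 4 category columns) and are not the numbers from the attached picture.

```python
import numpy as np

def simple_ca(table):
    """Simple correspondence analysis of a two-way table of counts."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals from the independence model.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]      # principal coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]   # principal coordinates
    inertia = sv ** 2                     # principal inertia per axis
    return row_coords, col_coords, inertia

# Fictional counts: rows = parties, columns = category columns.
counts = [
    [30, 10, 25, 15],
    [12, 28, 10, 30],
    [20, 20, 22, 18],
]
row_coords, col_coords, inertia = simple_ca(counts)
print("share of inertia per axis:", np.round(inertia / inertia.sum(), 3))
print("party coordinates on axes 1-2:\n", np.round(row_coords[:, :2], 3))
```

    The first two columns of the row and column coordinates are what gets plotted. Both sets are in principal coordinates here (a symmetric map), so distances between a party point and a category point should not be read directly as association.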

    Additionally, you can sort the original contingency table on the basis of the score of data-points on the relevant axes.
    See the second table in the attached file.

    Additionally and for ease of analysis and/or visualisation, you could distinguish (to keep with my example) two broad groups (labelled A and B). See the third table in the attached file.

    Once you have devised groupings of parties and variables, you can further perform the hypothesis tests you prefer to check the statistical significance of the difference detected.
    See, for example, the chi-square test performed to check the difference in gender between the two groups of parties (see the bottom picture in the attached file).
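
    A chi-square test of this kind can be sketched without any statistics package. The 2x2 counts below are invented (party groups A and B by two gender categories), and the p-value formula used is the exact one for 1 degree of freedom only.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = party groups A and B, columns = gender 1 and 2.
chi2, p = chi_square_2x2([[60, 40], [40, 60]])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

    Note that this treats the respondents as a simple random sample; with a clustered exit poll the test would be too optimistic unless the design is taken into account.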



    I hope this can help.
    Regards,
    Gm
    Attached Images  

  12. #12
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    (the JPG file in zip format)
    Attached Files

  13. #13
    Points: 2,041, Level: 27
    Level completed: 28%, Points required for next Level: 109

    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Concerning the jpg

    Thanks gianmarco, I appreciate the time you are taking to explain this to me. I have had a look at the jpg file you provided and also tried to understand a bit more how CA works.

    The contingency table is clear to me. I also understand the basic chi-square logic that goes along with CA. Through CA we may see that variables are related, but it tells us nothing about how strongly they are associated.

    If I understand your second table correctly we may thus say that

    gender2, age_class1 and age_class2 are related to party 5 and party 2.

    age_class3, nationality3, nationality2, gender1, age_class4 and nationality 1 are related to parties 3,1,4.

    Questions:

    how should axis 1 and 2 be interpreted? Axis 1 is the parties? What is then axis 2?

    The two broad groups A and B, what do they really mean?

    SAJ

  14. #14
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts
    Hi SAJ.

    I am sorry for not having gone deeper into the details of CA. I attach a Primer on CA that I wrote some time ago.
    Since I am an archaeologist, the Primer uses some examples of archaeological interest, but the mechanics of CA remain the same and I believe that the Primer can also be understood by a non-archaeologist.

    The Primer explains how to interpret the scatterplots and provides a minimal bibliography.
    If you are interested in CA, I suggest reading the book by M. Greenacre (cited in the Primer). Many websites explaining the basics of CA exist as well.

    I looked again at the files you attached in your previous post, but I cannot find the parties. So, if you could provide a contingency table (you can use Excel), or extract from your data a contingency table with parties in rows and variables in columns, I will be happy to help you with CA and its results.

    As for CA and the strength of association(s): CA lets you explore the relations in your dataset, devise groupings, and understand the relation between objects and variables; in essence, it reduces the dimensions of your data and facilitates interpretation. The strength of association(s) must be checked at a later stage of the analysis, by means of the hypothesis tests that best fit the data and the hypotheses stemming from the exploration of the dataset.
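
    As a side note on "how strong": a hypothesis test says whether an association is detectable, not how large it is. One common descriptive measure that can be reported next to the test is Cramér's V, which rescales the chi-square statistic to the 0 to 1 range. It is not mentioned in the posts above; the sketch below is only an illustration with invented counts.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c table of counts (0 = none, 1 = perfect association)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1   # smaller dimension minus 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 table: party group by gender.
print(round(cramers_v([[60, 40], [40, 60]]), 3))
```

    Because V is scaled by the sample size, it allows a rough comparison of, say, ethnicity by party against education by party, even when the two tables have different numbers of categories.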

    When I was talking about the two broad groups, I was only giving an example. It could be of interest (or not, from your standpoint) to devise groups that differ on certain variables. It is only an example. You might find it interesting that group A (comprising parties x and y) is more related to men aged 24-50, whereas group B (comprising parties z and w) is more related to women aged 50-60. Or that group A is more related to one ethnic group than the other. Etc.

    As for your interpretation of the second table, you are right. The same interpretation should emerge from the scatterplot. If you look at it, axis 1 (in my example) mainly opposes Parties 2 and 5 to Parties 1 and 4.
    However, interpreting the scatterplot is a delicate step. You will find more info in the Primer.

    I hope this can help, and I really hope that my hints are not leading you astray.
    Any comment from other members is welcome.

    Feel free to reply and to ask more.

    regards,
    Gm
    Attached Images

  15. #15
    Points: 2,041, Level: 27
    Level completed: 28%, Points required for next Level: 109

    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Table


    Thanks Gianmarco and Merry Christmas!

    I will construct a contingency table and post it here - or perhaps even one for each election. Just one question before I do that. In order to break up the different categories I will need to recode the variables. Correct me if I'm wrong, but that would basically just mean putting in a "1" for the characteristic I want to count, e.g. for nationality "Moldovan", "Romanian", "Russian" etc. respectively.

    However, within some variables there will be missing cases (within age and ethnicity) and, moreover, I would like to include only parties that entered parliament (the other ones are very marginal). For the first categories the cases are relatively few (age: 1 missing; nationality: 34 refusals), but for parties it will mean losing some 900 cases out of more than 19,000.

    Am I going about it the right way when preparing the contingency table? I cannot really see any other way to do it. Leaving the cases as they are would mean that all the values for the other categories are accounted for, but there would e.g. be more cases for "men" and "women" than there would be for the different parties added together.
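
    One way to keep the totals consistent is to filter the cases first and count afterwards, so that every cell is based on the same respondents. A minimal sketch, with invented party names and records (None stands for a missing or refused answer):

```python
from collections import Counter

# Hypothetical party labels and respondent rows, for illustration only.
PARLIAMENT_PARTIES = {"Party A", "Party B"}
respondents = [
    {"party": "Party A", "nationality": "Moldovan"},
    {"party": "Party B", "nationality": "Russian"},
    {"party": "Party C", "nationality": "Moldovan"},   # party without seats
    {"party": "Party A", "nationality": None},          # refusal
    {"party": "Party B", "nationality": "Moldovan"},
]

def party_by_category(rows, variable):
    """Count respondents per (party, category) after dropping unusable cases."""
    kept = [
        r for r in rows
        if r["party"] in PARLIAMENT_PARTIES and r[variable] is not None
    ]
    counts = Counter((r["party"], r[variable]) for r in kept)
    return counts, len(kept)

counts, n_kept = party_by_category(respondents, "nationality")
print(n_kept, dict(counts))
```

    Counting directly from the filtered rows also avoids the manual 0/1 recoding step: each (party, category) pair simply accumulates its own count.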

    All the best,

    SAJ
