
Thread: How does one develop statistical models to compensate for weakness in public data?

  1. #1
    Points: 14, Level: 1
    Level completed: 27%, Points required for next Level: 36

    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    How does one develop statistical models to compensate for weakness in public data?




    Hi everyone. I have an interview coming up and the role is data-centric. I have strong knowledge of data and management; however, I have never considered using statistical models to account for weaknesses in publicly available data. How should I go about this? What models can be used? Could this refer to SPSS Automatic Data Preparation? Any help would be very much appreciated, guys. Thanks very much!

  2. #2
    Omega Contributor
    Points: 38,334, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,998
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: How does one develop statistical models to compensate for weakness in public data

    Issues with data:

    Selection bias (e.g. a systematic issue with collection or retention of the sample)
    Measurement error (e.g. proxy variables, mismeasured variables, etc.)
    Confounding (a pivotal variable is missing, so you get spurious relationships)
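One common way to compensate for the first issue, selection bias, is to reweight the sample so that it matches known population shares (post-stratification). The following is only a minimal sketch under stated assumptions: the group labels, outcome values, and population shares are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per sampled unit so that the weighted group
    shares match the (assumed known) population shares."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    weights = []
    for g in sample_groups:
        sample_share = counts[g] / n
        weights.append(population_shares[g] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample in which urban units are over-represented (8 of 10).
groups = ["urban"] * 8 + ["rural"] * 2
outcome = [10] * 8 + [20] * 2                      # illustrative values only
population_shares = {"urban": 0.5, "rural": 0.5}   # assumed, e.g. from a census

weights = poststratification_weights(groups, population_shares)
naive = sum(outcome) / len(outcome)        # ignores the over-representation
adjusted = weighted_mean(outcome, weights) # reweighted to population shares
print(naive, adjusted)
```

The naive mean is pulled toward the over-sampled urban group; the weighted mean gives each group its population share. This only corrects for selection on the variables used to form the groups, so it does nothing for measurement error or unobserved confounders.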
    Stop cowardice, ban guns!

  3. #3
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: How does one develop statistical models to compensate for weakness in public data

    Hi,
    Government data? Look up the Benford distribution as well.

    regards

  4. #4
    Points: 14, Level: 1
    Level completed: 27%, Points required for next Level: 36

    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: How does one develop statistical models to compensate for weakness in public data

    Thanks hlsmith and rogojel. Just touching on your point, rogojel, regarding Benford's law: the leading-digits theory is quite interesting. How would one interpret this, however? The data concerns deaths due to terrorist acts, so essentially it is government data that is naturally occurring. What can one do to interpret this? Thanks!

  5. #5
    Omega Contributor
    Points: 38,334, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,998
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: How does one develop statistical models to compensate for weakness in public data

    Another big issue with public datasets is that they are secondary data, so you are limited to what they collected!

  6. #6
    Points: 20,006, Level: 89
    Level completed: 32%, Points required for next Level: 344

    Posts
    568
    Thanks
    50
    Thanked 20 Times in 19 Posts

    Re: How does one develop statistical models to compensate for weakness in public data

    Quote Originally Posted by hlsmith View Post
    Another big issue with public datasets is that they are secondary data, so you are limited to what they collected!
    This is a great example. We receive physician claims and billings, and a big issue when working with administrative or secondary data is context. If a physician bills a procedure for a given patient, mammography for example, is it for screening or diagnostic purposes?

    Some other general areas of importance are:

    (1) Privacy and re-identification risk: procedures for control (e.g. small cell suppression), risk assessment, the use of encrypted identifiers

    (2) Data linkage - probabilistic or deterministic linkage of patient or person records from multiple sources.
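To make the distinction in (2) concrete, here is a rough sketch of deterministic linkage (exact agreement on an identifier) next to a probabilistic match weight in the Fellegi-Sunter style. Everything specific here is an assumption for illustration: the field names, the `health_id` key, and especially the m and u probabilities, which in practice would be estimated from the data rather than typed in.

```python
import math

# Assumed probabilities per field (illustrative, not estimated from data):
# m = P(field agrees | records truly match)
# u = P(field agrees | records do not match)
FIELD_PARAMS = {
    "surname":    {"m": 0.95, "u": 0.01},
    "birth_year": {"m": 0.98, "u": 0.05},
    "postcode":   {"m": 0.90, "u": 0.02},
}


def deterministic_match(a, b, key="health_id"):
    """Exact linkage: both records carry the same non-missing identifier."""
    return a.get(key) is not None and a.get(key) == b.get(key)


def match_weight(a, b, params=FIELD_PARAMS):
    """Sum of log2 agreement/disagreement weights across the fields.
    Assumes every field is present in both records."""
    total = 0.0
    for field, p in params.items():
        if a[field] == b[field]:
            total += math.log2(p["m"] / p["u"])              # agreement weight
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return total


# Hypothetical records from two sources; the second source has no identifier.
rec_a = {"health_id": "A123", "surname": "kovacs", "birth_year": 1970, "postcode": "1011"}
rec_b = {"health_id": None, "surname": "kovacs", "birth_year": 1970, "postcode": "1025"}
rec_c = {"health_id": None, "surname": "smith", "birth_year": 1984, "postcode": "9000"}

print(deterministic_match(rec_a, rec_b))   # no shared identifier to match on
print(match_weight(rec_a, rec_b))          # agrees on two of three fields
print(match_weight(rec_a, rec_c))          # disagrees on every field
```

A pair is usually declared a link when its weight exceeds an upper threshold, a non-link below a lower threshold, and sent for manual review in between; the thresholds are a design choice and are not shown here.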

  7. #7
    Points: 4,664, Level: 43
    Level completed: 57%, Points required for next Level: 86
    kiton's Avatar
    Location
    Corn field
    Posts
    234
    Thanks
    47
    Thanked 51 Times in 46 Posts

    Re: How does one develop statistical models to compensate for weakness in public data

    Quote Originally Posted by Jaysin View Post
    Thanks hlsmith and rogojel. Just touching on your point, rogojel, regarding Benford's law: the leading-digits theory is quite interesting. How would one interpret this, however? The data concerns deaths due to terrorist acts, so essentially it is government data that is naturally occurring. What can one do to interpret this? Thanks!
    Just wanted to add my 2 cents: if your outcome variable is the number of deaths, note that this is count data, in which (A) the observations can take only non-negative integer values, and (B) these integers arise from counting rather than ranking. As such, consider the following models: Poisson, negative binomial, or their zero-inflated variants. There is a variety of approaches to modeling such outcomes.
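Before choosing among these count models, a quick descriptive check can help. A Poisson model implies that the variance is roughly equal to the mean and that the share of zeros is about exp(-mean). The sketch below computes both diagnostics using only the standard library; the counts are made up for illustration, and the function names are hypothetical. It is a screening step, not a fitted model.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model.
    Values well above 1 suggest overdispersion (negative binomial territory)."""
    return variance(counts) / mean(counts)


def expected_zero_fraction(counts):
    """Share of zeros that a Poisson model with the same mean would predict."""
    return math.exp(-mean(counts))


def observed_zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Made-up counts per period, for illustration only.
deaths = [0, 0, 0, 1, 0, 2, 0, 0, 15, 0, 3, 0]

print("dispersion index:", dispersion_index(deaths))
print("observed zeros:  ", observed_zero_fraction(deaths))
print("Poisson zeros:   ", expected_zero_fraction(deaths))
```

For these illustrative numbers the variance is far larger than the mean and there are many more zeros than a Poisson model would predict, which is the pattern that points toward negative binomial or zero-inflated models rather than plain Poisson.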

  8. #8
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: How does one develop statistical models to compensate for weakness in public data


    Quote Originally Posted by Jaysin View Post
    Thanks hlsmith and rogojel. Just touching on your point, rogojel, regarding Benford's law: the leading-digits theory is quite interesting. How would one interpret this, however? The data concerns deaths due to terrorist acts, so essentially it is government data that is naturally occurring. What can one do to interpret this? Thanks!
    Hi,
    that would be a question of data quality. In this case the first digit probably does not follow Benford's law, but the second should. This could help identify made-up or estimated data that is presented as "real" measured numbers. It might be useful, or you might have processes that guarantee data integrity, in which case it would not be useful. The IRS uses this, AFAIK.
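A digit check of this kind could be sketched as follows. The expected first-digit probability under Benford's law is log10(1 + 1/d), and the second-digit probability is the sum of log10(1 + 1/(10*d1 + d2)) over the first digits d1 = 1..9. The code compares observed digit counts with these expectations using a chi-square statistic. It assumes positive integer data, and the function names and the test data (powers of 2) are illustrative choices, not anything from the thread.

```python
import math
from collections import Counter


def benford_first_digit_probs():
    """P(first digit = d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_second_digit_probs():
    """P(second digit = d2) for d2 = 0..9, summed over first digits 1..9."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }


def chi_square_vs_benford(values, position=1):
    """Chi-square statistic of observed digits against Benford's law.
    position=1 tests the leading digit, position=2 the second digit.
    Assumes positive integers; values too short for the position are skipped."""
    if position == 1:
        expected = benford_first_digit_probs()
        digits = [int(str(v)[0]) for v in values if v >= 1]
    elif position == 2:
        expected = benford_second_digit_probs()
        digits = [int(str(v)[1]) for v in values if v >= 10]
    else:
        raise ValueError("position must be 1 or 2")
    n = len(digits)
    if n == 0:
        raise ValueError("no usable values")
    observed = Counter(digits)
    return sum((observed.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in expected.items())


# Illustrative data: powers of 2 track Benford closely; 100..999 does not.
benford_like = [2 ** k for k in range(1, 201)]
uniform_like = list(range(100, 1000))

print(chi_square_vs_benford(benford_like, position=1))
print(chi_square_vs_benford(uniform_like, position=1))
print(chi_square_vs_benford(benford_like, position=2))
```

As a rough guide, the 5% critical values of the chi-square distribution are 15.507 for the first-digit test (8 degrees of freedom) and 16.919 for the second-digit test (9 degrees of freedom). A large statistic only flags data for a closer look; small samples or data confined to a narrow range can deviate from Benford's law for entirely innocent reasons.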

    regards
