
Thread: Dataset for statistical analysis

  1. #1
    Dear all,
    I work in Rome for the Italian National Health Service (www.iss.it). I am currently working on a European clinical project called EMTICS (www.emtics.eu), for which I developed both the database and the website used to collect the clinicians' data.
    I need advice on how to export data from the SQL database to a file that will then be analyzed in STATA or SPSS.
    A quick summary: the clinical study is about tics in children. The data cover about 1,000 patients from all over Europe. Every patient can have several visits, and about 700 variables are recorded at each visit. Every visit is also identified by a “VisitType” code (BS = baseline; UP = follow-up; FC = final; etc.) and by a “VisitNumber”. For example, for the variable “ThroatSwab”, I have this dataset:

    | idPatient | VisitType | VisitNumber | ThroatSwab | ... | other variable | ... |
    | 15 | BS | 01 | 1 | ... | other value | ... |
    | 15 | UP | 01 | 1 | ... | other value | ... |
    | 15 | UP | 02 | 1 | ... | other value | ... |
    | 15 | UP | 03 | 1 | ... | other value | ... |
    | 15 | FC | 01 | 1 | ... | other value | ... |
    | 44 | BS | 01 | 0 | ... | other value | ... |
    | 44 | UP | 01 | 1 | ... | other value | ... |
    | 44 | UP | 02 | 0 | ... | other value | ... |
    | 44 | UP | 03 | 0 | ... | other value | ... |
    | 44 | FC | 01 | 0 | ... | other value | ... |
    ..................................

    I don’t understand why the statistician in London asked for a dataset like the following:

    | idPatient | BS01_ThroatSwab | UP01_ThroatSwab | UP02_ThroatSwab | UP03_ThroatSwab | FC01_ThroatSwab | ... | BS01_ThroatSwab | ... |
    | 15 | 1 | 1 | 1 | 1 | 1 | ... | other value | ... |
    | 44 | 0 | 1 | 0 | 0 | 0 | ... | other value | ... |
    ......................................................

    And a “DataDictionary” file:

    | Variable_Label | VisitType | VisitNumber | Variable_Name |
    | ThroatSwab | BS | 01 | BS01_ThroatSwab |
    | ThroatSwab | UP | 01 | UP01_ThroatSwab |
    | ThroatSwab | UP | 02 | UP02_ThroatSwab |
    | ThroatSwab | UP | 03 | UP03_ThroatSwab |
    | ThroatSwab | FC | 01 | FC01_ThroatSwab |
    ..................................................................
    Here every visit type and visit number gets its own variable name, plus a column “Variable_Label” that groups the related variables.
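For illustration, a data dictionary of this shape can be generated mechanically rather than typed by hand. The following is a minimal sketch in Python (standard library only), assuming the variable names follow the `<VisitType><VisitNumber>_<variable>` pattern shown in the table; the function name `build_data_dictionary` is hypothetical, and a real export would read the visit codes and variable names from the database.

```python
def build_data_dictionary(visits, variables):
    """Return one dictionary row per (variable, visit type, visit number)."""
    rows = []
    for variable in variables:
        for visit_type, visit_number in visits:
            rows.append({
                "Variable_Label": variable,
                "VisitType": visit_type,
                "VisitNumber": visit_number,
                # Assumed naming pattern: e.g. BS + 01 + _ + ThroatSwab
                "Variable_Name": f"{visit_type}{visit_number}_{variable}",
            })
    return rows


# Visit codes copied from the example in the post.
visits = [("BS", "01"), ("UP", "01"), ("UP", "02"), ("UP", "03"), ("FC", "01")]
dictionary = build_data_dictionary(visits, ["ThroatSwab"])

for row in dictionary:
    print(row["Variable_Label"], row["VisitType"], row["VisitNumber"], row["Variable_Name"])
```

With about 700 variables and 10 visits, the same loop would produce the roughly 7,000 names mentioned in the thread.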

    In other words: I am not the person who has to analyze the data with SPSS or R; I simply have to provide the dataset to a statistician. The first sample dataset above is already the result of complex processing (transposition and other steps), so I was wondering: since it is relatively simple to group data in R or SPSS, wouldn't it be better to give the statistician the dataset just as it is (with the visit as the statistical unit) rather than waste time re-coding the original data? At the moment the file has about 10,000 rows (the visits: 10 visits per patient for 1,000 patients) and 700 columns (the variables recorded at each visit), while the statistician wants a file with 1,000 rows (the patients) and 7,000 columns (a separate variable for each visit type and number). I really can't figure out how he can map multiple observations of the same variable back together when they have a different name for each visit.
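On the remapping question, one possible answer is that the visit information is encoded in the column name itself, so it can be parsed back out. The sketch below (Python, standard library only) is an illustration under assumptions, not the statistician's actual method: it assumes names of the form `BS01_ThroatSwab` (two capital letters, two digits, underscore, variable, no space), and the helper name `wide_to_long` is hypothetical. The sample values are taken from the tables in the post.

```python
import re

# Assumed naming pattern: two-letter visit type, two-digit visit number, "_", variable name.
NAME_PATTERN = re.compile(r"^([A-Z]{2})(\d{2})_(.+)$")


def wide_to_long(wide_rows, id_column="idPatient"):
    """Turn one-row-per-patient data back into one-row-per-visit data."""
    long_rows = {}
    for row in wide_rows:
        for column, value in row.items():
            match = NAME_PATTERN.match(column)
            if match is None:
                continue  # skips the id column and anything not following the pattern
            visit_type, visit_number, variable = match.groups()
            key = (row[id_column], visit_type, visit_number)
            record = long_rows.setdefault(key, {
                id_column: row[id_column],
                "VisitType": visit_type,
                "VisitNumber": visit_number,
            })
            record[variable] = value
    return list(long_rows.values())


wide = [
    {"idPatient": 15, "BS01_ThroatSwab": 1, "UP01_ThroatSwab": 1, "FC01_ThroatSwab": 1},
    {"idPatient": 44, "BS01_ThroatSwab": 0, "UP01_ThroatSwab": 1, "FC01_ThroatSwab": 0},
]
long_rows = wide_to_long(wide)
for record in long_rows:
    print(record)
```

Statistical packages offer built-in reshaping for the same purpose, which is one reason the wide layout is not a dead end for repeated-measures analysis.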

    Is there a clear reason why the statistician wants the data this way (for example, the software he will use)? Or is it just an unnecessary complication?

    Forgive me for my English, I hope I was clear.
    Thank you in advance,
    Marco

  2. #2
    hlsmith

    The analyst should easily be able to get the data into their preferred format. I work more like an analyst myself and might think: if I can get someone else to do this step for me, great, I have just saved time that I can spend on analysis instead of on data steps that may take me longer than they would take you.


    This almost reminds me of creating a relational schema and getting data normalized. Do you ever have to do any of those things?

  3. #3

    Thank you!
    Well, my data is normalized. Transforming the dataset is no problem, I can do that easily. But it is a very long operation (I would have to re-encode the names of all the variables according to the type and number of the visit, ending up with about 7,000 different variable names), so I just want to know whether it is unnecessary work that can be avoided, or at least delegated to the statistician who will analyze the data, assuming it is easier for him to combine the dataset with the “data dictionary” directly in STATA or SPSS.
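The re-encoding described here does not have to be done name by name. As a rough sketch (Python, standard library only), the prefix can be built from each row's `VisitType` and `VisitNumber` and applied to every non-key column in one pass. The function name `long_to_wide` is hypothetical, and the rows are a shortened version of the example table from the first post.

```python
def long_to_wide(long_rows, id_column="idPatient"):
    """Collapse one-row-per-visit data into one row per patient with prefixed names."""
    key_columns = {id_column, "VisitType", "VisitNumber"}
    patients = {}
    for row in long_rows:
        prefix = f"{row['VisitType']}{row['VisitNumber']}_"  # e.g. "UP02_"
        wide_row = patients.setdefault(row[id_column], {id_column: row[id_column]})
        for column, value in row.items():
            if column not in key_columns:
                wide_row[prefix + column] = value
    return list(patients.values())


long_rows = [
    {"idPatient": 15, "VisitType": "BS", "VisitNumber": "01", "ThroatSwab": 1},
    {"idPatient": 15, "VisitType": "UP", "VisitNumber": "01", "ThroatSwab": 1},
    {"idPatient": 44, "VisitType": "BS", "VisitNumber": "01", "ThroatSwab": 0},
    {"idPatient": 44, "VisitType": "UP", "VisitNumber": "01", "ThroatSwab": 1},
]
wide = long_to_wide(long_rows)
for row in wide:
    print(row)
```

Patients who missed a visit simply lack the corresponding keys in this sketch; an actual export would need to write those cells as missing values.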

  4. #4
    hlsmith

    The difficulty of these tasks is fairly comparable whether you or the statistician does them. Creating arrays, etc., can be done in STATA or SPSS. However, if they do it themselves, they control the formatting and keep the code for future use and adaptation.

  5. #5
    Hi guys, I'm really struggling with the coding for this; I'm using RStudio. I need a linear model with a Gaussian error structure to find out whether a numerical variable (bacteria) is affected by the concentrations of sucrose and leucine (categorical variables), with day (a categorical variable) as a main effect. Any help would be much appreciated.
    Thanks
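A linear model with Gaussian errors and categorical predictors amounts to ordinary least squares on dummy-coded factors. In R this is usually written along the lines of `lm(bacteria ~ sucrose + leucine + day)` with the predictors stored as factors (the column names are assumed here, since the post does not show the data). The sketch below spells out the same main-effects fit in plain Python to show what is being estimated; the level names and the noise-free toy data are made up for illustration, and `fit_ols` and `solve` are hypothetical helpers.

```python
from itertools import product


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, response, factors):
    """Least-squares fit with treatment (dummy) coding; first sorted level is the baseline."""
    names = ["(Intercept)"]
    levels = {}
    for f in factors:
        levels[f] = sorted({row[f] for row in rows})
        names += [f"{f}={lvl}" for lvl in levels[f][1:]]
    x = []
    for row in rows:
        line = [1.0]
        for f in factors:
            line += [1.0 if row[f] == lvl else 0.0 for lvl in levels[f][1:]]
        x.append(line)
    y = [row[response] for row in rows]
    p = len(names)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return dict(zip(names, solve(xtx, xty)))


# Made-up, noise-free toy data: bacteria = 10 + 3 (sucrose 5g) + 2 (day 2); leucine has no effect.
rows = [
    {"sucrose": s, "leucine": l, "day": d,
     "bacteria": 10.0 + (3.0 if s == "5g" else 0.0) + (2.0 if d == "2" else 0.0)}
    for s, l, d in product(["0g", "5g"], ["0g", "5g"], ["1", "2"])
]
coefficients = fit_ols(rows, "bacteria", ["sucrose", "leucine", "day"])
for name, value in coefficients.items():
    print(f"{name}: {value:.3f}")
```

Each coefficient is the estimated shift in the response relative to the baseline level of that factor. Testing whether a factor matters additionally requires standard errors, which this sketch does not compute.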
