
Thread: Principal component analysis (PCA) on clustered data

  1. #1

    Principal component analysis (PCA) on clustered data




    Hi Everyone,
    I am analysing data from physical condition tests on students belonging to different types of school (general, professional, agricultural). The observations are not independent, because students in the same type of school tend to be similar (the intraclass correlation coefficient is about 20%).
    Is this a problem for principal component analysis (PCA)? Is there a method for running a PCA that takes a possible "cluster effect" (schools) into account?
    Thanks in advance

  2. #2
    bryangoodrich

    Re: Principal component analysis (PCA) on clustered data

    Not entirely my area, but I see two possible approaches:

    1. Run PCA on each group separately. If the groups are structurally different, I would expect PCA to pull out components representative of each group, which you can then use to analyze underlying phenomena across the groups. It does raise the question of whether you are comparing apples to apples, so to speak, when you use these per-group components.

    2. Include a feature (or a set of indicator features, i.e. "dummy variables") that represents the groups but still behaves well in PCA. I'm not sure of the best coding (e.g., 0 and 1, or -1 and 1, for each of the n-1 groups?). This method lends itself to other data transformations, but PCA is definitely not my forte.
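
For approach 1, here is a minimal sketch of running PCA separately within each group. It is only an illustration: the data are random numbers, and the names `pca`, `pca_by_group`, `X` and `groups` are made up for this example rather than taken from the original poster's data.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centered data matrix.

    Returns the component directions (one per row) and the share of
    variance each one explains.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:n_components], explained[:n_components]

def pca_by_group(X, groups, n_components=2):
    """Run a separate PCA on the rows belonging to each group label."""
    return {
        str(g): pca(X[groups == g], n_components)
        for g in np.unique(groups)
    }

# Hypothetical data: 300 students, 5 test scores, 3 school types.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
groups = np.repeat(np.array(["general", "professional", "agricultural"]), 100)

per_group = pca_by_group(X, groups)
for name, (components, ratio) in per_group.items():
    print(name, np.round(ratio, 3))
```

The sign of each component is arbitrary, so loadings from different groups should be compared up to sign.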

  3. #3

    Re: Principal component analysis (PCA) on clustered data

    First of all, there are a couple of ways of dealing with clustered data in PCA. And yes, the clustering will affect the analysis, mainly through your effective sample size.

    One quick and easy way of handling this problem, if you don't care about the clusters, is to use the intraclass correlation to calculate the effective sample size through a design effect, and then run your PCA on the data.
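
As a rough sketch of that calculation, the usual approximation for equal-sized clusters is design effect = 1 + (m - 1) * ICC, with m the cluster size, and effective sample size = n / design effect. The ICC of 0.20 is from the first post; the sample size and cluster size below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc

def effective_sample_size(n, cluster_size, icc):
    """Sample size after discounting for within-cluster similarity."""
    return n / design_effect(cluster_size, icc)

icc = 0.20          # from the original question
n = 300             # hypothetical total number of students
cluster_size = 100  # hypothetical number of students per cluster

deff = design_effect(cluster_size, icc)
n_eff = effective_sample_size(n, cluster_size, icc)
print(f"design effect = {deff:.1f}, effective n = {n_eff:.1f}")
```

With large clusters and an ICC this high, the effective sample size ends up far below the nominal one, which is worth knowing before judging how stable the components are.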

    But because you do care about the groups, what you need to do is test for model invariance: fit your PCA and then test whether the model holds across the groups. In particular, you want to test for multi-group invariance.
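
A formal invariance test is normally done in a multi-group factor-analysis framework, which is beyond a short snippet. As a purely descriptive first look (a swapped-in technique, not the test described above), one can compare the loadings of the first component between two groups with Tucker's congruence coefficient. The data and the names `group_a`, `group_b` and `weights` are simulated and hypothetical.

```python
import numpy as np

def tucker_congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def first_component(X):
    """Direction of the first principal component of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Simulate two groups that share the same one-factor structure.
rng = np.random.default_rng(1)
weights = np.array([[1.0, 0.8, 0.6, 0.4]])
group_a = rng.normal(size=(200, 1)) @ weights + 0.3 * rng.normal(size=(200, 4))
group_b = rng.normal(size=(200, 1)) @ weights + 0.3 * rng.normal(size=(200, 4))

# Component signs are arbitrary, so compare on the absolute value.
phi = abs(tucker_congruence(first_component(group_a), first_component(group_b)))
print(f"congruence of first components: {phi:.3f}")
```

Values close to 1 suggest the groups share a similar first component; values well below 1 suggest the structure differs and pooling the groups may be misleading.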

  4. #4

    Re: Principal component analysis (PCA) on clustered data


    Thank you both.
    Your answers give me good food for thought.
    I am going to study this.
