
Thread: Project Organization -- How do you roll?

  1. #1
    bryangoodrich (Probably A Mammal), Sacramento, California, United States

    Project Organization -- How do you roll?



    I wanted to ask this question the other day. Since I'm collaborating on a project now, I thought this might serve a dual purpose.

    What I'm asking is how do you set up your directory and approach project management? What works best for you? What works best for collaborating?

    I'm not going to say this is the way it has to be done, but I've found it works out very well. With the inclusion of RStudio and RProject files, this has become fantastic for me.

    Code: 
    Project Directory
    data
    documents
    scripts
    results
    This has always worked out for me. This basic structure covers everything I've had to deal with. The interior of these folders may be the same, say, 80% of the time, but it all depends on the project.
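
    If you want to script that layout rather than create it by hand, here is a minimal sketch. The helper name create_project and its arguments are made up for illustration; only the four folder names come from the structure above.

```r
# Sketch: create the four top-level folders for a new project.
# `create_project` is a hypothetical helper, not part of any package.
create_project <- function(name, parent = getwd(),
                           folders = c("data", "documents", "scripts", "results")) {
  root <- file.path(parent, name)
  for (folder in folders) {
    # recursive = TRUE also creates `root` itself on the first pass
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(root)
}

# Example: build the layout inside a temporary directory
root <- create_project("demo", parent = tempdir())
list.files(root)
```

    Sub-structure (say, data/binary and data/text) can be added by passing longer paths in the folders argument.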

    For instance, maybe you have a lot of data. Then you're probably going to need a lot of substructure under your data folder. Maybe you want your data files grouped into file types (binary vs text), or maybe you want them grouped by topic (mortality data, exposure data, etc.).

    An aside here: by data I mean all data contained in that project. A project may still need other data, and I see two ways to handle that. One is nesting: the project sits inside a larger project folder, each sub-project is independent, and the parent folder maintains a "global data" folder or something similar. The other is a non-nested structure, in which case you might keep one global data folder for every project in your workspace. I literally do all of my work from a "workspace" folder, so in that sense all projects are nested under my workspace and that global data folder serves the purpose I described (e.g., maybe I store GIS data that any project could use).

    Moving on, I recently found a nice way to organize my scripts. Common scripts sit directly in the scripts folder (no subdirectories). Then I have two other sets of scripts: processing and support. Processing holds a numbered and named sequence of the processing tasks that are essential to the project. The first are always cleaning scripts that pull in the data and get it into production form for, say, regression analyses. I may also include here any image books I create to summarize data and other EDA-related tasks.
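
    One payoff of numbering the processing scripts is that the whole sequence can be run in order with a few lines. This is only a sketch: the folder path and the file names (01_clean.R and so on) are assumptions, and run_processing is a hypothetical helper.

```r
# Sketch: run every script in a processing folder in numeric-prefix order.
run_processing <- function(dir = file.path("scripts", "processing")) {
  # Zero-padded prefixes (01_, 02_, ...) keep alphabetical order equal to run order
  scripts <- sort(list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    # A fresh environment per script keeps objects from leaking between steps
    source(script, local = new.env())
  }
  invisible(scripts)
}

# Example with two throwaway scripts in a temporary folder
demo_dir <- file.path(tempdir(), "processing_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("cleaned_demo_value <- 1", file.path(demo_dir, "01_clean.R"))
writeLines("summary_demo_value <- 2", file.path(demo_dir, "02_summarize.R"))
ran <- run_processing(demo_dir)
basename(ran)
```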

    The support scripts include things like a function.r (in the R case) that simply stores all the functions I use in the project. If I really wanted, I could run it and have my functions available, but usually I just keep it as a repository. I copy-paste the functions to be used directly into the relevant script. All of my scripts clean up their environment when done, so those functions will be removed (so there are no cross-script calling problems).

    This folder also holds things like the linking tables I created in my last project. For example, I have a function called vlookup that works like Excel's vlookup command. This basically lets me recode something with
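
    An alternative to copy-pasting, if you would rather keep a single copy of each function, is to load the file into its own environment and drop that environment when the script finishes. This is a different technique from what I described, sketched under the assumption that the function file lives at scripts/support/functions.r; load_functions and double_it are made-up names.

```r
# Sketch: load a project's function file into a dedicated environment.
load_functions <- function(path = file.path("scripts", "support", "functions.r")) {
  env <- new.env()
  sys.source(path, envir = env)  # evaluate the file inside `env`, not the workspace
  env
}

# Example with a throwaway function file
fn_file <- file.path(tempdir(), "functions.r")
writeLines("double_it <- function(x) 2 * x", fn_file)

fns <- load_functions(fn_file)
result <- fns$double_it(21)
result   # 42
rm(fns)  # clean up when the script is done, so nothing carries over
```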

    Code: 
    x = transform(x, category = vlookup(category, recodeTable, 3))
    Here I just recoded the category field with a linking table using the 3rd column as the recode. It defaults to 2. It's a nice wrapper for match. The linking tables are an Rdata file in my data folder, but I wanted a way to reproduce them or adjust them as needed. That's all contained in a support script in the support folder. I'm sure you can imagine other instances where a function supports the project in these ways. Maybe you have other schemas for how to organize your scripts folder.
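
    For anyone curious what such a wrapper can look like, here is a hypothetical reconstruction, not the actual function from my project. It assumes the lookup keys sit in the first column of the table; the contents of recodeTable are invented for the example.

```r
# Hypothetical vlookup-style helper built on match().
vlookup <- function(value, table, column = 2) {
  idx <- match(value, table[[1]])  # position of each value among the keys
  table[[column]][idx]             # NA wherever a value has no match
}

# Made-up linking table and data
recodeTable <- data.frame(
  code  = c("a", "b", "c"),
  short = c("A", "B", "C"),
  long  = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)
x <- data.frame(category = c("b", "a", "c", "b"), stringsAsFactors = FALSE)

x <- transform(x, category = vlookup(category, recodeTable, 3))
x$category  # "Beta" "Alpha" "Gamma" "Beta"
```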

    The documents folder is obvious. This could include literature or just documentation. I usually put a README in my project directory (and sometimes in other folders to describe their contents), but more thorough and detailed material (e.g., a record of your hours spent on the project) can go in this folder.

    The results folder is also self-explanatory. It stores all of your output that isn't necessarily going to be reused in the project. I usually put all of my images into a pdf (image book), but when I don't, I'll usually create an images folder here and drop images in that folder. The equivalent to the image book would be to create a folder in the images folder for all those related images. Much easier to avoid both folders and have one pdf!
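
    In R an image book is just a multi-page PDF: open the pdf device once, draw each plot, and close the device. A minimal sketch, assuming the built-in mtcars data and a temporary output path (in a real project the file would go under results):

```r
# Sketch: collect several plots into one multi-page PDF "image book".
book <- file.path(tempdir(), "imagebook.pdf")

pdf(book, width = 7, height = 5)  # open the PDF device
for (v in c("mpg", "hp", "wt")) {
  # each high-level plot call starts a new page
  hist(mtcars[[v]], main = paste("Distribution of", v), xlab = v)
}
dev.off()                         # close the device so the file is written out
```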

    Closing Remarks

    Like I said, none of this is set in stone. I have just found this to be a good way to manage projects. It's efficient and organized. It lets me really understand a project when I go back to it, even after weeks of not touching it, and pick right back up. It helps even more when you're coming back to a project after months or years!

    As I alluded to, there are ways to adjust things. For instance, maybe you want a separate folder for your literature review outside of the documents folder, or separate script folders for different languages. Maybe you should include a library or bin folder to store libraries your programs require (thinking of C programming here) or applications to be used in your project. You might have a sizable project in which you want to include a sub-project. I talked about this nested structure before, but how does that fit into an actual project (not just sub-directories like those under my workspace)?

    Do you think there are better ways to approach this? What things do you find important in managing your projects? Share here so we can all learn to be more productive in what we do! My approach comes from my own experience and the tools I use. Certainly other people doing other things with other tools may find it lacking in some areas or in need of refinement in others.

    Discuss!

  2. #2
    Lazar (TS Contributor), Sydney

    Re: Project Organization -- How do you roll?

    I wrote myself a project creation function which does what I like. I know there was one available on CRAN, but I hated it because it creates a number of useless folders. Here is mine:
    Code: 
    MyProject <- function(project.name = "Test",
                          dir = c("libraries", "dataPrep", "data", "analysis",
                                  "graphics", "sweave", "workspace"),
                          parent = getwd()) {
        # Create the project folder and its subfolders under `parent`, then
        # add starter files to the appropriate folders.
        # Eventually more example files will be downloaded from the net.
        parent <- gsub('\\\\', '/', parent)  # normalize Windows backslashes
        dir.create(file.path(parent, project.name))
        for (d in dir) {
            dir.create(file.path(parent, project.name, d))
        }
        # Pass the arguments on explicitly; CreateFiles() cannot see them otherwise
        CreateFiles(project.name, dir, parent)
    }

    CreateFiles <- function(project.name, dir, parent) {
        # Files copied here are specific to my projects.
        # Some work needs to be done to make the function general to others,
        # e.g. downloading example sweave files from online will be useful.
        for (d in dir) {
            if (d == 'sweave') {
                download.file('http://www.biostat.jhsph.edu/~rpeng/ENAR2009/Sweave.sty',
                              file.path(parent, project.name, 'sweave', 'sweave.sty'))
            } else {
                cat('No files copied for:', d, '\n')
            }
        }
    }
    Last edited by Lazar; 08-23-2012 at 12:57 AM. Reason: Fixed code

  3. #3
    trinker (FormerlyKnownAsRaptor), Buffalo, NY

    Re: Project Organization -- How do you roll?

    This is nice, Lazar. Is Sweave getting replaced by knitr?

    For some reason I get this error (haven't debugged myself yet):
    Code: 
    > CreateFiles<- function(){
    + #Files copied here are specific to my projects.
    + #Some work needs to be done to make the function general to others.
    + #e.g. Downloading sweave and example sweave files from online will be useful.
    +   for (i in 1:length(dir))
    +   if(dir[i]=='dataPrep'){
    +      file.copy(paste(parent,"/GeneralRlibrary/DataPrep.R", sep='/'),
    +                      paste(project.name, dir[i], sep='/'))
    +   }else if(dir[i]=='sweave'){
    +     download.file('http://www.biostat.jhsph.edu/~rpeng/ENAR2009/Sweave.sty',
    +               paste(project.name, 'sweave', 'sweave.sty', sep='/'))
    +     file.copy(paste(parent,"/GeneralRlibrary/sweave.rnw", sep='/'),
    +               paste(project.name, dir[i], sep='/'))
    +   }else{cat('No files copied for:', dir[i], '\n')
    +        }
    + }
    > MyProject()
    Error in dir[i] : object of type 'closure' is not subsettable
    Though it seems to have pumped out what it's supposed to. I think it could be even nicer to have certain scripts saved in certain directories too, like a premade sweave/knitr script that's a .Rnw file.
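
    My guess at the cause, as a sketch rather than a confirmed diagnosis: inside CreateFiles there is no variable called dir, so R falls back to the base function dir(), and a function cannot be subset with [. The folders get created first, which would explain why the output still looks right. The names broken and fixed below are made up for the demonstration.

```r
# `dir` is not defined inside the function, so R finds base::dir (a function).
broken <- function() {
  for (i in 1:length(dir)) print(dir[i])  # subsetting a function fails
}
msg <- tryCatch(broken(), error = function(e) conditionMessage(e))
msg  # in an English locale: object of type 'closure' is not subsettable

# Passing the vector in as an argument avoids the lookup problem.
fixed <- function(dir) {
  for (d in dir) cat("Would handle:", d, "\n")
  invisible(length(dir))
}
n <- fixed(c("data", "sweave"))
```
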
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  4. #4
    Lazar (TS Contributor), Sydney

    Re: Project Organization -- How do you roll?


    I agree, tr, about adding more premade files. I will debug and see what is happening, but the folder structure should be there and working fine. And yes, I will replace sweave with knitr.

    EDIT: Code fixed above.
