Hi there. Thanks for looking at my post.

I've been preprocessing and organizing a data set that I would like to run a regression analysis on.

I want to model my dependent variable using two large data sets that have more than 100 independent variables between them. (About 80% of those independent variables are algal species; the rest are ambient environmental variables.)

The problem is that most of the independent variables that I think would be good explanatory variables, based on biological processes, were collected on a different time scale than the dependent variable.

For example:

This first data set contains the date ranges for my dependent variable.
In summary they are consecutive weekly observations from November to March.

Code: 
Data Set 1						
11/4/2003		11/14/2005		11/12/2007		11/16/2009
11/10/2003		11/28/2005		11/26/2007		11/30/2009
11/17/2003		12/6/2005		12/4/2007		12/7/2009
12/1/2003		12/12/2005		12/10/2007		12/14/2009
12/9/2003		12/27/2005		12/17/2007		12/28/2009
12/16/2003		1/3/2006		1/2/2008		1/4/2010
12/29/2003		1/9/2006		1/8/2008		1/11/2010
1/6/2004		1/16/2006		1/14/2008		1/19/2010
1/12/2004		1/23/2006		1/22/2008		1/25/2010
1/19/2004		1/30/2006		1/28/2008		2/1/2010
1/26/2004		2/7/2006		2/5/2008		2/8/2010
2/3/2004		2/13/2006		2/11/2008		2/16/2010
2/9/2004		2/21/2006		2/19/2008		2/22/2010
2/17/2004		2/27/2006		2/25/2008		3/1/2010
2/23/2004				3/4/2008		3/8/2010
		11/13/2006				
11/15/2004		11/27/2006		11/17/2008		
11/29/2004		12/5/2006		11/24/2008		
12/7/2004		12/11/2006		12/1/2008		
12/13/2004		12/18/2006		12/8/2008		
12/27/2004		1/2/2007		12/15/2008		
1/4/2005		1/9/2007		12/29/2008		
1/10/2005		1/16/2007		1/5/2009		
1/17/2005		1/22/2007		1/12/2009		
1/24/2005		1/29/2007		1/20/2009		
1/31/2005		2/6/2007		1/26/2009		
2/8/2005		2/12/2007		2/2/2009		
2/14/2005		2/20/2007		2/9/2009		
2/22/2005		2/26/2007		2/17/2009		
2/28/2005		3/6/2007		2/23/2009		
3/8/2005				3/2/2009
The second data set, which has the independent variables that I would like to use in a regression with the first data set, is as follows:

Each column represents a different sample site.
In general, each column has quarterly sample dates, with some sites being sampled monthly for 12 months.

Code: 
2/19/2003	3/12/2003	1/29/2003	1/15/2003	1/8/2003	3/11/2003
5/21/2003	6/10/2003	4/22/2003	4/16/2003	4/16/2003	6/4/2003
8/19/2003	9/17/2003	7/22/2003	7/16/2003	7/14/2003	9/24/2003
11/11/2003	12/3/2003	10/28/2003	10/22/2003	10/8/2003	10/15/2003
					11/12/2003
					12/11/2003
					
2/18/2004	3/10/2004	1/28/2004	1/21/2004	1/8/2004	1/14/2004
5/17/2004	6/17/2004	4/28/2004	4/21/2004	4/7/2004	2/13/2004
8/25/2004	9/9/2004	7/26/2004	7/21/2004	7/9/2004	3/18/2004
11/1/2004	12/8/2004	10/20/2004	10/13/2004	10/6/2004	4/14/2004
					5/19/2004
					6/23/04
					7/14/2004
					8/11/2004
					9/23/2004
					12/1/2004
					
2/7/2005	3/9/2005	1/19/2005	1/11/2005	1/11/05	3/4/2005
5/12/2005	6/15/2005	2/15/2005	4/13/2005	4/12/2005	06/08/05
8/10/2005	9/14/2005	3/23/2005	7/13/2005	7/6/2005	9/7/2005
11/9/2005	10/19/2005	4/20/2005	10/11/2005	10/05/05	12/13/2005
	11/16/2005	5/18/2005			
	12/15/2005	6/22/2005			
		7/20/2005			
		8/17/2005			
		9/21/2005			
		10/26/2005			
					
2/8/2006	1/11/2006	1/25/2006	1/18/2006	1/4/2006	3/16/2006
5/10/2006	2/15/2006	4/26/2006	4/19/2006	04/05/06	6/7/2006
8/9/2006	3/15/2006	7/26/2006	7/18/2006	7/6/2006	09/06/06
11/8/2006	4/12/2006	10/18/2006	10/11/2006	10/4/2006	12/6/2006
	5/17/2006				
	6/14/2006				
	7/12/2006				
	8/16/2006				
	9/13/2006				
	12/13/2006				
					
2/21/2007	3/14/2007	1/24/2007	1/10/2007	1/3/2007	03/07/07
5/15/2007	6/13/2007	4/19/2007	4/11/2007	4/11/2007	06/06/07
8/15/2007	9/12/2007	7/25/2007	7/18/2007	7/11/2007	09/12/07
11/14/2007	12/13/2007	10/18/2007	10/25/2007	10/3/2007	12/18/07
					
02/13/08	3/11/2008	1/23/2008	1/15/2008	02/11/08	03/13/08
05/15/08	6/10/2008	4/22/2008	4/15/2008	04/08/08	06/17/08
08/13/08	9/9/2008	7/23/2008	7/16/2008	07/02/08	09/10/08
11/12/08	12/12/2008	10/21/08	10/15/2008	10/01/08	12/12/08
					
02/12/09	3/17/2009	01/21/09	01/14/09	1/7/2009	03/04/09
05/14/09	06/10/09	04/22/09	04/15/09	4/14/2009	06/03/09
08/12/09	09/09/09	07/16/09	07/08/09	7/1/2009	09/02/09
11/11/09	12/10/09	10/27/09	10/14/09	10/15/2009	12/14/09
					
02/10/10	03/11/10	01/20/10	01/13/10	01/12/10	03/04/10
05/20/10	06/09/10	04/21/10	04/14/10	4/14/2010	6/16/2010
07/07/10	07/29/10	07/21/10	07/14/10	07/07/10	8/11/2010
08/11/10	09/09/10	08/26/10	8/19/2010	09/14/10	09/01/10
11/10/10	12/08/10	10/20/10	10/13/2010	10/06/10	12/07/10
Both of those data sets should be in nice neat single columns. Shift the middle column dates over to the right.

* Oh, one more thing: I have the same sample sites in both data sets. The only thing that differs between the two data sets is the sampling dates.


I would love it if I could use these two seemingly disparate data sets in a regression. I think it can be done; I just don't have the stats chops to know exactly how to handle/process the data so that the analysis is statistically rigorous and, most importantly, valid.

This is where someone here can hopefully step in and make a recommendation.

Here is what I think I should do to make this work. Can you provide your feedback?

To bring these two data sets into as much "harmony" as possible, I think the first thing I should do is clear out the monthly sample dates from data set 2 so that I am left with true quarterly data groups.
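To make that first step concrete, here is a minimal sketch in Python with pandas (the software choice is an assumption on my part, nothing is settled). The column names `site`, `date`, and `quarter` and the label `site_6` are made up for illustration, and keeping the earliest sample in each calendar quarter is just one arbitrary way of thinning the monthly sites.

```python
import pandas as pd

# Sample dates for the sixth column of data set 2 in 2004 (the monthly year),
# with the two-digit year in 6/23/04 written out in full.
dates = pd.to_datetime(
    ["1/14/2004", "2/13/2004", "3/18/2004", "4/14/2004", "5/19/2004",
     "6/23/2004", "7/14/2004", "8/11/2004", "9/23/2004", "12/1/2004"],
    format="%m/%d/%Y",
)
# "site_6" is a hypothetical label for that column.
samples = pd.DataFrame({"site": "site_6", "date": dates})

# Label each sample with its calendar quarter, then keep only the earliest
# sample per site and quarter (an arbitrary choice, see the text above).
samples["quarter"] = samples["date"].dt.to_period("Q")
quarterly = (
    samples.sort_values("date")
    .groupby(["site", "quarter"], as_index=False)
    .first()
)
print(quarterly)
```

The same grouping would leave the already-quarterly sites untouched, since they only have one sample per quarter to begin with.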

Then I think I should get rid of the quarterly data from data set 2 that are not relatively close in time to the November-March window of data set 1 (basically leaving only quarterly data in data set 2 that fall in September-April). Since this biological process is dependent on ambient conditions, late-spring, summer, and early-fall data are not that important.
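As a rough sketch of that filter, again assuming Python with pandas and a made-up column name `date`, the September-April cut could look something like this:

```python
import pandas as pd

# Quarterly sample dates for the first column of data set 2, 2003-2004.
dates = pd.to_datetime(
    ["2/19/2003", "5/21/2003", "8/19/2003", "11/11/2003",
     "2/18/2004", "5/17/2004", "8/25/2004", "11/1/2004"],
    format="%m/%d/%Y",
)
site_1 = pd.DataFrame({"date": dates})

# Keep only samples taken from September through April.
keep_months = [9, 10, 11, 12, 1, 2, 3, 4]
winter_only = site_1[site_1["date"].dt.month.isin(keep_months)]
print(winter_only)
```

For this site, that leaves only the February and November samples, i.e. two points per calendar year.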

This is the point where I get lost in what is statistically appropriate...

I think that I could simply average out the yearly data from data set 1 and turn it into a pseudo-quarterly data set, but I want to avoid that at all costs, since the dependent variable is highly seasonal (it can change significantly over the course of a week) and the weekly data show that seasonality well.
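To show what that averaging would throw away, here is a small sketch assuming Python with pandas. The dates are the 2003-2004 season from data set 1, but the response values 1 through 15 are placeholders I made up, not real measurements.

```python
import pandas as pd

# Weekly sample dates for the 2003-2004 season, copied from data set 1.
weekly_dates = pd.to_datetime(
    ["11/4/2003", "11/10/2003", "11/17/2003", "12/1/2003", "12/9/2003",
     "12/16/2003", "12/29/2003", "1/6/2004", "1/12/2004", "1/19/2004",
     "1/26/2004", "2/3/2004", "2/9/2004", "2/17/2004", "2/23/2004"],
    format="%m/%d/%Y",
)
# Placeholder response values 1..15, NOT real data.
weekly = pd.Series(range(1, 16), index=weekly_dates, name="y")

# Average the weekly observations within each calendar quarter.
quarterly_mean = weekly.groupby(weekly.index.to_period("Q")).mean()
print(quarterly_mean)
```

The 15 weekly points collapse to two quarterly means, so any week-to-week movement inside a quarter disappears.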

Another (perhaps hare-brained) idea that I had was to drop the dates completely, as they aren't extremely vital to the analysis. That would create another mismatch, though: I have 15 points per sampling regime/"wrapped year" in data set 1 and only 4 points per sampling regime/year in data set 2.
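By "wrapped year" I mean something like the following sketch (Python with pandas assumed; the column names `season` and `position` are made up, and the July cutoff for splitting seasons is an arbitrary choice). Each date is replaced by the season it belongs to and its order within that season.

```python
import pandas as pd

# A subset of the weekly dates from two seasons of data set 1.
dates = pd.to_datetime(
    ["11/4/2003", "11/10/2003", "1/6/2004", "2/23/2004",
     "11/15/2004", "11/29/2004", "1/4/2005", "3/8/2005"],
    format="%m/%d/%Y",
)
obs = pd.DataFrame({"date": dates})

# Label each observation by the year its season started in: dates from
# July onward belong to that year's season, earlier dates to the previous one.
year = obs["date"].dt.year
obs["season"] = year.where(obs["date"].dt.month >= 7, year - 1)

# Replace the calendar date with the observation's order within its season.
obs["position"] = obs.groupby("season").cumcount() + 1
print(obs)
```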

Following that idea, to get data set 2 to match the 15 points in data set 1, could I run multiple imputation to create data for the analysis (statisticians are probably shuddering at "creating data")?

The bottom line is that I have a tremendous amount of data that could explain the dependent variable very well. I just don't know how to massage the data sets so that this can be done with good ol' statistics.

Thanks for your advice/input in advance.