# [R] Difference between regression lines significant?

#### Fiete

##### New Member
Hi,

I'm a PhD student researching processes of technology maturation.
For this purpose I collected patent data, which I want to use to differentiate between 3-6 as yet arbitrary maturity phases (e.g. technology introduction, growth, maturity, decline). For several different technologies I have yearly patent application counts as percentages of the yearly total.
Just by looking at a bar chart of the data, I can typically identify an increase, a peak, and then a decrease with a long tail (see e.g. the attached picture and the data table below: Appln.Count / Appln.CountTotal = FractionAppln.Count).
Now I want to turn this rather subjective approach into an algorithm. I have implemented a rolling least-squares linear regression over 3-year slices in R, which you can find below.

Which statistical approach would you suggest for comparing the slopes of any two time slices, to find out whether they differ significantly (e.g. at the 5% level) so that I can assume a change of maturity phase?

Fiete

R script:
Code:

    # Fit a linear regression to each rolling 3-year slice (i to i+2).
    for (i in 1976:2010) {
      slice <- CDTimeSeries[CDTimeSeries[, 1] >= i & CDTimeSeries[, 1] <= i + 2, ]
      RegModel.3 <- lm(FractionAppln.Count ~ X.Year, data = slice)
      print(i)
      print(summary(RegModel.3))
    }

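data table "CDTimeSeries":

| Year | Appln.CountTotal | Appln.Count | FractionAppln.Count |
|------|------------------|-------------|---------------------|
| 1976 | 28 | 0 | 0% |
| 1977 | 43 | 0 | 0% |
| 1978 | 99 | 0 | 0% |
| 1979 | 221 | 0 | 0% |
| 1980 | 418 | 1 | 0% |
| 1981 | 1426 | 0 | 0% |
| 1982 | 2057 | 2 | 0% |
| 1983 | 1260 | 13 | 1% |
| 1984 | 939 | 16 | 2% |
| 1985 | 1014 | 43 | 4% |
| 1986 | 978 | 30 | 3% |
| 1987 | 573 | 14 | 2% |
| 1988 | 262 | 8 | 3% |
| 1989 | 716 | 16 | 2% |
| 1990 | 607 | 26 | 4% |
| 1991 | 498 | 28 | 6% |
| 1992 | 569 | 13 | 2% |
| 1993 | 864 | 57 | 7% |
| 1994 | 1405 | 118 | 8% |
| 1995 | 1894 | 143 | 8% |
| 1996 | 2500 | 235 | 9% |
| 1997 | 3002 | 303 | 10% |
| 1998 | 4318 | 459 | 11% |
| 1999 | 4752 | 469 | 10% |
| 2000 | 4973 | 561 | 11% |
| 2001 | 7231 | 918 | 13% |
| 2002 | 8499 | 1298 | 15% |
| 2003 | 10373 | 1715 | 17% |
| 2004 | 15768 | 2974 | 19% |
| 2005 | 18674 | 2982 | 16% |
| 2006 | 18504 | 3081 | 17% |
| 2007 | 16251 | 2400 | 15% |
| 2008 | 13402 | 1742 | 13% |
| 2009 | 10545 | 1301 | 12% |
| 2010 | 9094 | 1015 | 11% |
| 2011 | 7123 | 528 | 7% |
| 2012 | 2519 | 185 | 7% |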

#### chiro

##### New Member
Hey Fiete.

Are your regression models piecewise linear (they look like it, but I need to ask)?

If this is the case, then you might want to do a two-sample t-test on the slope coefficients, testing at the 5% level whether the difference between the two is 0.

You will need information about the sampling distribution of the coefficient estimates (which I believe is a t-distribution under the standard linear regression assumptions).
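For reference, the usual textbook form of that statement, assuming independent, normally distributed errors with constant variance inside a window of n observations (the hat notation, SE and n are introduced here for illustration):

$$
\frac{\hat{B} - B}{\operatorname{SE}(\hat{B})} \sim t_{n-2}
$$

With 3-year slices n - 2 = 1, so each slope estimate comes with a single residual degree of freedom, which makes any test built on it very weak.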

So in other words, form H0: B_a = B_b against H1: B_a != B_b, then pick the appropriate test statistic and reference distribution. If you have enough data, you should be able to use the asymptotic normal distribution, but I would look into whether a two-sample t-test is adequate for your needs.
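A minimal sketch of such a test in R, assuming two non-overlapping windows (overlapping windows share data points, so their slope estimates are not independent). It divides the slope difference by the square root of the summed squared standard errors and uses a Welch-Satterthwaite approximation for the degrees of freedom. That choice, the helper names `slope_info` and `compare_slopes`, and the column names `Total`, `Count` and `Fraction` are assumptions for illustration, not part of the original script. The data are the 1999-2010 rows of the table in the question.

```r
# Rows 1999-2010 of the CDTimeSeries table (column names are illustrative).
cd <- data.frame(
  Year  = 1999:2010,
  Total = c(4752, 4973, 7231, 8499, 10373, 15768,
            18674, 18504, 16251, 13402, 10545, 9094),
  Count = c(469, 561, 918, 1298, 1715, 2974,
            2982, 3081, 2400, 1742, 1301, 1015)
)
cd$Fraction <- cd$Count / cd$Total

# Hypothetical helper: fit one window, return slope, standard error, residual df.
slope_info <- function(data, start, width = 3) {
  win <- data[data$Year >= start & data$Year < start + width, ]
  fit <- lm(Fraction ~ Year, data = win)
  est <- coef(summary(fit))["Year", ]
  c(slope = unname(est["Estimate"]),
    se    = unname(est["Std. Error"]),
    df    = fit$df.residual)
}

# Hypothetical helper: test H0: B_a = B_b with unpooled standard errors.
compare_slopes <- function(data, start_a, start_b, width = 3) {
  a <- slope_info(data, start_a, width)
  b <- slope_info(data, start_b, width)
  var_sum <- a[["se"]]^2 + b[["se"]]^2
  t_stat  <- (a[["slope"]] - b[["slope"]]) / sqrt(var_sum)
  # Welch-Satterthwaite approximation (an assumption of this sketch).
  df <- var_sum^2 / (a[["se"]]^4 / a[["df"]] + b[["se"]]^4 / b[["df"]])
  c(t = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

# Growth window 2001-2003 against decline window 2007-2009.
res <- compare_slopes(cd, 2001, 2007)
print(res)
```

With 3-year windows each fit has only one residual degree of freedom, so the p-value will rarely drop below 5% even for visibly different slopes; wider windows give the test more to work with.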

I would also consider a paired t-test, and think about whether additional pooling of the variance is necessary.
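One way to get a pooled variance, sketched here as an assumption rather than as the only option, is to fit both windows in a single model with a window indicator and test the interaction term: its coefficient is the difference between the two slopes, and its standard error uses one residual variance pooled across both windows. The column names `Total`, `Count`, `Fraction` and `Window` are illustrative, and the data are again the 1999-2010 rows of the table in the question.

```r
# Rows 1999-2010 of the CDTimeSeries table (column names are illustrative).
cd <- data.frame(
  Year  = 1999:2010,
  Total = c(4752, 4973, 7231, 8499, 10373, 15768,
            18674, 18504, 16251, 13402, 10545, 9094),
  Count = c(469, 561, 918, 1298, 1715, 2974,
            2982, 3081, 2400, 1742, 1301, 1015)
)
cd$Fraction <- cd$Count / cd$Total

# Two non-overlapping 3-year windows, labelled "a" and "b".
win_a <- cd[cd$Year %in% 2001:2003, ]
win_b <- cd[cd$Year %in% 2007:2009, ]
win_a$Window <- "a"
win_b$Window <- "b"
both <- rbind(win_a, win_b)
both$Window <- factor(both$Window)

# Separate intercept and slope per window, one pooled residual variance.
fit <- lm(Fraction ~ Year * Window, data = both)

# The interaction row estimates (slope of b) - (slope of a) and tests it against 0.
interaction_row <- coef(summary(fit))["Year:Windowb", ]
print(interaction_row)
```

Pooling assumes the error variance is the same in both windows. With six points and four coefficients this fit still has only two residual degrees of freedom, so the result should be read with caution.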