Constrained lm() coefficients in R

#1
I want to estimate the time series model Y = a + b*X1 + c*X2 + d*X3 + error by using OLS.

Using R, how can I make it such that the following two constraints hold:
b+c+d = 1
b,c,d >= 0

This has been cross-posted here because my group for this assignment is getting annoyed that I haven't estimated this regression yet. I will answer this question here if/when participants on the other forum get there first.
 

Dason

Ambassador to the humans
#2
Can you provide some more details about the background and motivation for this? I also don't necessarily see what would make this a time series model at the moment.
 
#3
Sure, it's a model in finance used to test the results of a trading strategy. So I've created an automated trading strategy and back-tested it over 10 years, giving a time series vector of monthly strategy returns, \(R_{p,t}\). We have 3 factors on the RHS of the regression that are assumed to explain asset returns in general, and we want to see if the INTERCEPT is positive (supposed to measure the skill of the person building the strategy):

\(R_{p,t} = a + b_1 R_{1,t} + b_2R_{2,t} + b_3 R_{3,t} + e_{p,t} \text{ s.t.}\)

\(\mathbf{1}'\mathbf{b} = 1, \quad b_i \geq 0 \;\forall i \in \{1,2,3\}\)

It is standard to impose this restriction so that the coefficients receive the interpretation of being portfolio weights that sum to 1.
 
#4
Okay, what I will do is use this code that I found on Stack Exchange:

Code:
library(quadprog)

set.seed(42)                               # reproducible fake data
X <- matrix(runif(300), ncol = 3)          # 100 observations, 3 regressors
Y <- X %*% c(0.2, 0.3, 0.5) + rnorm(100, sd = 0.2)

Rinv <- solve(chol(t(X) %*% X))            # inverse Cholesky factor of X'X
C <- cbind(rep(1, 3), diag(3))             # t(C) %*% beta gives sum, then each coef
b <- c(1, rep(0, 3))                       # sum = 1 (equality), coefs >= 0
d <- t(Y) %*% X
solve.QP(Dmat = Rinv, factorized = TRUE, dvec = d, Amat = C, bvec = b, meq = 1)
Then I will do a pairs bootstrap, a residual bootstrap, or some other bootstrap to determine the standard errors. The choice of bootstrap will be determined by what I want to be robust to in my time series data.
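
For reference, a rough sketch of what a pairs bootstrap around that fit might look like, reusing the X, Y, C, and b objects from the code above (999 replications is an arbitrary choice, and plain iid resampling ignores serial dependence):

Code:
# Refit the constrained regression on resampled (x, y) pairs
qp_fit <- function(X, Y) {
  Rinv <- solve(chol(t(X) %*% X))
  solve.QP(Dmat = Rinv, factorized = TRUE, dvec = t(Y) %*% X,
           Amat = C, bvec = b, meq = 1)$solution
}
boots <- replicate(999, {
  idx <- sample(nrow(X), replace = TRUE)   # resample rows with replacement
  qp_fit(X[idx, , drop = FALSE], Y[idx])
})
apply(boots, 1, sd)                        # bootstrap standard errors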

HOWEVER, the code I found doesn't allow me to have an intercept. Does anyone know how to fix this?
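
One possible fix, sketched here as an assumption rather than a known answer: augment the design matrix with a column of ones and give the intercept a zero entry in every constraint, so that only the three slopes are restricted. The fake intercept of 0.1 below is just for illustration.

Code:
library(quadprog)

set.seed(1)
X <- matrix(runif(300), ncol = 3)                 # 100 obs, 3 factors
Y <- 0.1 + X %*% c(0.2, 0.3, 0.5) + rnorm(100, sd = 0.2)

Xa   <- cbind(1, X)                               # prepend intercept column
Dmat <- t(Xa) %*% Xa
dvec <- t(Xa) %*% Y

# Constraints act only on the three slopes; the intercept row is zero.
Amat <- cbind(c(0, 1, 1, 1),                      # equality: b1 + b2 + b3 = 1
              rbind(0, diag(3)))                  # inequalities: slopes >= 0
bvec <- c(1, 0, 0, 0)

fit <- solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat, bvec = bvec, meq = 1)
fit$solution                                      # (intercept, b1, b2, b3)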
 
#5
There is an estimator called the “restricted least squares estimator”. If you Google that, Derksheng, you will find a number of links. (You can also find it in some econometrics books.) Of course, it will then not formally be “OLS” - ordinary least squares - it will be another estimator. It is a long equation with lots of matrices, but you can just plug it into R.
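
For the equality part, the textbook formula can be plugged straight into R; a minimal sketch, assuming restrictions of the form R %*% beta = q (the function name rls is mine, and inequality restrictions like \(b_i \geq 0\) still need quadratic programming):

Code:
rls <- function(X, y, R, q) {
  XtXi  <- solve(t(X) %*% X)
  b_ols <- XtXi %*% t(X) %*% y
  # b_rls = b_ols - (X'X)^{-1} R' [ R (X'X)^{-1} R' ]^{-1} (R b_ols - q)
  b_ols - XtXi %*% t(R) %*% solve(R %*% XtXi %*% t(R)) %*% (R %*% b_ols - q)
}

# Example: with an intercept column first, force the slopes to sum to 1.
# R <- matrix(c(0, 1, 1, 1), nrow = 1); q <- 1
# rls(cbind(1, X), Y, R, q)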

What you can have there are linear restrictions, like beta2 = beta3, or that the sum of the betas is equal to 1. I think a usual application is the so-called Cobb-Douglas model, which is popular among econometricians.

You can also do F-tests to check whether the restrictions are “binding”.
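
As a sketch of that test, assuming rss_r and rss_u are the residual sums of squares from the restricted and unrestricted fits, with q restrictions, n observations, and k coefficients in the unrestricted model:

Code:
restriction_F <- function(rss_r, rss_u, q, n, k) {
  f <- ((rss_r - rss_u) / q) / (rss_u / (n - k))
  c(F = f, p = pf(f, q, n - k, lower.tail = FALSE))  # F statistic, p-value
}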

There are also restricted estimators in time series, for example in distributed-lag models.

I could not understand this: “my group for this assignment is getting annoyed that I haven't estimated this regression yet.” If they are annoyed, why don't they estimate it themselves?

About trading: I think it is impossible to make any money by trying to “forecast” or “model” speculative markets. I believe the model is much more complicated than a simple linear model. I think it is full of cointegration and heteroscedasticity like GARCH (generalized autoregressive conditional heteroscedasticity) and suddenly changing parameter values; I don't think the parameters will change gradually, so a Kalman filter will not help. I also believe that there are many unobserved variables that don't matter in some periods but are very important in others. Compare with the “nice weather models”.

There are other people studying time series right now. Maybe Derksheng, you should take the opportunity to – today – ask some really tricky time series questions! :)
 
#6
Nice insight, both into restricted estimators and finance! Our strategy does beat the market by about 50% over 15 years though, so we're pretty happy. We use rolling panel OLS to select the most informative signals for our basket of 48 indices and then use OLS to make a one-period-ahead forecast at each rebalancing date. We then quality-weight based on these expectations.
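
Roughly, the per-index forecasting step looks like the sketch below (ret and sig are placeholder names and the 60-month window is arbitrary, not our actual setup):

Code:
roll_forecast <- function(ret, sig, window = 60) {
  n <- length(ret)
  sapply((window + 1):(n - 1), function(t) {
    rows <- (t - window + 1):t
    # regress each window of returns on the previous period's signals ...
    fit <- lm(ret[rows] ~ sig[rows - 1, , drop = FALSE])
    # ... then forecast period t + 1 from the signals known at time t
    sum(coef(fit) * c(1, sig[t, ]))
  })
}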

So do you think a Kalman filter will or won't help? I discussed it with my group, but we concluded that it was overkill (we are coming first with or without it!)
 
#7
“Our strategy does beat the market by about 50% over 15 years though ”
Oh no, no, NO!

Those who search will find!

If you search through a lot of series then you will find some relations. That does not mean that they will remain or be stable.

Do you remember that something happened in 2007-2008?

Do a Chow test on the data before 2007 and after 2008 and check whether the parameters are the same. Of course they will not be.
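
For instance, a hand-rolled sketch of that test for a one-regressor model, assuming y and x are the full return series and split is the index of the last pre-2007 observation:

Code:
chow_test <- function(y, x, split) {
  n <- length(y); k <- 2                        # intercept + one slope
  rss_all <- sum(resid(lm(y ~ x))^2)            # pooled fit
  rss_1   <- sum(resid(lm(y[1:split] ~ x[1:split]))^2)
  rss_2   <- sum(resid(lm(y[-(1:split)] ~ x[-(1:split)]))^2)
  f <- ((rss_all - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (n - 2 * k))
  pf(f, k, n - 2 * k, lower.tail = FALSE)       # p-value of the Chow test
}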

Do you believe that the “beta” is a fixed parameter? Obviously you don't since you use a “rolling panel OLS”. Maybe that can work like some smoothing or a crude Kalman filter. But I don't think that it will work. In a year like 2008 all of these betas might have flipped.

I think that Bugman's “nightmare data set” in the environmetrics area is much easier than the data in your area.

Now I think that you should write down a stochastic specification of the data in your area, with multicollinearity, errors in variables, interdependent systems, the problems mentioned above, omitted variables, varying distributed-lag models, and many other problems, and then ask someone about these problems.

You could ask Dason, who asked you for clarification above. Ask him – today – something like: under what conditions, with the above stochastic specification, will a Kalman filter work? (I believe that someone is having a time series exam today!) :)

But Derksheng, (you are a nice person and) I believe that you will learn a lot of statistics from this and hopefully not lose any money. The only profitable thing in this is the education you get from it.
 
#8
We didn't data mine. We agreed on the complete strategy and methodology prior to implementation, applied it and that return popped out. We can even remove some signals from our model and the return increases, but we don't because that would be data mining.

Also, I can't really extend it beyond what it is, as it's just a 20% assignment. The marginal increase in expected marks is not worth the extra effort (given that I could reallocate that time to my thesis, which is due in 20 days, and given that we've already got the best assignment in the class).

The parameters definitely won't be stable, but my group insists that I stop trying to go overboard because they want to study for exams.

I will work on this a lot during the holidays though ... Exciting times ahead :D I will include many stock-specific signals (ex-ante volatility, idiosyncratic risk, volume, liquidity) and try to deal with some of the issues you've brought up.