Hi!

I'm learning about the different penalized regression methods, and I thought I'd start with Ridge Regression because an analytical expression can be derived for the estimator. I've seen a few different versions of the derivation, but they all use matrix calculus, which I'm not too familiar with. So I thought I'd try to derive the estimator using a simple model and ordinary calculus. I was wondering if someone could comment on whether I've done this properly:

Setup:

Multiple linear regression model with 2 continuous predictor variables, x1 and x2, with coefficients B1 and B2

No intercept in the model (assume centered predictors)

y = B1*x1 + B2 * x2 + e

The aim is to minimize the residual sum of squares (RSS) plus a penalty on the sum of the squared coefficients, where the penalty is weighted by a parameter, lambda.

L(B1, B2, lambda) = RSS + lambda * (B1^2 + B2^2)

L(B1, B2, lambda) = Sum [ (y - B1*x1 - B2*x2)^2 ] + lambda * (B1^2 + B2^2)

If I take the partial derivative with respect to one of the slopes, say B1, I get:

dL(B1, B2, lambda)/dB1 = -2 * Sum [ (y - B1*x1 - B2*x2) * x1 ] + 2*lambda*B1
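
One way to sanity-check a partial derivative like this is to compare it against a finite-difference approximation. The sketch below assumes the RSS term squares each residual, so that the derivative is -2 * Sum[(y - B1*x1 - B2*x2) * x1] + 2*lambda*B1. The simulated data, the names `ridge_loss` and `dloss_db1`, and the value of `lam` are all made up for illustration.

```python
import numpy as np

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.1, size=n)
lam = 2.0  # arbitrary penalty value

def ridge_loss(b1, b2):
    """RSS with squared residuals plus the ridge penalty."""
    resid = y - b1 * x1 - b2 * x2
    return np.sum(resid ** 2) + lam * (b1 ** 2 + b2 ** 2)

def dloss_db1(b1, b2):
    """Analytic partial derivative of the loss with respect to B1."""
    resid = y - b1 * x1 - b2 * x2
    return -2.0 * np.sum(resid * x1) + 2.0 * lam * b1

# Central finite difference at an arbitrary point (b1, b2)
b1, b2 = 0.3, -0.4
h = 1e-6
numeric = (ridge_loss(b1 + h, b2) - ridge_loss(b1 - h, b2)) / (2 * h)
analytic = dloss_db1(b1, b2)

print("finite difference:", numeric)
print("analytic:         ", analytic)
```

If the two printed values agree to several decimal places, the sign and the factor of x1 in the analytic derivative are probably right.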

If I set this to zero and collect the B1 terms, I get B1 * (Sum (x1^2) + lambda) = Sum (y*x1) - B2 * Sum (x1*x2), so:

B1 (Ridge) = [ Sum (y*x1) - B2 * Sum (x1*x2) ] / [ Sum (x1^2) + lambda ]
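
A numerical comparison with the matrix form of the ridge estimator, (X'X + lambda*I)^(-1) X'y, can help verify a scalar result like this. The sketch below assumes the B1 expression is read as [Sum(y*x1) - B2*Sum(x1*x2)] / [Sum(x1^2) + lambda], with the matching equation for B2 obtained by symmetry. Since each expression depends on the other coefficient, the two equations are solved together. The data and variable names are hypothetical.

```python
import numpy as np

# Simulated, centered data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # correlated predictors
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
x1 = x1 - x1.mean()
x2 = x2 - x2.mean()
y = y - y.mean()
lam = 3.0  # arbitrary penalty value

# Sums that appear in the scalar derivation
s11 = np.sum(x1 * x1)
s22 = np.sum(x2 * x2)
s12 = np.sum(x1 * x2)
s1y = np.sum(x1 * y)
s2y = np.sum(x2 * y)

# Solve the two first-order conditions jointly by substitution:
#   B1 * (s11 + lam) + B2 * s12 = s1y
#   B1 * s12 + B2 * (s22 + lam) = s2y
det = (s11 + lam) * (s22 + lam) - s12 ** 2
b1_hat = ((s22 + lam) * s1y - s12 * s2y) / det
b2_hat = ((s11 + lam) * s2y - s12 * s1y) / det

# The single-coefficient expression for B1, evaluated at the solved B2
b1_given_b2 = (s1y - b2_hat * s12) / (s11 + lam)

# Matrix form of the ridge estimator for comparison
X = np.column_stack([x1, x2])
beta_matrix = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("scalar solution:", b1_hat, b2_hat)
print("B1 given B2:    ", b1_given_b2)
print("matrix solution:", beta_matrix)
```

Note that the B1 expression on its own is not a closed-form estimate, because B2 is still unknown. It only becomes one after the two equations are solved simultaneously, which is what the matrix formula does in one step.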

Is anyone able to verify this? TIA

