Suppose I have n training samples and a linear model with two regressors, f(x,θ) = θ_0 + θ_1 * x_1 + θ_2 * x_2.

Now I switch to a model with three regressors, f(x,θ) = θ_0 + θ_1 * x_1 + θ_2 * x_2 + θ_3 * x_3.

How do the training error, the test error and the coefficients change if...

1) ... x_3 = x_1 + 0.2 * x_2?

2) ... x_3 is a random variable independent of y?

3) ... x_3 = x_1 ^ 2?

point 1) introducing a new regressor x_3 that is an exact linear combination of the other regressors is called "perfect multicollinearity": x_3 is perfectly correlated with a linear combination of x_1 and x_2.

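coefficients: the design matrix X (with columns 1, x_1, x_2, x_3) no longer has full column rank, so X^T X is singular and cannot be inverted. The OLS estimator is therefore not unique: adding any multiple of (θ_1, θ_2, θ_3) = (1, 0.2, -1) to a solution leaves the fitted values unchanged, so infinitely many coefficient vectors reach the same minimal training error. Since x_3 is a linear combination of the other regressors, it doesn't provide any extra information to the regression, and the individual coefficients of x_1, x_2 and x_3 are not identifiable (their standard errors are undefined, not merely large). In practice, software either drops one of the redundant columns or returns one particular solution, e.g. the minimum-norm least-squares solution.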
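A small NumPy sketch of this on simulated data (the coefficients, noise level and seed are arbitrary choices for illustration): the design matrix has rank 3 instead of 4, and shifting θ along (0, 1, 0.2, -1) leaves the fitted values unchanged. `np.linalg.lstsq` simply returns one of these solutions, the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=(2, n))
x3 = x1 + 0.2 * x2                                   # perfectly collinear third regressor
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])        # columns: intercept, x_1, x_2, x_3
rank = np.linalg.matrix_rank(X)                      # 3, not 4, so X^T X is singular

# lstsq copes with the rank deficiency by returning the minimum-norm least-squares solution
theta_min_norm, _, lstsq_rank, _ = np.linalg.lstsq(X, y, rcond=None)

# X @ null_dir = x1 + 0.2*x2 - x3 = 0, so adding any multiple of it changes no prediction
null_dir = np.array([0.0, 1.0, 0.2, -1.0])
theta_other = theta_min_norm + 5.0 * null_dir

same_fit = np.allclose(X @ theta_min_norm, X @ theta_other)
print("rank:", rank, "| same fitted values:", same_fit)
print("min-norm theta:", theta_min_norm.round(3))
print("other theta:   ", theta_other.round(3))
```

Both coefficient vectors fit the training data equally well, which is exactly why the individual coefficients carry no meaning here.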

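training & test error: since x_3 lies in the span of x_1 and x_2, the set of achievable fitted values does not change, so the training error is exactly the same as with two regressors. The test error is also unchanged as long as the test data satisfy the same relation x_3 = x_1 + 0.2 * x_2. Multicollinearity only reduces predictive performance (increases the test error) if the relationship between the regressors differs between the training and test datasets; then the predictions depend on which of the equivalent coefficient vectors was chosen. If the covariance structure (and consequently the multicollinearity) is the same in both datasets, it does not pose a problem for prediction.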
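A simulation sketch of these error claims (NumPy, made-up data). Adding noise to x_3 in one of the test sets is an assumption standing in for a test set whose covariance structure differs from the training set; `theta3_alt` is a second, equally good training solution obtained by moving along the null-space direction.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, x3_noise=0.0):
    # x_3 = x_1 + 0.2*x_2, plus optional noise that breaks the exact collinearity
    x1, x2 = rng.normal(size=(2, n))
    x3 = x1 + 0.2 * x2 + rng.normal(scale=x3_noise, size=n)
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
    return np.column_stack([np.ones(n), x1, x2, x3]), y

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

X_tr, y_tr = make_data(200)
X_same, y_same = make_data(1000)                  # test set with the same collinearity
X_shift, y_shift = make_data(1000, x3_noise=1.0)  # test set where x_3 != x_1 + 0.2*x_2

theta2, *_ = np.linalg.lstsq(X_tr[:, :3], y_tr, rcond=None)  # two regressors
theta3, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)         # three, minimum-norm solution
theta3_alt = theta3 + 5.0 * np.array([0.0, 1.0, 0.2, -1.0])  # equally good on training data

train_2 = mse(X_tr[:, :3] @ theta2, y_tr)
train_3 = mse(X_tr @ theta3, y_tr)
test_same_2 = mse(X_same[:, :3] @ theta2, y_same)
test_same_3 = mse(X_same @ theta3, y_same)
test_shift_2 = mse(X_shift[:, :3] @ theta2, y_shift)
test_shift_3 = mse(X_shift @ theta3, y_shift)
test_shift_3_alt = mse(X_shift @ theta3_alt, y_shift)

print(f"train MSE:        2 regs {train_2:.4f} | 3 regs {train_3:.4f}")
print(f"test MSE (same):  2 regs {test_same_2:.4f} | 3 regs {test_same_3:.4f}")
print(f"test MSE (shift): 2 regs {test_shift_2:.4f} | 3 regs {test_shift_3:.4f} | alt {test_shift_3_alt:.4f}")
```

On the shifted test set the two training-equivalent solutions give very different errors, which is the covariance-structure effect described above.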

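point 2) if x_3 is independent of y (say, pure noise that is also unrelated to x_1 and x_2), it has no predictive power in the population, but in a finite sample its estimated coefficient θ_3 will almost never be exactly zero, because OLS uses x_3 to fit part of the noise. Consequently the training error decreases slightly (adding a regressor can never increase it), the test error tends to increase slightly because of the extra estimation variance, and θ_0, θ_1, θ_2 change slightly as well, since x_3 is not exactly uncorrelated with x_1 and x_2 in the sample. All these effects shrink as n grows.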
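A Monte Carlo sketch (NumPy; the sample sizes, coefficients and number of repetitions are arbitrary assumptions) comparing the two- and three-regressor OLS fits when x_3 is pure noise. A positive "gap" means the three-regressor model has the larger error.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, n_reps = 30, 1000, 500

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

train_gap, test_gap, theta3_hat = [], [], []
for _ in range(n_reps):
    n = n_train + n_test
    x1, x2, x3 = rng.normal(size=(3, n))              # x_3 is independent of y, x_1 and x_2
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2, x3])
    tr, te = slice(0, n_train), slice(n_train, n)

    th2, *_ = np.linalg.lstsq(X[tr, :3], y[tr], rcond=None)
    th3, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
    theta3_hat.append(th3[3])

    train_gap.append(mse(X[tr] @ th3, y[tr]) - mse(X[tr, :3] @ th2, y[tr]))
    test_gap.append(mse(X[te] @ th3, y[te]) - mse(X[te, :3] @ th2, y[te]))

train_gap, test_gap, theta3_hat = map(np.array, (train_gap, test_gap, theta3_hat))
print("max training gap:", train_gap.max())            # <= 0 up to rounding
print("mean test gap:   ", test_gap.mean())            # noise regressor hurts on average
print("mean |theta_3|:  ", np.abs(theta3_hat).mean())  # nonzero although the true value is 0
```

The small training sample (n_train = 30) is chosen to make the variance cost of the extra parameter visible; with larger n the test gap shrinks towards zero.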

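point 3) introducing a squared term makes it possible to account for nonlinearity (curvature) in x_1; the model is still linear in the parameters θ, so it can still be fitted by OLS.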

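coefficients: θ_3 captures the curvature. In the two-regressor model, θ_0 and θ_1 partly absorb that curvature (x_1^2 has a nonzero mean, and it is correlated with x_1 unless x_1 is symmetric around zero), so both change once x_3 = x_1^2 takes over that role. If the true relationship is indeed quadratic, θ_3 will be clearly significant.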

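training & test error: the two-regressor model is nested in the three-regressor one, so the training error can only decrease (or stay the same), whether or not the squared term is useful. The test error decreases if the relationship really is curved in x_1 in both the training and the test data; if it is not, the extra term only adds variance and the test error may increase slightly.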
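A sketch of point 3 under an assumed data-generating process that really is quadratic in x_1 (NumPy; all numbers are made up). x_1 is drawn from [0, 3], so it is not centred at zero and x_1 and x_1^2 are strongly correlated; that is why θ_1 moves a lot once the squared term is added.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    # Assumed truth: y = 1 + 2*x1 - x2 + 1.5*x1^2 + noise
    x1 = rng.uniform(0.0, 3.0, size=n)
    x2 = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x1 ** 2 + rng.normal(scale=0.5, size=n)
    X2 = np.column_stack([np.ones(n), x1, x2])     # intercept, x_1, x_2
    X3 = np.column_stack([X2, x1 ** 2])            # ... plus x_3 = x_1^2
    return X2, X3, y

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

X2_tr, X3_tr, y_tr = make_data(100)
X2_te, X3_te, y_te = make_data(1000)

th2, *_ = np.linalg.lstsq(X2_tr, y_tr, rcond=None)
th3, *_ = np.linalg.lstsq(X3_tr, y_tr, rcond=None)

train_2, train_3 = mse(X2_tr @ th2, y_tr), mse(X3_tr @ th3, y_tr)
test_2, test_3 = mse(X2_te @ th2, y_te), mse(X3_te @ th3, y_te)

print("theta (2 regs):", th2.round(2))   # theta_1 absorbs part of the curvature
print("theta (3 regs):", th3.round(2))   # should be roughly (1, 2, -1, 1.5)
print(f"train MSE {train_2:.3f} -> {train_3:.3f}, test MSE {test_2:.3f} -> {test_3:.3f}")
```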

Now, how do things change if we use Lasso regression instead?

point 1) I don't know.

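point 2) for a reasonably sized penalty, the lasso will set θ_3 exactly to zero, so the model effectively reduces to the two-regressor lasso fit. This is not guaranteed for every penalty, though: with the loss scaled as (1/(2n)) * ||y - Xθ||^2 + λ * ||θ||_1 and x_3 centred, θ_3 is exactly zero only when λ ≥ |x_3^T r| / n, where r are the residuals of the lasso fit without x_3. Because x_3 is slightly correlated with r by chance in any finite sample, a small enough λ leaves a nonzero (but shrunken) θ_3.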
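To check this without extra dependencies, here is a small cyclic coordinate-descent lasso written just for this example. It is a sketch of the standard algorithm for the objective (1/(2n)) * ||y - Xθ||^2 + λ * ||θ||_1 (the same scaling scikit-learn's `Lasso` uses), not a library implementation; `lasso_cd`, λ = 0.3 and the simulated data are assumptions for illustration. With standardized regressors, θ_3 comes out exactly zero while OLS gives it a small nonzero value.

```python
import numpy as np

def soft_threshold(z, a):
    # argmin_w 0.5*(w - z)^2 + a*|w|: shrink z towards 0, exactly 0 if |z| <= a
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=1000):
    # Cyclic coordinate descent for (1/(2n))*||y - X w||^2 + lam*||w||_1.
    # Assumes centred columns of X and centred y, so no intercept is fitted here.
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    w = np.zeros(p)
    r = y.copy()                                   # residual y - X w
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * w[j]                    # partial residual without feature j
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

rng = np.random.default_rng(4)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))               # x_3 is pure noise
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2, x3])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize the regressors
yc = y - y.mean()                                  # the intercept is then just mean(y)

theta_lasso = lasso_cd(Xs, yc, lam=0.3)
theta_ols, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
print("lasso:", theta_lasso.round(3))
print("OLS:  ", theta_ols.round(3))
```

Lowering `lam` towards the size of |x_3^T r| / n (a few hundredths here) is where θ_3 would start to become nonzero.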

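point 3) the lasso penalizes the size of the coefficients, not the size of the regressor values. If x_3 = x_1^2 takes large values, a given effect only needs a small θ_3, so the squared term is penalized less, not more (and if |x_1| < 1, squaring makes the values smaller and the term is penalized more). Because of this scale dependence, regressors are normally standardized before fitting a lasso; after that, whether θ_3 stays nonzero depends only on how much x_1^2 reduces the squared error relative to the penalty.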
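A sketch of the scale effect, using the same kind of hand-written coordinate-descent lasso (again an illustration, not a library implementation; the data, λ = 0.2 and the helper names `lasso_cd`, `centred`, `standardize` are assumptions). The squared term is fitted once on its natural scale and once divided by 100: the information is identical, but on the small scale the lasso drops it, whereas after standardization both versions give the same fit.

```python
import numpy as np

def soft_threshold(z, a):
    # shrink z towards 0, exactly 0 if |z| <= a
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=2000):
    # Cyclic coordinate descent for (1/(2n))*||y - X w||^2 + lam*||w||_1
    # on centred X and y; the columns may have any scale.
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    w = np.zeros(p)
    r = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

def centred(*cols):
    X = np.column_stack(cols)
    return X - X.mean(axis=0)

def standardize(X):
    return X / X.std(axis=0)

rng = np.random.default_rng(5)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 ** 2 + rng.normal(scale=0.5, size=n)
yc = y - y.mean()

X_big = centred(x1, x2, x1 ** 2)            # x_3 = x_1^2 on its natural scale
X_small = centred(x1, x2, x1 ** 2 / 100)    # same information, 100 times smaller values

theta_big = lasso_cd(X_big, yc, lam=0.2)
theta_small = lasso_cd(X_small, yc, lam=0.2)
theta_big_std = lasso_cd(standardize(X_big), yc, lam=0.2)
theta_small_std = lasso_cd(standardize(X_small), yc, lam=0.2)

print("raw scale:     ", theta_big.round(3))       # squared term kept
print("scaled by 1/100:", theta_small.round(3))    # squared term dropped
print("standardized:  ", theta_big_std.round(3), theta_small_std.round(3))
```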