I am currently enrolled in this course and using a book to teach myself the fundamentals of statistical learning methods. I was wondering whether anyone in this group could help me understand how multiple linear regression works using matrix calculus.

I know that for a simple linear regression with a single predictor and no intercept, the coefficient beta is estimated with the formula

beta_hat = sum(x*y) / sum(x*x) = <x, y> / <x, x>, where x and y are both vectors.
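
To check that I am reading this formula correctly, here is a minimal Python sketch of it. NumPy and the made-up numbers are my own assumptions, and it assumes a model with no intercept.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Elementwise form: sum(x*y) / sum(x*x)
beta_hat = np.sum(x * y) / np.sum(x * x)

# Same quantity written as a ratio of inner products <x, y> / <x, x>
beta_inner = (x @ y) / (x @ x)

print(beta_hat, beta_inner)
```

By hand this is (2.1 + 7.8 + 18.6 + 31.2) / (1 + 4 + 9 + 16) = 59.7 / 30 = 1.99.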

I am now looking at this from a multivariable point of view, where we start with an initial vector z_0 = x_0 = 1 and then, for each j = 1, ..., p, regress x_j on the previously computed z_0, ..., z_{j-1} to obtain the coefficients

γ_lj = <z_l, x_j> / <z_l, z_l>

and the residual vectors

z_j = x_j - sum_{k=0}^{j-1} γ_kj * z_k

and then we regress y on the last residual vector z_p, which gives the last entry of the beta vector

β_p = <z_p, y> / <z_p, z_p>
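
For reference, here is a minimal Python sketch of these steps as I understand them. NumPy, the function name `successive_orthogonalization`, and the simulated data are all my own assumptions for illustration. It assumes the first column of X is the constant vector x_0 = 1, and it compares the result with the last coefficient from an ordinary least-squares fit.

```python
import numpy as np

def successive_orthogonalization(X, y):
    """Return the residual vectors Z (as columns) and the coefficient of the last column of X.

    X is an (n, p+1) array whose first column is all ones (x_0 = 1).
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    Z = np.zeros((n, m))
    Z[:, 0] = X[:, 0]  # z_0 = x_0
    for j in range(1, m):
        z_j = X[:, j].copy()
        for l in range(j):
            # gamma_lj = <z_l, x_j> / <z_l, z_l>
            gamma_lj = (Z[:, l] @ X[:, j]) / (Z[:, l] @ Z[:, l])
            # z_j = x_j - sum over l < j of gamma_lj * z_l
            z_j -= gamma_lj * Z[:, l]
        Z[:, j] = z_j
    # Regress y on the last residual: beta_p = <z_p, y> / <z_p, z_p>
    beta_p = (Z[:, -1] @ y) / (Z[:, -1] @ Z[:, -1])
    return Z, beta_p

# Simulated data for illustration only.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=n)

Z, beta_p = successive_orthogonalization(X, y)

# Ordinary least squares on the full design matrix, for comparison.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_p, beta_ls[-1])
```

If I have this right, the two printed numbers should agree up to rounding, and the columns of Z should be mutually orthogonal. That orthogonality seems to be why each step reduces to the single-variable formula.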

I was hoping that someone with a better understanding could help simplify this for me and clarify why this works. Alternatively, if you have any examples of this calculation being done by hand so that I can see it in action that would also really help.

cheers!