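Suppose we fit the nonlinear regression model

$$Y_i=f(z_i,\theta)+\epsilon_i, \quad i=1,\dots,n.$$

In the Gauss-Newton method, the update of the parameter $\theta$ from step $t$ to step $t+1$ is chosen to minimize, over $\theta$, the sum of squares of the model linearized around $\theta^{(t)}$:

$$\sum_{i=1}^n \left[ Y_i - f(z_i,\theta^{(t)}) - f'(z_i,\theta^{(t)})^T(\theta-\theta^{(t)}) \right]^2.$$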
Can we prove (part 1) that the update has the following form, and why?

$$\theta^{(t+1)} = \theta^{(t)} + \left( (A^{(t)})^T A^{(t)} \right)^{-1} (A^{(t)})^T x^{(t)},$$
(part 2) where $A^{(t)}$ is the matrix whose $i$-th row is $f'(z_i,\theta^{(t)})^T$, and $x^{(t)}$ is the column vector whose $i$-th entry is $Y_i-f(z_i,\theta^{(t)})$.
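To make the objects concrete, here is a minimal numerical sketch of one way to build $A^{(t)}$ and $x^{(t)}$ and apply the update as $\theta^{(t)} + (A^TA)^{-1}A^Tx$, with $A$ and $x$ evaluated at $\theta^{(t)}$. The model $f(z,\theta)=\theta_1 e^{\theta_2 z}$, the simulated data, the starting value, and the function names `f`, `grad_f`, and `gauss_newton_step` are all assumptions made for illustration; none of them come from the question itself.

```python
import numpy as np

def f(z, theta):
    # Hypothetical model chosen for illustration: theta[0] * exp(theta[1] * z)
    return theta[0] * np.exp(theta[1] * z)

def grad_f(z, theta):
    # Matrix A: row i is the gradient of f(z_i, theta) with respect to theta
    e = np.exp(theta[1] * z)
    return np.column_stack([e, theta[0] * z * e])

def gauss_newton_step(z, y, theta):
    A = grad_f(z, theta)        # n x p matrix A^{(t)}
    x = y - f(z, theta)         # residual vector x^{(t)}
    # Solve the normal equations (A^T A) d = A^T x for the increment d
    d = np.linalg.solve(A.T @ A, A.T @ x)
    return theta + d

# Simulated data (assumed for the example)
rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 20)
theta_true = np.array([2.0, -1.0])
y = f(z, theta_true) + 0.01 * rng.standard_normal(z.size)

# Iterate the update from an assumed starting value
theta = np.array([1.5, -0.5])
for _ in range(10):
    theta = gauss_newton_step(z, y, theta)
print(theta)
```

Each step is the ordinary least-squares regression of $x^{(t)}$ on the columns of $A^{(t)}$, so the increment could equally be computed with `np.linalg.lstsq(A, x, rcond=None)[0]`, which avoids forming $A^TA$ explicitly.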

How can these relations be derived? Thanks in advance!