
Thread: Proof of the day


  1. #1: Englund (TS Contributor, Sweden)

    Proof of the day

    In this thread we post one (or more, if you can't wait) proof a day. I'll start by proving that

    V[b|X]=\sigma^2(X'X)^{-1}

    in the linear regression model. We can write the beta estimator as

    b=(X'X)^{-1}X'y=(X'X)^{-1}X'(X\beta+\epsilon)=(X'X)^{-1}X'X\beta+(X'X)^{-1}X'\epsilon=\beta+(X'X)^{-1}X'\epsilon.

    Then we have that

    V[b|X]=E[(b-\beta)(b-\beta)'|X]
    =E[(\beta+(X'X)^{-1}X'\epsilon-\beta)(\beta+(X'X)^{-1}X'\epsilon-\beta)'|X]
    =E[((X'X)^{-1}X'\epsilon)((X'X)^{-1}X'\epsilon)'|X]
    =E[(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}|X]
    =(X'X)^{-1}X'E[\epsilon\epsilon'|X]X(X'X)^{-1}
    =(X'X)^{-1}X'\sigma^2 I X(X'X)^{-1}
    =\sigma^2(X'X)^{-1}X'X(X'X)^{-1}
    =\sigma^2(X'X)^{-1},

    where

    E[\epsilon\epsilon'|X]=\sigma^2I

    follows from one of the assumptions of the classical linear model: the spherical disturbances assumption.
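
    As a quick numerical sanity check (not part of the proof, and with names of my own choosing), one can simulate many error draws for a fixed design matrix and compare the empirical covariance of b with \sigma^2(X'X)^{-1}. A minimal numpy sketch:

    Code:
    import numpy as np

    rng = np.random.default_rng(0)
    n, k, sigma = 200, 3, 2.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # fixed design matrix
    beta = np.array([1.0, -0.5, 2.0])

    reps = 20000
    b_draws = np.empty((reps, k))
    for r in range(reps):
        eps = rng.normal(scale=sigma, size=n)           # spherical disturbances
        y = X @ beta + eps
        b_draws[r] = np.linalg.solve(X.T @ X, X.T @ y)  # b = (X'X)^{-1} X'y

    print(np.round(np.cov(b_draws, rowvar=False), 4))      # simulated V[b|X]
    print(np.round(sigma**2 * np.linalg.inv(X.T @ X), 4))  # sigma^2 (X'X)^{-1}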

  2. The Following 4 Users Say Thank You to Englund For This Useful Post:

    anders.bjorn (11-10-2016), bryangoodrich (01-18-2014), M!ss Moon (02-21-2014), spunky (01-18-2014)

  3. #2: spunky (TS Contributor, vancouver, canada)

    Re: Proof of the day

    me likes this. most of my proofs would come from the field of psychometrics or quantitative psychology though (mostly factor analysis and structural equation modelling).

    here i'm doing the (rather simple) proof of how the linear factor analysis model can be parameterised as a covariance structure model. it's relevant because as a linear factor model it is unsolvable, but as a covariance structure model it is possible to obtain parameter estimates.

    let the observed score x be defined by the linear factor model x = \Lambda F+\epsilon. Since it is known that (in the case of multivariate normality) E(xx')=\Sigma, it trivially follows that:

    xx' = (\Lambda F+\epsilon)(\Lambda F+\epsilon)'
    xx' = (\Lambda F+\epsilon)(F'\Lambda' + \epsilon')
    xx' = \Lambda FF'\Lambda' + \epsilon F'\Lambda'+ \Lambda F \epsilon' + \epsilon\epsilon'

    so taking the expectation of both sides:

    E(xx') = E(\Lambda FF'\Lambda') + 0 + 0 + E(\epsilon\epsilon')

    which happens because the errors are random and assumed uncorrelated with the factors and estimated loadings, so the cross terms have expectation zero. Now, by linearity of expectation and substituting the covariance matrix of the factors and of the errors, we can see that:

    E(xx') = \Lambda E(FF')\Lambda' + E(\epsilon\epsilon')

    \Sigma=\Lambda \Phi \Lambda' + \Psi

    which is known as the fundamental equation of Factor Analysis.
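
    Not part of spunky's proof, but a small numerical illustration (with made-up \Lambda, \Phi and \Psi) that the model-implied covariance \Lambda \Phi \Lambda' + \Psi matches the sample covariance of x = \Lambda F + \epsilon for a large simulated sample:

    Code:
    import numpy as np

    rng = np.random.default_rng(1)
    p, m, n = 6, 2, 200000
    Lambda = rng.uniform(0.4, 0.9, size=(p, m))        # factor loadings (illustrative values)
    Phi = np.array([[1.0, 0.3], [0.3, 1.0]])           # factor covariance matrix
    Psi = np.diag(rng.uniform(0.2, 0.5, size=p))       # diagonal error covariance

    F = rng.multivariate_normal(np.zeros(m), Phi, size=n)
    eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
    x = F @ Lambda.T + eps                             # x = Lambda F + eps, one row per observation

    implied = Lambda @ Phi @ Lambda.T + Psi            # Sigma = Lambda Phi Lambda' + Psi
    sample = np.cov(x, rowvar=False)
    print(np.max(np.abs(sample - implied)))            # close to zero for large n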
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  4. The Following 2 Users Say Thank You to spunky For This Useful Post:

    bryangoodrich (01-18-2014), Englund (01-18-2014)

  5. #3: Englund (TS Contributor, Sweden)

    Re: Proof of the day

    Okay, since this day is soon over (at least according to Swedish time) and no one posted a proof yet today, I'll post another proof. I'll give a very simple, and possibly boring, proof this time. I'll prove that \bar{x} is the value that minimizes the sum \sum_{i=1}^n{(x_i-a)^2} (1).

    By taking the first derivative with respect to a and setting it equal to zero, we get \sum_{i=1}^n{-2(x_i-a)}=0 \Leftrightarrow -2\sum_{i=1}^n{x_i}+2na=0 \Leftrightarrow \sum_{i=1}^n{x_i}=na \Leftrightarrow \bar{x}=a.

    By checking the second order condition we see that it's equal to 2n, which is always positive, so now we know that \bar{x} is at least a local minimum. By investigating (1) it is easily seen that it is also a global minimum.
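
    A tiny brute-force check of the same fact (my own sketch, nothing more): evaluate the sum of squares over a grid of candidate values of a and confirm the minimizer is the sample mean.

    Code:
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=3.0, scale=1.5, size=50)
    grid = np.linspace(x.min(), x.max(), 10001)             # candidate values of a
    sums = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)  # sum_i (x_i - a)^2 for each a
    print(grid[np.argmin(sums)], x.mean())                  # agree up to grid resolution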

  6. #4

    Re: Proof of the day

    Nice proof! Simple and fun! =)

  7. #5: Dason (Devorador de queso, Tampa, FL)

    Re: Proof of the day

    I prefer the version that doesn't require the use of calculus.

    \sum (x_i - a)^2 = \sum (x_i - \bar{x} + \bar{x} - a)^2 = \sum \left[ (x_i - \bar{x})^2 + (\bar{x} - a)^2 + 2(x_i - \bar{x})(\bar{x} - a) \right]

    =  \sum (x_i - \bar{x})^2 + \sum(\bar{x} - a)^2 + 2\sum(x_i - \bar{x})(\bar{x} - a)

    Now consider the last summation. Note that in the sum both a and \bar{x} are constant so we can pull them out

    = 2(\bar{x} - a) \sum (x_i - \bar{x})
    We know that that sum is equal to 0 so this shows the third summation disappears.

    We are left with

    \sum (x_i - a)^2 = \sum (x_i - \bar{x})^2 + \sum(\bar{x} - a)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2

    The first summation we can't control and the second sum is always non-negative so the minimum would occur if we can make it equal to 0 - which happens when a=\bar{x}.

    Now clearly I need a few more details to make it more rigorous but I like that version a little bit more because it also gives hints at what we do in ANOVA when decomposing the sums of squares.
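
    For what it's worth, here is a quick numerical confirmation of the decomposition (my sketch, not Dason's): for an arbitrary a, the left-hand side equals \sum(x_i-\bar{x})^2 + n(\bar{x}-a)^2 because the cross term vanishes.

    Code:
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=100)
    a = 0.7                                       # arbitrary candidate value
    xbar = x.mean()
    lhs = ((x - a) ** 2).sum()
    rhs = ((x - xbar) ** 2).sum() + len(x) * (xbar - a) ** 2
    print(np.isclose(lhs, rhs))                   # True: the cross term is zero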
    Last edited by Dason; 01-19-2014 at 11:08 PM.
    I don't have emotions and sometimes that makes me very sad.

  8. The Following 3 Users Say Thank You to Dason For This Useful Post:

    bryangoodrich (01-20-2014), Englund (01-20-2014), Jake (01-19-2014)

  9. #6: spunky (TS Contributor, vancouver, canada)

    Re: Proof of the day

    a while ago (before Englund became an MVC) I posted a proof about another result in factor analysis. I thought it would be nice to resurrect it (briefly) and add it here to our small (but growing) compendium of proofs. the original thread is here

    http://www.talkstats.com/showthread....OOF?highlight=

    and the proof goes like this:

    Let \bf{S} be a covariance matrix with eigenvalue-eigenvector pairs (\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), ..., (\lambda_p, \mathbf{e}_p), where
    \lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p. Let m<p and define:

    \bf{L} = \{l_{ij}\} = \left[\sqrt{\lambda_1 }\mathbf{e}_1\  |\  \sqrt{\lambda_2} \mathbf{e}_2\ |\ ...\  |\ \sqrt{\lambda_m} \mathbf{e}_m  \right]

    and:

    \mathbf{\Psi} =
    \left(
    \begin{array}{cccc}
    \psi_1 & 0 & \cdots & 0 \\
    0 & \psi_2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & \psi_p \\
    \end{array}
    \right)
    \text{ with } \psi_i = s_{ii} - \sum_{j=1}^{m} l_{ij}^2

    Then, PROVE:

    \text{Sum of squared entries of } (\mathbf{S} - (\mathbf{LL'} + \mathbf{\Psi})) \le \lambda_{m+1}^2 + \cdots + \lambda_p^2

    Spunky's attempt of a proof:

    By definition of \psi_i, we know that the diagonal of (\mathbf{S} - (\mathbf{LL'} + \mathbf{\Psi})) is all zeroes. Since
    (\mathbf{S} - (\mathbf{LL'} + \mathbf{\Psi})) and (\mathbf{S} - \mathbf{LL'}) have the same elements except on the diagonal, we know that

    \text{Sum of squared entries of } (\mathbf{S} - (\mathbf{LL'} + \mathbf{\Psi})) \leq \text{ Sum of squared entries of } (\mathbf{S} - \mathbf{LL'})

    Since \mathbf{S} = \lambda_1 \mathbf{e}_1 \mathbf{e}'_1 + \cdots + \lambda_p \mathbf{e}_p \mathbf{e}'_p
    and \mathbf{LL'} = \lambda_1 \mathbf{e}_1 \mathbf{e}'_1 + \cdots + \lambda_m \mathbf{e}_m \mathbf{e}'_m, then it follows that
    \mathbf{S} - \mathbf{LL'} = \lambda_{m+1} \mathbf{e}_{m+1} \mathbf{e}'_{m+1} + \cdots + \lambda_p \mathbf{e}_p \mathbf{e}'_p

    Writing it in matrix form, this is saying \mathbf{S} - \mathbf{LL'} = \mathbf{P}_2 \mathbf{\Lambda}_2 \mathbf{P}'_2 where
    \mathbf{P}_2 = [ \mathbf{e}_{m+1} | \cdots | \mathbf{e}_p ] and \mathbf{\Lambda}_2 = Diag(\lambda_{m+1}, \cdots, \lambda_{p})

    Then, the following is true:

    \text{Sum of squared entries of }(\mathbf{S}- \mathbf{LL'})= \text{tr}((\mathbf{S} - \mathbf{LL'}) (\mathbf{S} - \mathbf{LL'})')

    =\text{tr} (( \mathbf{P}_2 \mathbf{\Lambda}_2 \mathbf{P}'_2)( \mathbf{P}_2 \mathbf{\Lambda}_2 \mathbf{P}'_2)')=\text{tr}( \mathbf{P}_2 \mathbf{\Lambda}_2\mathbf{\Lambda}_2 \mathbf{P}'_2)

    =\text{tr}(\mathbf{\Lambda}_2\mathbf{\Lambda}_2 \mathbf{P}'_2\mathbf{P}_2)=\text{tr}(\mathbf{\Lambda}_2\mathbf{\Lambda}_2)=\lambda_{m+1}^2 + \cdots + \lambda_p^2.

    All the \mathbf{P}_2 terms disappear because, by the definition of \mathbf{P}_2, we know that \mathbf{P}'_2\mathbf{P}_2=\mathbf{I}, and the trace is invariant under cyclic permutations.
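
    If anyone wants to see the bound in action, here is an illustrative numpy sketch (mine, with an arbitrary made-up covariance matrix): build \mathbf{L} and \mathbf{\Psi} exactly as defined above and compare the sum of squared entries of \mathbf{S} - (\mathbf{LL'} + \mathbf{\Psi}) with \lambda_{m+1}^2 + \cdots + \lambda_p^2.

    Code:
    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(8, 8))
    S = A @ A.T / 8                                    # an arbitrary covariance matrix
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]                   # sort eigenvalues in decreasing order
    eigval, eigvec = eigval[order], eigvec[:, order]

    m = 3
    L = eigvec[:, :m] * np.sqrt(eigval[:m])            # L = [sqrt(l_1) e_1 | ... | sqrt(l_m) e_m]
    Psi = np.diag(np.diag(S) - (L ** 2).sum(axis=1))   # psi_i = s_ii - sum_j l_ij^2
    residual = S - (L @ L.T + Psi)

    lhs = (residual ** 2).sum()                        # sum of squared entries
    rhs = (eigval[m:] ** 2).sum()                      # lambda_{m+1}^2 + ... + lambda_p^2
    print(lhs <= rhs + 1e-12, lhs, rhs)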
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  10. The Following User Says Thank You to spunky For This Useful Post:

    Englund (01-20-2014)

  11. #7: Englund (TS Contributor, Sweden)

    Re: Proof of the day

    Quote Originally Posted by spunky View Post
    a while ago (before Englund became an MVC)
    Time wasn't even defined before I became MVC, so that's by definition impossible
    Quote Originally Posted by spunky View Post
    I posted a proof about another result in factor analysis. I thought it would be nice to resurrect it (briefly) and add it here to our small (but growing) compendium of proofs.

    and the proof goes like this:
    Very nice. If you keep posting stuff on FA I'll be forced to get more familiar with it, which is good

  12. #8: spunky (TS Contributor, vancouver, canada)

    Re: Proof of the day

    Quote Originally Posted by Englund View Post
    If you keep posting stuff on FA I'll be forced to get more familiar with it, which is good
    i don't quite understand why but pretty much NO ONE in the Statistics world even touches on Factor Analysis. when it comes to dimension reduction techniques almost all of the undergrad stats textbooks i've seen that deal with intro to multivariate analysis stop at principal components. there may be like some small subsection in some nameless appendix that says something about Factor Analysis... but that's it!

    WHY!??!
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  13. #9: Jake (Cookie Scientist, Austin, TX)

    Re: Proof of the day

    Here's a link to a geometrically based proof I posted a few months ago in another thread. It is about constraints among sets of correlation coefficients.

    http://www.talkstats.com/showthread....-r_xy-and-r_yz

    In the thread I just call this an "argument" but if Dason's thing counts as a proof then I think mine does too
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  14. #10: gjay822 (UK)

    Re: Proof of the day

    Here, it is covered under MATH 31, so-called statistics. I am still having problems solving things in this field.

  15. #11: Dason (Devorador de queso, Tampa, FL)

    Re: Proof of the day

    We shouldn't let this thread get buried. I'm gonna sticky it.
    I don't have emotions and sometimes that makes me very sad.

  16. #12: Dason (Devorador de queso, Tampa, FL)

    Re: Proof of the day

    Somebody post a proof. Go!
    I don't have emotions and sometimes that makes me very sad.

  17. #13: spunky (TS Contributor, vancouver, canada)

    Re: Proof of the day

    Quote Originally Posted by Dason View Post
    Somebody post a proof. Go!
    *YOU* should do one!
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  18. #14: René

    Re: Proof of the day

    Nice thread, so I'll make my debut here: the derivation of the ridge estimator in the linear regression model.

    \mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{u}, \quad \mathbf{u} \sim N_n(\mathbf{0},\sigma_u^2\mathbf{I}_n)

    with strong correlation patterns among the column vectors of the data matrix \mathbf{X} \in Mat_{n,k}(\mathbb{R}). The problem with multicollinearity is that single components of the parameter vector \boldsymbol{\beta} \in \mathbb{R}^k can take absurdly large values. So the general idea is to restrict the length of that vector to a prespecified positive real number. Let this restriction be denoted by \left\| \boldsymbol{\beta} \right\|_2^2=c, where \left\|\cdot \right\|_2 is just the Euclidean norm on \mathbb{R}^k.

    Eventually one faces the restricted least squares problem

    Q_n(\boldsymbol{\beta},\lambda) := \left\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right\|_2^2 + \lambda (\left\|\boldsymbol{\beta}\right\|_2^2-c) \rightarrow \min_{\boldsymbol{\theta} \in \mathbf{\Theta}}

    where the Lagrange multiplier \lambda is assumed to be positive and \mathbf{\Theta} \subseteq \mathbb{R}^k \times \mathbb{R}_{>0} is the associated parameter space. The optimization problem is equivalent to

    Q_n(\boldsymbol{\beta},\lambda):= (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) + \lambda (\boldsymbol{\beta}'\boldsymbol{\beta}-c) \rightarrow \min_{\boldsymbol{\theta} \in \mathbf{\Theta}}

    Taking the derivative with respect to \boldsymbol{\beta} yields

    \displaystyle \frac{\partial}{\partial \boldsymbol{\beta}} Q_n(\boldsymbol{\beta},\lambda) = -2\mathbf{X}'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})+ 2\lambda \hat{\boldsymbol{\beta}}

    This leads to the first-order condition (note that the hats can be written already, since the potential minimizers of the problem above are implicitly defined by this condition)

    -\mathbf{X}'\mathbf{y} + \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} + \lambda \hat{\boldsymbol{\beta}} = \mathbf{0}

    Arranging terms leads to the modified normal equations

    (\mathbf{X}'\mathbf{X}+ \lambda \mathbf{I}_k)\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}

    Since \mathbf{X}'\mathbf{X} is at least positive semi-definite and \lambda \mathbf{I}_k is positive definite, it follows that*

    \det(\mathbf{X}'\mathbf{X}+ \lambda \mathbf{I}_k) \geq \det(\mathbf{X}'\mathbf{X})+\det(\lambda\mathbf{I}_k) = \det(\mathbf{X}'\mathbf{X}) + \lambda^k >0

    so that (\mathbf{X}'\mathbf{X}+ \lambda \mathbf{I}_k) is an invertible matrix even if the data matrix is of less than full column rank. This finally yields the ridge estimator in its known form

    \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X}+ \lambda \mathbf{I}_k)^{-1}\mathbf{X}'\mathbf{y}

    Moreover, this is the unique global minimizer of Q_n, because the problem under consideration is just a sum of convex functions and \hat{\boldsymbol{\beta}} is the only local minimizer, so one does not need to check the second-order condition and the associated Hessian.

    *One can find a good proof for that inequality in Magnus, J.R. & Neudecker, H. (1999). Matrix Differential Calculus. Wiley and Sons on page 227 theorem 28.
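
    A short numerical sketch (mine, not part of the derivation above, with an arbitrary made-up design): build a nearly collinear \mathbf{X}, compute the ridge estimator from the closed form, and check that it satisfies the modified normal equations.

    Code:
    import numpy as np

    rng = np.random.default_rng(5)
    n, k, lam = 100, 4, 2.5
    z = rng.normal(size=n)
    X = np.column_stack([z, z + 1e-3 * rng.normal(size=n),   # two nearly collinear columns
                         rng.normal(size=(n, k - 2))])
    beta = np.array([1.0, -1.0, 0.5, 2.0])
    y = X @ beta + rng.normal(size=n)

    A = X.T @ X + lam * np.eye(k)
    beta_ridge = np.linalg.solve(A, X.T @ y)                 # (X'X + lam I_k)^{-1} X'y
    print(np.allclose(A @ beta_ridge, X.T @ y))              # modified normal equations hold
    print(beta_ridge)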
    Last edited by René; 09-17-2014 at 01:42 PM. Reason: still not getting the math mode + butchering english as usual

  19. The Following 2 Users Say Thank You to René For This Useful Post:

    Englund (12-21-2014), spunky (09-17-2014)

  20. #15: Dragan (Super Moderator, Illinois, US)

    Proof of the day

    The Pearson product-moment coefficient of correlation can be interpreted as the cosine of the angle between variable vectors in n dimensional space. Here, I will show the relationship between the Pearson and Spearman (rank-based) correlation coefficients for the bivariate normal distribution through the following series:

    \sum_{n=1}^{\infty }\frac{\cos nx}{n}.

    If we let z=\cos x+i\sin x, then

    \sum_{n=1}^{m}y^{n-1}z^{n}=\frac{z\left \{ 1-\left ( yz \right )^{m} \right \}}{1-yz}

    where it follows for \left | y \right |<1,

    \sum_{n=1}^{\infty }y^{n-1}\left ( \cos nx+i\sin nx \right )=\frac{\cos x+i\sin x}{1-y\cos x-yi\sin x}

    =\frac{\left ( \cos x-y \right )+i\sin x}{1-2y\cos x+y^{2}}, so that, taking real parts,

    \sum_{n=1}^{\infty }y^{n-1}\cos nx=\frac{\cos x-y}{1-2y\cos x+y^{2}}.

    This series is uniformly convergent in x for all \left | y \right |\leq p<1. Hence, integrating with respect to y, where 0<y<1, gives

    \sum_{n=1}^{\infty }y^{n}\frac{\cos nx}{n}

    =\int_{0}^{y}\frac{\cos x-t}{1-2t\cos x+t^{2}}dt

    =-\frac{1}{2}\ln \left ( 1-2y\cos x+y^{2} \right ).

    Suppose that x is neither zero nor a multiple of 2\pi.

    Then the series \sum_{n=1}^{\infty }\frac{\cos nx}{n} is convergent, and, for 0\leq y\leq 1, the sequence y^{n} is positive, monotonic decreasing, and bounded. As such, the series:

    \sum_{n=1}^{\infty }y^{n}\frac{\cos nx}{n}

    is therefore uniformly convergent on the interval 0\leq y\leq 1.

    Subsequently letting y\rightarrow 1, it follows that if x is neither 0 nor a multiple of 2\pi we have

    \sum_{n=1}^{\infty }\frac{\cos nx}{n} =-\frac{1}{2}\ln \left ( 2-2\cos x \right )

    =-\ln \left ( 2\sin \frac{1}{2} x\right ).

    Setting x=\frac{\pi }{3}r_{s} and taking e^{-(\cdot)} of both sides gives the relationship (for large sample sizes) between the Pearson and Spearman correlation coefficients as:

    r_{p}=2\sin\left ( \frac{\pi }{6}r _{s}\right )

    for the bivariate normal distribution.
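
    As a rough simulation check of the final relationship (my own sketch, not Dragan's): draw a large bivariate normal sample, compute the Pearson correlation and the Spearman rank correlation, and compare r_{p} with 2\sin\left ( \frac{\pi }{6}r _{s}\right ).

    Code:
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    rho = 0.6
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=500000)

    r_p = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]                    # Pearson correlation
    r_s = np.corrcoef(stats.rankdata(xy[:, 0]),
                      stats.rankdata(xy[:, 1]))[0, 1]              # Spearman rank correlation
    print(r_p, 2 * np.sin(np.pi / 6 * r_s))                        # should nearly coincide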

  21. The Following 2 Users Say Thank You to Dragan For This Useful Post:

    Englund (10-04-2015), spunky (02-22-2016)
