My Last Memento of Linear Algebra - Starting with Triangularization
ABSTRACT. Schur triangularization is a powerful tool in linear algebra: it implies the spectral decomposition, the Cayley-Hamilton theorem, the removal rule, and the Jordan canonical form over the complex number field. With the help of algebraic closure, ordinary triangularization serves to generalize the previous results. The spirit of simultaneous triangularization is explored through problem-solving, and the relevant theorems from Lie algebra are recalled. Finally, we consider the triangularization of matrices over a PID. The main references are Linear Algebra (Fourth Edition) by Stephen H. Friedberg, Arnold J. Insel and Lawrence E. Spence, which is the textbook chosen by my professor when I was studying linear algebra, and Problems and Theorems in Linear Algebra by Viktor V. Prasolov, which contains a lot of fascinating results and is available online: [http://staff.math.su.se/mleites/books/prasolov-1994-problems.pdf].
Corollaries of Schur Triangularization and Ordinary Triangularization
TO BEGIN WITH... The Generalized Schur Triangularization is in this blog post: [https://www.cnblogs.com/chaliceseven/p/17094280.html]. The Ordinary Triangularization is in this blog post: [https://www.cnblogs.com/chaliceseven/p/17094288.html].
Corollary 1: The Spectral Theorem
From the proof of I), we see that any linear operator on a nonzero finite-dimensional complex inner product space is unitarily triangularizable. This is Schur's theorem. Using the fact in III), it follows that a linear operator on a nonzero finite-dimensional complex inner product space is unitarily diagonalizable iff it is normal. Moreover, if a linear operator on a finite-dimensional real inner product space is self-adjoint (aka Hermitian), then it is normal and hence has the matrix representation
\[\begin{pmatrix}\lambda_1&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_n\end{pmatrix}\]
with respect to some orthonormal basis. But this matrix is self-adjoint, and thus it is actually diagonal. Thus we obtain the spectral theorem: under the finite-dimensional assumption, any real self-adjoint operator (resp. complex normal operator) is a linear combination of real (resp. complex) orthogonal projections. \(\blacksquare\)
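The real case of the spectral theorem can be checked numerically. The sketch below (illustrative only; the random matrix, seed, and size are my own choices) diagonalizes a real symmetric matrix with an orthogonal matrix and rebuilds it from rank-one orthogonal projections:

```python
import numpy as np

# A random real self-adjoint (symmetric) matrix is orthogonally
# diagonalizable, and decomposes as a linear combination of
# orthogonal projections onto its eigenvectors.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # real symmetric

w, Q = np.linalg.eigh(A)                # Q orthogonal, w real eigenvalues
assert np.allclose(Q @ np.diag(w) @ Q.T, A)

# A = sum_i w_i * P_i with P_i = q_i q_i^T rank-one orthogonal projections
P = [np.outer(Q[:, i], Q[:, i]) for i in range(4)]
assert np.allclose(sum(w[i] * P[i] for i in range(4)), A)
assert all(np.allclose(p @ p, p) for p in P)   # each P_i is idempotent
```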
Remark 1 (Schur's Inequality). Let \(A\in M_{n\times n}(\mathbb{C})\) and \(\lambda_i\ (i=1,\cdots,n)\) be the eigenvalues of \(A\). Denote by \(\|\cdot\|_F\) the Frobenius norm. By Schur's theorem, we have
\[\sum_{i=1}^{n}|\lambda_i|^2\le \|A\|_F^2,\]
with equality iff \(A\) is normal. This is Schur's inequality. In fact, writing \(A=UTU^*\) with \(U\) unitary and \(T=(t_{ij})\) upper triangular, we can derive the following equality:
\[\|A\|_F^2=\|T\|_F^2=\sum_{i=1}^{n}|\lambda_i|^2+\sum_{i<j}|t_{ij}|^2.\]
Therefore, every normal operator has the minimal Frobenius norm in its similarity class. (Note that two similar normal operators are automatically unitarily equivalent and hence have the same Frobenius norm.) Conversely, if a matrix minimizes the Frobenius norm in its similarity class, then it must be normal.
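Schur's inequality and the identity behind it are easy to verify numerically. A minimal sketch (the random test matrix and seed are my own):

```python
import numpy as np
from scipy.linalg import schur

# For the Schur form A = U T U*, ||A||_F^2 = sum |lambda_i|^2 +
# sum_{i<j} |t_ij|^2, so sum |lambda_i|^2 <= ||A||_F^2,
# with equality iff A is normal.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T, U = schur(A, output='complex')          # A = U T U*, T upper triangular

eig_sum = np.sum(np.abs(np.diag(T)) ** 2)
fro_sq = np.linalg.norm(A, 'fro') ** 2
off = np.sum(np.abs(np.triu(T, 1)) ** 2)   # strictly upper part of T

assert np.isclose(eig_sum + off, fro_sq)   # the exact identity
assert eig_sum <= fro_sq + 1e-12           # Schur's inequality

# Equality for a normal (here Hermitian) matrix:
N = A + A.conj().T
assert np.isclose(np.sum(np.abs(np.linalg.eigvals(N)) ** 2),
                  np.linalg.norm(N, 'fro') ** 2)
```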
Remark 2 (Digression: Low-Rank Approximation). Let \(A\in M_{m\times n}(\mathbb{C})\) and \(\sigma_1\ge \cdots \ge \sigma_k\ge \cdots\ge \sigma_r> 0\) be the nonzero singular values of \(A\), where \(r=\text{rank}(A)\) and \(1\le k\le r\). Then we have
\[\inf_{\text{rank}(B)\le k}\|A-B\|_F^2=\sum_{i=k+1}^{r}\sigma_i^2.\]
(Also, note that \(\|A\|_F^2=\sum_{i=1}^{r}\sigma_i^2\).) If \(A=U\Sigma V^*\) is an SVD such that
\[\Sigma=\text{diag}(\sigma_1,\cdots,\sigma_r,0,\cdots,0),\]
then \(\widehat{A}=U\widehat{\Sigma}V^*\) achieves the infimum, where
\[\widehat{\Sigma}=\text{diag}(\sigma_1,\cdots,\sigma_k,0,\cdots,0).\]
Moreover, if \(\sigma_k\neq \sigma_{k+1}\), then the minimizer is unique. This is the Eckart–Young–Mirsky theorem for the Frobenius norm. In fact, \(\widehat{A}\) is also the best rank-\(k\) approximation to \(A\) in the spectral norm, and
\[\inf_{\text{rank}(B)\le k}\|A-B\|_2=\sigma_{k+1}.\]
The proof can be found on Wikipedia.
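The truncated-SVD construction in the remark above can be sketched in a few lines (the matrix shape, seed, and \(k\) are illustrative assumptions):

```python
import numpy as np

# Eckart-Young-Mirsky: keeping the top-k singular triples gives the best
# rank-k approximation, with ||A - A_k||_F^2 = sum_{i>k} sigma_i^2 and
# ||A - A_k||_2 = sigma_{k+1}.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # truncated SVD

assert np.linalg.matrix_rank(A_k) == k
assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```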
Corollary 2: Cayley-Hamilton and Removal Rule
Proof of Cayley-Hamilton. By Schur's theorem, it suffices to prove Cayley-Hamilton for any complex upper triangular matrix \(A=(a_{ij})_{n\times n}\).
Note that the characteristic polynomial of \(A\) is \(f(t)=\prod_{k=1}^{n}(t-a_{kk})\) and hence \(f(A)=\prod_{k=1}^{n}(A-a_{kk}I)\). We prove by induction that the first \(l\) columns of the matrix \(B_{l}=\prod_{k=1}^{l}(A-a_{kk}I)\) are all \(0\), for all \(1\le l\le n\), and then conclude that \(f(A)=B_n=O\).
When \(l=1\), this is obvious. Assume that the result is true for \(l-1\), i.e. the first \(l-1\) columns of \(B_{l-1}=\prod_{k=1}^{l-1}(A-a_{kk}I)\) are all \(0\). Then \(\forall 1\le i\le n\) and \(\forall 1\le j\le l\), we have
\[B_{l}(i,j)=\sum_{k=1}^{n}B_{l-1}(i,k)(A-a_{ll}I)(k,j)=\underbrace{\sum_{k=1}^{l-1}B_{l-1}(i,k)(A-a_{ll}I)(k,j)}_{(1)}+\underbrace{\sum_{k=l}^{n}B_{l-1}(i,k)(A-a_{ll}I)(k,j)}_{(2)}.\]
Note that \(\forall 1\le k\le l-1,\ B_{l-1}(i,k)=0\) (induction hypothesis), and that \(\forall l\le k\le n,\ (A-a_{ll}I)(k,j)=0\) (since \(A\) is upper triangular and \(j\le l\le k\)); hence both \((1)\) and \((2)\) are zero, and so \(B_{l}(i,j)=0\). Therefore, the first \(l\) columns of \(B_l\) are all \(0\). \(\blacksquare\)
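Cayley-Hamilton admits a quick numerical sanity check (a sketch; the random matrix and seed are my own choices):

```python
import numpy as np

# Evaluating the characteristic polynomial of A at A itself gives
# the zero matrix (up to floating-point error).
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
coeffs = np.poly(A)                # monic characteristic polynomial coeffs

# Horner evaluation of f(A) = A^4 + c1*A^3 + c2*A^2 + c3*A + c4*I
f_A = np.zeros((4, 4))
for c in coeffs:
    f_A = f_A @ A + c * np.eye(4)

assert np.allclose(f_A, np.zeros((4, 4)), atol=1e-10)
```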
Thanks to Schur's theorem, we can prove the following lemma without using the Jordan canonical form.
Lemma. Let \(A\in M_{n\times n}(\mathbb{C})\) and \(\text{Spec}(A)=\{\lambda_1,\cdots,\lambda_n\}\) (multiset). Then for any polynomial \(f\) over \(\mathbb{C}\), we have \(\text{Spec}(f(A))=\{f(\lambda_1),\cdots,f(\lambda_n)\}\). \(\blacksquare\)
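The lemma (spectral mapping for polynomials) is also easy to illustrate numerically; in this sketch the polynomial \(f(t)=t^3-2t+3\) and the random matrix are my own choices:

```python
import numpy as np

# Check Spec(f(A)) = {f(lambda_1), ..., f(lambda_n)} as multisets.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

def f(X):
    # f(t) = t^3 - 2t + 3, evaluated on a matrix
    return X @ X @ X - 2 * X + 3 * np.eye(4)

lam = np.linalg.eigvals(A)
lhs = np.sort_complex(np.linalg.eigvals(f(A)))
rhs = np.sort_complex(lam ** 3 - 2 * lam + 3)
assert np.allclose(lhs, rhs)
```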
The next proposition serves as a preparation for Corollary 3: Jordan Canonical Form. It is our removal rule.
Proposition. Let \(F\) be any subfield of \(\mathbb{C}\). Let \(A\in M_{m\times m}(F),B\in M_{n\times n}(F)\) be two square matrices. Let \(p_A,p_B\) be the characteristic polynomials of \(A,B\). If \(\text{gcd}(p_A,p_B)=1\) over \(F\), then for any \(M\in M_{m\times n}(F)\), the matrix \(\begin{pmatrix}A & M\\O & B\end{pmatrix}\) is similar to \(\begin{pmatrix}A & O\\O & B\end{pmatrix}\) as matrices in \(M_{(m+n)\times (m+n)}(F)\). (Note that if \(F=\mathbb{C}\), then the condition is equivalent to \(\text{Spec}(A)\cap \text{Spec}(B)=\varnothing\).)
Proof. If the Sylvester equation \(AX-XB=M\) has a solution, then
\[\begin{pmatrix}I & X\\O & I\end{pmatrix}\begin{pmatrix}A & M\\O & B\end{pmatrix}\begin{pmatrix}I & X\\O & I\end{pmatrix}^{-1}=\begin{pmatrix}A & M-(AX-XB)\\O & B\end{pmatrix}=\begin{pmatrix}A & O\\O & B\end{pmatrix},\]
and thus the two matrices are similar. (In fact, the converse is also true, but much more difficult. It's called Roth's removal rule. The proof can be found in Prasolov's book.) Consider the linear operator
\[\varphi:M_{m\times n}(F)\to M_{m\times n}(F),\quad X\mapsto AX-XB.\]
We need to show that \(\varphi\) is surjective. It suffices to show that \(\varphi\) is injective, i.e., if \(AX=XB\), then \(X=O\). Note that \(A^2X=A(AX)=A(XB)=(AX)B=(XB)B=XB^2\), and \(A^3X=A(A^2X)=A(XB^2)=(AX)B^2=(XB)B^2=XB^3\), etc. Thus, for any polynomial \(f\) over \(F\), we have \(f(A)X=Xf(B)\). Let \(m_A,m_B\) be the minimal polynomials of \(A,B\) over \(F\). Then \(\text{gcd}(m_A,m_B)=1\) and \(m_B(A)X=Xm_B(B)=O\). We show that \(m_B(A)\) is invertible and therefore \(X=O\). Assume to the contrary that \(0\) is an eigenvalue of \(m_B(A)\). Since the minimal polynomial of \(A\) over \(\mathbb{C}\) equals \(m_{A}\), by the lemma above there exists \(\lambda\in \mathbb{C}\) such that \(m_A(\lambda)=0\) and \(m_B(\lambda)=0\), and clearly \(\lambda\notin F\). Let \(h\) be the minimal polynomial of \(\lambda\) over \(F\); then \(h\mid m_A\) and \(h\mid m_B\), contradicting \(\text{gcd}(m_A,m_B)=1\). \(\blacksquare\)
Remark 1 (Minimal Polynomial). If \(E/F\) is a field extension and \(A\in M_{n\times n}(F)\), then the minimal polynomial of \(A\) over \(E\) equals the minimal polynomial of \(A\) over \(F\).
Remark 2 (Alternative Proof). When \(F=\mathbb{C}\), there is an alternative proof without invoking Cayley-Hamilton. (Note that the existence of minimal polynomials of square matrices is guaranteed by Cayley-Hamilton.) By Schur's theorem, there exist two unitary matrices \(U_1,U_2\) such that \(T_1=U_1AU_1^*,T_2=U_2BU_2^*\) are upper triangular, with the eigenvalues of \(A,B\) on their diagonals. Define \(U=\begin{pmatrix}U_1&O\\O&U_2\end{pmatrix}\). Then \(U\) is unitary and
\[U\begin{pmatrix}A & M\\O & B\end{pmatrix}U^*=\begin{pmatrix}T_1 & U_1MU_2^*\\O & T_2\end{pmatrix}.\]
Therefore, we may assume without loss of generality that \(A=(a_{ij})_{m\times m},B=(b_{ij})_{n\times n}\) are upper triangular without common diagonal entries. As shown in the previous proof, it suffices to show that if \(AX=XB\), then \(X=(x_{ij})_{m\times n}\) is zero. Indeed, comparing the \((i,j)\) entries of \(AX=XB\) gives
\[a_{ii}x_{ij}+\sum_{k>i}a_{ik}x_{kj}=b_{jj}x_{ij}+\sum_{k<j}x_{ik}b_{kj},\]
so \((a_{ii}-b_{jj})x_{ij}=\sum_{k<j}x_{ik}b_{kj}-\sum_{k>i}a_{ik}x_{kj}\); since \(a_{ii}\neq b_{jj}\), induction on \(i\) (downward from \(m\)) and \(j\) (upward from \(1\)) yields \(x_{ij}=0\) for all \(i,j\).
Hence we are done. \(\blacksquare\)
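The removal rule can be demonstrated concretely with SciPy's Sylvester solver (a sketch; the particular matrices \(A,B,M\) below are my own illustrative choices, chosen with disjoint spectra):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# When Spec(A) and Spec(B) are disjoint, AX - XB = M is solvable, and
# conjugating by S = [[I, X], [O, I]] removes the off-diagonal block M.
A = np.array([[1.0, 2.0], [0.0, 3.0]])     # Spec(A) = {1, 3}
B = np.array([[5.0, 1.0], [0.0, 7.0]])     # Spec(B) = {5, 7}
M = np.array([[1.0, 4.0], [2.0, 8.0]])

X = solve_sylvester(A, -B, M)              # solves A X + X (-B) = M
assert np.allclose(A @ X - X @ B, M)

I2, O2 = np.eye(2), np.zeros((2, 2))
S = np.block([[I2, X], [O2, I2]])
T = np.block([[A, M], [O2, B]])            # block upper triangular
D = np.block([[A, O2], [O2, B]])           # block diagonal
assert np.allclose(S @ T @ np.linalg.inv(S), D)
```

Note the sign convention: `solve_sylvester(A, B, Q)` solves \(AX+XB=Q\), so negating \(B\) yields our equation \(AX-XB=M\).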
Corollary 3: Jordan Canonical Form
Schur triangularization implies Jordan canonical form over \(\mathbb{C}\).
Proof. Given any complex square matrix \(A\), by Schur's theorem \(A\) is unitarily equivalent to an upper triangular matrix of the form
\[\begin{pmatrix}T_1&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&T_s\end{pmatrix},\quad T_i=\begin{pmatrix}\lambda_i&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_i\end{pmatrix},\]
where \(\lambda_1,\lambda_2,\cdots,\lambda_s\) are all the distinct eigenvalues of \(A\). By applying the removal rule inductively, we derive that the matrix above is similar to the block diagonal matrix
\[\begin{pmatrix}T_1&&\\ &\ddots&\\ &&T_s\end{pmatrix}.\]
This result implies that the space is the direct sum of the generalized eigenspaces of the operator.
Therefore it suffices to show that each \(\small\begin{pmatrix}\lambda_i&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_i\end{pmatrix}\) has a Jordan canonical form. We only need to show that if \(\Lambda\in M_{n\times n}(\mathbb{C})\) is strictly upper triangular, then it has a Jordan canonical form. Denote \(L_{\Lambda}:\mathbb{C}^n\to \mathbb{C}^n\) by \(T\). Clearly, \(T\) is nilpotent. Denote by \(k\) the nilpotency index of \(T\). Since \(k=1\) implies \(T=O\), we may assume that \(k\ge 2\). Then
\[\ker(T^1)\subset\ker(T^2)\subset\cdots\subset\ker(T^k)=\mathbb{C}^n.\]
Let \(\gamma_j\) be any basis for \(\ker(T^j)\ (j=1,\cdots,k-1)\). We construct a Jordan canonical basis for \(T\):
Step 1 Extend \(\gamma_{k-1}\) to a basis for \(\ker(T^k)\): \(\gamma_{k-1}\cup \beta_k\). (By the definition of \(k\), \(\beta_k\) is not empty.) Then \(\gamma_{k-2}\cup T^1\beta_{k}\) is linearly independent. Indeed, let \(\gamma_{k-1}=\{w_1,\cdots,w_p\},\gamma_{k-2}=\{w'_1,\cdots,w'_q\}\) and \(\beta_k=\{v_1,\cdots,v_m\}\), and suppose
\[\sum_{i=1}^{q}a_iw'_i+\sum_{j=1}^{m}b_jTv_j=0.\]
Applying \(T^{k-2}\) and using \(w'_i\in\ker(T^{k-2})\), we get \(T^{k-1}\big(\sum_{j=1}^{m}b_jv_j\big)=0\), so \(\sum_{j=1}^{m}b_jv_j\in\ker(T^{k-1})=\text{span}(\gamma_{k-1})\). Since \(\gamma_{k-1}\cup\beta_k\) is linearly independent, all \(b_j=0\); then all \(a_i=0\) because \(\gamma_{k-2}\) is linearly independent.
This argument also works in the following steps.
Step 2 Extend \(\gamma_{k-2}\cup T^1\beta_{k}\) to a basis for \(\ker(T^{k-1})\): \(\gamma_{k-2}\cup T^1\beta_k\cup \beta_{k-1}\). (It is possible that \(\beta_{k-1}=\varnothing\); similarly hereinafter.) Then \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\) is linearly independent.
Step 3 Extend \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\) to a basis for \(\ker(T^{k-2})\): \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\cup\beta_{k-2}\). Then \(\gamma_{k-4}\cup T^3\beta_k\cup T^2\beta_{k-1}\cup T^1\beta_{k-2}\) is linearly independent.
\(\cdots\)
Step k-1 Extend \(\gamma_1\cup T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\) to a basis for \(\ker(T^2)\): \(\gamma_1\cup T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\cup \beta_2\). Then \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\) is linearly independent.
Step k Extend \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\) to a basis for \(\ker(T^1)\): \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\).
Since \(\gamma_1\) is an arbitrary basis for \(\ker(T^1)\), by substituting \(\gamma_1=T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\) into Step k-1, we conclude that the union of
\[T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\quad\text{and}\quad T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\cup \beta_2\]
is a basis for \(\ker(T^2)\). Repeating this procedure inductively, we see that the union of
\[\beta_i\cup T\beta_i\cup\cdots\cup T^{i-1}\beta_i\quad (i=1,\cdots,k)\]
is a basis for \(\ker(T^k)=\mathbb{C}^n\). Moreover,
\[\dim\ker(T^{i})-\dim\ker(T^{i-1})=\#(\beta_i)+\#(\beta_{i+1})+\cdots+\#(\beta_k)\]
for all \(i\). (As a byproduct, we have
\[\dim\ker(T^{i+1})-\dim\ker(T^{i})\le \dim\ker(T^{i})-\dim\ker(T^{i-1})\]
for \(1\le i\le k-1\). If the equality holds, then \(\#(\beta_i)=0\) and so \(\beta_i\) is empty. Another observation is that in the ascending chain
\[\ker(T^1)\subset\ker(T^2)\subset\cdots\subset\ker(T^k)=\mathbb{C}^n,\]
every "\(\subset\)" is strict.)
Let \(\beta_i=\{v_{i,1},\cdots,v_{i,n_i}\}\ (i=1,\cdots,k)\). Then
\[\beta=\bigcup_{i=1}^{k}\bigcup_{j=1}^{n_i}\{T^{i-1}v_{i,j},\cdots,Tv_{i,j},v_{i,j}\}\]
is an ordered basis for \(\mathbb{C}^n\) such that
\[[T]_{\beta}=\bigoplus_{i=1}^{k}\underbrace{J_i\oplus\cdots\oplus J_i}_{n_i\ \text{copies}},\]
where
\[J_i=\begin{pmatrix}0&1&&\\&0&\ddots&\\&&\ddots&1\\&&&0\end{pmatrix}\in M_{i\times i}(\mathbb{C})\]
for each \(i\). Therefore, \(\beta\) is a Jordan canonical basis for \(T\). This completes the proof. \(\blacksquare\)
Remark 1 (Computational Aspect and Further Observations). In general, if \(T\) is a linear operator on a nonzero finite-dimensional complex vector space \(V\), then for any eigenvalue \(\lambda\) of \(T\) and any \(j\ge 1\), the number of Jordan blocks of order \(\ge j\) corresponding to \(\lambda\) equals
\[\dim\ker(T-\lambda I)^{j}-\dim\ker(T-\lambda I)^{j-1}.\]
The Jordan canonical form of \(T\) is then uniquely determined by these data up to a permutation of the Jordan blocks. Using the procedure in the proof above, we can construct a Jordan canonical basis \(\beta_{\lambda}\) for the restriction of \(T-\lambda I\) to the generalized eigenspace \(K_{\lambda}:=\bigcup_{i\ge 1}\ker(T-\lambda I)^i=\ker(T-\lambda I)^{m_{\lambda}}\), where \(m_{\lambda}\) is the algebraic multiplicity of \(\lambda\), i.e., the characteristic polynomial of \(T\) is
\[p(t)=\prod_{\lambda\in\text{Spec}(T)}(t-\lambda)^{m_{\lambda}}.\]
Recalling that \(V=\bigoplus_{\lambda} K_{\lambda}\), we conclude that \(\beta=\bigcup_{\lambda}\beta_{\lambda}\) is a Jordan canonical basis for \(T\). Moreover, the minimal polynomial of \(T\) is
\[m(t)=\prod_{\lambda\in\text{Spec}(T)}(t-\lambda)^{k_{\lambda}},\]
where \(k_{\lambda}\) is the nilpotency index of \((T-\lambda I)|_{K_{\lambda}}\), or equivalently, the largest order of the Jordan blocks \(J_{\bullet}(\lambda)\) appearing in the Jordan canonical form of \(T\). This observation allows us to see that, over the complex number field, an operator is diagonalizable iff its minimal polynomial has no repeated roots, and its minimal polynomial coincides with its characteristic polynomial iff its generalized eigenspaces are all cyclic subspaces.
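The block-counting formula is easy to test in exact arithmetic with SymPy. In this sketch the matrix is built from a known Jordan structure, \(J_3(2)\oplus J_1(2)\), and conjugated by an invertible matrix of my own choosing:

```python
import sympy as sp

# The number of Jordan blocks of order >= j for eigenvalue lambda equals
# dim ker(T - lambda I)^j - dim ker(T - lambda I)^(j-1).
J = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 1, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 2]])          # J_3(2) + J_1(2): two blocks
P = sp.Matrix([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1],
               [0, 0, 0, 1]])          # unitriangular, hence invertible
A = P * J * P.inv()                    # similar to J, exact arithmetic

n = 4
N = A - 2 * sp.eye(n)

def dim_ker(j):
    # exact rank over the rationals; N**0 is the identity
    return n - (N ** j).rank()

blocks_ge = [dim_ker(j) - dim_ker(j - 1) for j in (1, 2, 3)]
assert blocks_ge == [2, 1, 1]          # 2 blocks; 1 of order >= 2; 1 >= 3
```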
Remark 2 (Jordan Canonical Form over \(\mathbb{R}\)). Let \(V\) be a nonzero finite-dimensional real vector space and \(T\) a linear operator on \(V\). If \(\lambda\) is a nonreal eigenvalue of \(T_{\mathbb{C}}\), then so is \(\overline{\lambda}\), and \(v\mapsto \overline{v}\) induces a one-to-one correspondence between the Jordan blocks of \(T_{\mathbb{C}}\) corresponding to \(\lambda\) and \(\overline{\lambda}\). Suppose that the Jordan canonical form of \(T_{\mathbb{C}}\) is
\[\bigoplus_{i=1}^{m}J_{d_i}(t_i)\ \oplus\ \bigoplus_{j=1}^{n}\left(J_{c_j}(\lambda_j)\oplus J_{c_j}(\overline{\lambda}_j)\right),\]
where \(\{t_1,\cdots,t_m\}=\text{Spec}(T_{\mathbb{C}})\cap \mathbb{R},\ \{\lambda_1,\overline{\lambda}_1;\cdots;\lambda_n,\overline{\lambda}_n\}=\text{Spec}(T_{\mathbb{C}})\setminus \mathbb{R}\), and \(\beta=\{u_1,\cdots,u_m;v_1,\overline{v}_1;\cdots;v_n,\overline{v}_{n}\}\) is the corresponding Jordan canonical basis. Replacing each entry \(z=x+iy\ (x,y\in \mathbb{R})\) in \(J_{c_j}(\lambda_j)\) by the matrix \(\small\begin{pmatrix} x & y \\ -y & x \end{pmatrix}\), we obtain a real matrix \(J_{c_j}(\lambda_j)_{\mathbb{R}}\). Define \(\gamma:=\{u_1,\cdots,u_m;\Re v_1,\Im v_1;\cdots;\Re v_n,\Im v_n\}\). Then \(\gamma\) is a basis of \(V\), and
\[[T]_{\gamma}=\bigoplus_{i=1}^{m}J_{d_i}(t_i)\ \oplus\ \bigoplus_{j=1}^{n}J_{c_j}(\lambda_j)_{\mathbb{R}}.\]
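The basic mechanism behind the real form, namely that the \(2\times 2\) real block substituted for \(z=x+iy\) carries the spectrum \(\{z,\overline{z}\}\), can be checked directly (values \(x=3,y=2\) are an illustrative assumption):

```python
import numpy as np

# The real block [[x, y], [-y, x]] replacing z = x + iy has
# eigenvalues x + iy and x - iy.
x, y = 3.0, 2.0
R = np.array([[x, y], [-y, x]])
vals = np.sort_complex(np.linalg.eigvals(R))
assert np.allclose(vals, np.sort_complex(np.array([x - 1j * y, x + 1j * y])))
```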
Simultaneous Triangularization
Basic Observations
Problem 1. Let \(A,B\in M_{n\times n}(\mathbb{C})\). Show that if \(\text{rank}(AB-BA)\le 1\), then \(A,B\) are simultaneously triangularizable.
Problem 2. Let \(A,B\) and \(C\) be matrices in \(M_n(\mathbb{C})\) such that \(C=AB-BA, AC=CA,BC=CB\).
(1) Show that the eigenvalues of \(C\) are all zero.
(2) Let \(m_A(\lambda)\) and \(m_B(\lambda)\) be the minimal polynomials of \(A\) and \(B\), respectively, and \(k:=\min \{\deg m_A(\lambda), \deg m_B(\lambda), n-1\}\). Show that \(C^k=0\).
(3) Show that if \(n=2\), then \(C=O\).
(4) Show that there exists a common eigenvector of \(A,B\) and \(C\).
(5) Show that \(A,B\) and \(C\) are simultaneously triangularizable.
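A concrete instance of Problem 2 (an illustration, not a proof) is the Heisenberg triple \(A=E_{12},B=E_{23},C=[A,B]=E_{13}\) of strictly upper triangular matrices, which can be checked numerically:

```python
import numpy as np

# Heisenberg example: C = AB - BA commutes with both A and B,
# its eigenvalues are all zero, and C^2 = O (consistent with parts
# (1) and (2), where here k = min(2, 2, n-1) = 2 for n = 3).
A = np.zeros((3, 3)); A[0, 1] = 1          # E_12
B = np.zeros((3, 3)); B[1, 2] = 1          # E_23
C = A @ B - B @ A                          # equals E_13

assert np.allclose(A @ C, C @ A) and np.allclose(B @ C, C @ B)
assert np.allclose(np.linalg.eigvals(C), 0)     # part (1): Spec(C) = {0}
assert np.allclose(C @ C, np.zeros((3, 3)))     # C^2 = O
# All three matrices are strictly upper triangular, so they are already
# simultaneously triangularized, as part (5) predicts.
```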
TODO:
General Theory
THEOREM (Drazin-Dungey-Gruenberg, 1951). Matrices \(A_1,\cdots,A_m\) are simultaneously triangularizable iff the matrix \(p(A_1,\cdots,A_m)[A_i,A_j]\) is nilpotent for every polynomial \(p(x_1,\cdots,x_m)\) in noncommuting indeterminates and all \(i,j\).
Applications
Digressions
Theorem (Smiley, 1961). Suppose the matrices \(A\) and \(B\) are such that for a certain integer \(s>0\) the identity \(\text{ad}_A^sX=0\) implies \(\text{ad}_X^sB=0\). Then \(B\) can be expressed as a polynomial in \(A\).
Theorem (Fregus, 1966). If \(\text{tr}(A)=0\), then there exist matrices \(X\) and \(Y\) such that \(X\) is Hermitian, \(\text{tr}(Y)=0\), and \(A=[X,Y]\).
Theorem (Gibson, 1975). Let \(A\neq \lambda I\). Then \(A\) is similar to a matrix with the diagonal \((0,\cdots,0,\text{tr}(A))\).
Theorem (Gibson, 1975). Let \(\text{tr}(A)=0\) and \(\lambda_1,\cdots,\lambda_n,\mu_1,\cdots,\mu_n\) be given complex numbers such that \(\lambda_i\neq \lambda_j\) for \(i\neq j\). Then there exist complex matrices \(X\) and \(Y\) with eigenvalues \(\lambda_1,\cdots,\lambda_n\) and \(\mu_1,\cdots,\mu_n\), respectively, such that \(A=[X,Y]\).
Triangularization over a PID
TODO: