My Last Memento of Linear Algebra - Starting with Triangularization
ABSTRACT. Schur triangularization is a powerful tool in linear algebra: it implies the spectral decomposition, the Cayley-Hamilton theorem, the removal rule, and the Jordan canonical form over the complex number field. With the help of algebraic closure, ordinary triangularization serves to generalize the previous results. The spirit of simultaneous triangularization is explored through problem-solving, and the relevant theorems from Lie algebra are recalled. Finally, we consider the triangularization of matrices over a PID. The main references are Linear Algebra (Fourth Edition) by Stephen H. Friedberg, Arnold J. Insel and Lawrence E. Spence, which is the textbook chosen by my professor when I was studying linear algebra, and Problems and Theorems in Linear Algebra by Viktor V. Prasolov, which contains a lot of fascinating results and is available online: [http://staff.math.su.se/mleites/books/prasolov-1994-problems.pdf].
Corollaries of Schur Triangularization and Ordinary Triangularization
TO BEGIN WITH... The Generalized Schur Triangularization is in this blog post: [https://www.cnblogs.com/chaliceseven/p/17094280.html]. The Ordinary Triangularization is in this blog post: [https://www.cnblogs.com/chaliceseven/p/17094288.html].
Corollary 1: The Spectral Theorem
From the proof of I), we see that any linear operator on a nonzero finite-dimensional complex inner product space is unitarily triangularizable. This is Schur's theorem. Using the fact in III), it follows that a linear operator on a nonzero finite-dimensional complex inner product space is unitarily diagonalizable iff it is normal. Moreover, if a linear operator on a finite-dimensional real inner product space is self-adjoint (aka Hermitian), then it is normal and hence has the matrix representation
\[\begin{pmatrix}\lambda_1&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_n\end{pmatrix}\]
with respect to some orthonormal basis. But this matrix is self-adjoint, and thus it is actually diagonal. Thus we obtain the spectral theorem: under the finite-dimensional assumption, any real self-adjoint operator (resp. complex normal operator) is a linear combination of real (resp. complex) orthogonal projections. \(\blacksquare\)
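The real case of the spectral theorem can be checked numerically. The sketch below (illustrative only; the random matrix, seed, and size are my own choices) diagonalizes a real symmetric matrix with an orthogonal matrix and rebuilds it from rank-one orthogonal projections:

```python
import numpy as np

# A random real self-adjoint (symmetric) matrix is orthogonally
# diagonalizable, and decomposes as a linear combination of
# orthogonal projections onto its eigenvectors.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # real symmetric

w, Q = np.linalg.eigh(A)                # Q orthogonal, w real eigenvalues
assert np.allclose(Q @ np.diag(w) @ Q.T, A)

# A = sum_i w_i * P_i with P_i = q_i q_i^T rank-one orthogonal projections
P = [np.outer(Q[:, i], Q[:, i]) for i in range(4)]
assert np.allclose(sum(w[i] * P[i] for i in range(4)), A)
assert all(np.allclose(p @ p, p) for p in P)   # each P_i is idempotent
```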
Remark 1 (Schur's Inequality). Let \(A\in M_{n\times n}(\mathbb{C})\) and \(\lambda_i\ (i=1,\cdots,n)\) be the eigenvalues of \(A\). Denote by \(\|\cdot\|_F\) the Frobenius norm. By Schur's theorem, we have
\[\sum_{i=1}^{n}|\lambda_i|^2\le \|A\|_F^2,\]
with equality iff \(A\) is normal. This is Schur's inequality. In fact, writing \(A=UTU^*\) with \(U\) unitary and \(T=(t_{ij})\) upper triangular, we can derive the following equality:
\[\|A\|_F^2=\|T\|_F^2=\sum_{i=1}^{n}|\lambda_i|^2+\sum_{i<j}|t_{ij}|^2.\]
Therefore, every normal operator has the minimal Frobenius norm in its similarity class. (Note that two similar normal operators are automatically unitarily equivalent and hence have the same Frobenius norm.) Conversely, if a matrix minimizes the Frobenius norm in its similarity class, then it must be normal.
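Schur's inequality and the identity behind it are easy to verify numerically. A minimal sketch (the random test matrix and seed are my own):

```python
import numpy as np
from scipy.linalg import schur

# For the Schur form A = U T U*, ||A||_F^2 = sum |lambda_i|^2 +
# sum_{i<j} |t_ij|^2, so sum |lambda_i|^2 <= ||A||_F^2,
# with equality iff A is normal.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T, U = schur(A, output='complex')          # A = U T U*, T upper triangular

eig_sum = np.sum(np.abs(np.diag(T)) ** 2)
fro_sq = np.linalg.norm(A, 'fro') ** 2
off = np.sum(np.abs(np.triu(T, 1)) ** 2)   # strictly upper part of T

assert np.isclose(eig_sum + off, fro_sq)   # the exact identity
assert eig_sum <= fro_sq + 1e-12           # Schur's inequality

# Equality for a normal (here Hermitian) matrix:
N = A + A.conj().T
assert np.isclose(np.sum(np.abs(np.linalg.eigvals(N)) ** 2),
                  np.linalg.norm(N, 'fro') ** 2)
```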
Remark 2 (Digression: Low-Rank Approximation). Let \(A\in M_{m\times n}(\mathbb{C})\) and \(\sigma_1\ge \cdots \ge \sigma_k\ge \cdots\ge \sigma_r> 0\) be the nonzero singular values of \(A\), where \(r=\text{rank}(A)\) and \(1\le k\le r\). Then we have
\[\inf_{\text{rank}(B)\le k}\|A-B\|_F^2=\sum_{i=k+1}^{r}\sigma_i^2.\]
(Also, note that \(\|A\|_F^2=\sum_{i=1}^{r}\sigma_i^2\).) If \(A=U\Sigma V^*\) is an SVD such that
\[\Sigma=\text{diag}(\sigma_1,\cdots,\sigma_r,0,\cdots,0),\]
then \(\widehat{A}=U\widehat{\Sigma}V^*\) achieves the infimum, where
\[\widehat{\Sigma}=\text{diag}(\sigma_1,\cdots,\sigma_k,0,\cdots,0).\]
Moreover, if \(\sigma_k\neq \sigma_{k+1}\), then the minimizer is unique. This is the Eckart–Young–Mirsky theorem for the Frobenius norm. In fact, \(\widehat{A}\) is also the best rank-\(k\) approximation to \(A\) in the spectral norm, and
\[\inf_{\text{rank}(B)\le k}\|A-B\|_2=\sigma_{k+1}.\]
The proof can be found on Wikipedia.
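The truncated-SVD construction in the remark above can be sketched in a few lines (the matrix shape, seed, and \(k\) are illustrative assumptions):

```python
import numpy as np

# Eckart-Young-Mirsky: keeping the top-k singular triples gives the best
# rank-k approximation, with ||A - A_k||_F^2 = sum_{i>k} sigma_i^2 and
# ||A - A_k||_2 = sigma_{k+1}.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # truncated SVD

assert np.linalg.matrix_rank(A_k) == k
assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```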
Corollary 2: Cayley-Hamilton and Removal Rule
Proof of Cayley-Hamilton. By Schur's theorem, it suffices to prove Cayley-Hamilton for any complex upper triangular matrix \(A=(a_{ij})_{n\times n}\).
Note that the characteristic polynomial of \(A\) is \(f(t)=\prod_{k=1}^{n}(t-a_{kk})\) and hence \(f(A)=\prod_{k=1}^{n}(A-a_{kk}I)\). We prove by induction that the first \(l\) columns of the matrix \(B_{l}=\prod_{k=1}^{l}(A-a_{kk}I)\) are all \(0\), for all \(1\le l\le n\), and then conclude that \(f(A)=B_n=O\).
When \(l=1\), this is obvious. Assume that the result is true for \(l-1\), i.e. the first \(l-1\) columns of \(B_{l-1}=\prod_{k=1}^{l-1}(A-a_{kk}I)\) are all \(0\). Then \(\forall 1\le i\le n\) and \(\forall 1\le j\le l\), we have
\[B_{l}(i,j)=\sum_{k=1}^{n}B_{l-1}(i,k)(A-a_{ll}I)(k,j)=\underbrace{\sum_{k=1}^{l-1}B_{l-1}(i,k)(A-a_{ll}I)(k,j)}_{(1)}+\underbrace{\sum_{k=l}^{n}B_{l-1}(i,k)(A-a_{ll}I)(k,j)}_{(2)}.\]
Note that \(\forall 1\le k\le l-1,\ B_{l-1}(i,k)=0\) (induction hypothesis), and that \(\forall l\le k\le n,\ (A-a_{ll}I)(k,j)=0\) (since \(A\) is upper triangular and \(j\le l\le k\)); hence both \((1)\) and \((2)\) are zero, and so \(B_{l}(i,j)=0\). Therefore, the first \(l\) columns of \(B_l\) are all \(0\). \(\blacksquare\)
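Cayley-Hamilton admits a quick numerical sanity check (a sketch; the random matrix and seed are my own choices):

```python
import numpy as np

# Evaluating the characteristic polynomial of A at A itself gives
# the zero matrix (up to floating-point error).
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
coeffs = np.poly(A)                # monic characteristic polynomial coeffs

# Horner evaluation of f(A) = A^4 + c1*A^3 + c2*A^2 + c3*A + c4*I
f_A = np.zeros((4, 4))
for c in coeffs:
    f_A = f_A @ A + c * np.eye(4)

assert np.allclose(f_A, np.zeros((4, 4)), atol=1e-10)
```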
Thanks to Schur's theorem, we can prove the following lemma without using the Jordan canonical form.
Lemma. Let \(A\in M_{n\times n}(\mathbb{C})\) and \(\text{Spec}(A)=\{\lambda_1,\cdots,\lambda_n\}\) (multiset). Then for any polynomial \(f\) over \(\mathbb{C}\), we have \(\text{Spec}(f(A))=\{f(\lambda_1),\cdots,f(\lambda_n)\}\). \(\blacksquare\)
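The lemma (spectral mapping for polynomials) is also easy to illustrate numerically; in this sketch the polynomial \(f(t)=t^3-2t+3\) and the random matrix are my own choices:

```python
import numpy as np

# Check Spec(f(A)) = {f(lambda_1), ..., f(lambda_n)} as multisets.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

def f(X):
    # f(t) = t^3 - 2t + 3, evaluated on a matrix
    return X @ X @ X - 2 * X + 3 * np.eye(4)

lam = np.linalg.eigvals(A)
lhs = np.sort_complex(np.linalg.eigvals(f(A)))
rhs = np.sort_complex(lam ** 3 - 2 * lam + 3)
assert np.allclose(lhs, rhs)
```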
The next proposition serves as a preparation for Corollary 3: Jordan Canonical Form. It is our removal rule.
Proposition. Let \(F\) be any subfield of \(\mathbb{C}\). Let \(A\in M_{m\times m}(F),B\in M_{n\times n}(F)\) be two square matrices. Let \(p_A,p_B\) be the characteristic polynomials of \(A,B\). If \(\text{gcd}(p_A,p_B)=1\) over \(F\), then for any \(M\in M_{m\times n}(F)\), the matrix \(\begin{pmatrix}A & M\\O & B\end{pmatrix}\) is similar to \(\begin{pmatrix}A & O\\O & B\end{pmatrix}\) as matrices in \(M_{(m+n)\times (m+n)}(F)\). (Note that if \(F=\mathbb{C}\), then the condition is equivalent to \(\text{Spec}(A)\cap \text{Spec}(B)=\varnothing\).)
Proof. If the Sylvester equation \(AX-XB=M\) has a solution, then
\[\begin{pmatrix}I & X\\O & I\end{pmatrix}\begin{pmatrix}A & M\\O & B\end{pmatrix}\begin{pmatrix}I & X\\O & I\end{pmatrix}^{-1}=\begin{pmatrix}A & M-(AX-XB)\\O & B\end{pmatrix}=\begin{pmatrix}A & O\\O & B\end{pmatrix},\]
and thus the two matrices are similar. (In fact, the converse is also true, but much more difficult. It's called Roth's removal rule. The proof can be found in Prasolov's book.) Consider the linear operator
\[\varphi:M_{m\times n}(F)\to M_{m\times n}(F),\quad X\mapsto AX-XB.\]
We need to show that \(\varphi\) is surjective. It suffices to show that \(\varphi\) is injective, i.e., if \(AX=XB\), then \(X=O\). Note that \(A^2X=A(AX)=A(XB)=(AX)B=(XB)B=XB^2\), and \(A^3X=A(A^2X)=A(XB^2)=(AX)B^2=(XB)B^2=XB^3\), etc. Thus, for any polynomial \(f\) over \(F\), we have \(f(A)X=Xf(B)\). Let \(m_A,m_B\) be the minimal polynomials of \(A,B\) over \(F\). Then \(\text{gcd}(m_A,m_B)=1\) and \(m_B(A)X=Xm_B(B)=O\). We show that \(m_B(A)\) is invertible and therefore \(X=O\). Assume to the contrary that \(0\) is an eigenvalue of \(m_B(A)\). Since the minimal polynomial of \(A\) over \(\mathbb{C}\) equals \(m_{A}\), by the lemma above there exists \(\lambda\in \mathbb{C}\) such that \(m_A(\lambda)=0\) and \(m_B(\lambda)=0\), and clearly \(\lambda\notin F\). Let \(h\) be the minimal polynomial of \(\lambda\) over \(F\); then \(h\mid m_A\) and \(h\mid m_B\), contradicting \(\text{gcd}(m_A,m_B)=1\). \(\blacksquare\)
Remark 1 (Minimal Polynomial). If \(E/F\) is a field extension and \(A\in M_{n\times n}(F)\), then the minimal polynomial of \(A\) over \(E\) equals the minimal polynomial of \(A\) over \(F\).
Remark 2 (Alternative Proof). When \(F=\mathbb{C}\), there is an alternative proof without invoking Cayley-Hamilton. (Note that the existence of minimal polynomials of square matrices is guaranteed by Cayley-Hamilton.) By Schur's theorem, there exist two unitary matrices \(U_1,U_2\) such that \(T_1=U_1AU_1^*,T_2=U_2BU_2^*\) are upper triangular, with the eigenvalues of \(A,B\) on their diagonals. Define \(U=\begin{pmatrix}U_1&O\\O&U_2\end{pmatrix}\). Then \(U\) is unitary and
\[U\begin{pmatrix}A & M\\O & B\end{pmatrix}U^*=\begin{pmatrix}T_1 & U_1MU_2^*\\O & T_2\end{pmatrix}.\]
Therefore, we may assume without loss of generality that \(A=(a_{ij})_{m\times m},B=(b_{ij})_{n\times n}\) are upper triangular without common diagonal entries. As shown in the previous proof, it suffices to show that if \(AX=XB\), then \(X=(x_{ij})_{m\times n}\) is zero. Indeed, comparing the \((i,j)\) entries of \(AX=XB\) gives
\[a_{ii}x_{ij}+\sum_{k>i}a_{ik}x_{kj}=b_{jj}x_{ij}+\sum_{k<j}x_{ik}b_{kj},\]
so \((a_{ii}-b_{jj})x_{ij}=\sum_{k<j}x_{ik}b_{kj}-\sum_{k>i}a_{ik}x_{kj}\); since \(a_{ii}\neq b_{jj}\), induction on \(i\) (downward from \(m\)) and \(j\) (upward from \(1\)) yields \(x_{ij}=0\) for all \(i,j\).
Hence we are done. \(\blacksquare\)
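The removal rule can be demonstrated concretely with SciPy's Sylvester solver (a sketch; the particular matrices \(A,B,M\) below are my own illustrative choices, chosen with disjoint spectra):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# When Spec(A) and Spec(B) are disjoint, AX - XB = M is solvable, and
# conjugating by S = [[I, X], [O, I]] removes the off-diagonal block M.
A = np.array([[1.0, 2.0], [0.0, 3.0]])     # Spec(A) = {1, 3}
B = np.array([[5.0, 1.0], [0.0, 7.0]])     # Spec(B) = {5, 7}
M = np.array([[1.0, 4.0], [2.0, 8.0]])

X = solve_sylvester(A, -B, M)              # solves A X + X (-B) = M
assert np.allclose(A @ X - X @ B, M)

I2, O2 = np.eye(2), np.zeros((2, 2))
S = np.block([[I2, X], [O2, I2]])
T = np.block([[A, M], [O2, B]])            # block upper triangular
D = np.block([[A, O2], [O2, B]])           # block diagonal
assert np.allclose(S @ T @ np.linalg.inv(S), D)
```

Note the sign convention: `solve_sylvester(A, B, Q)` solves \(AX+XB=Q\), so negating \(B\) yields our equation \(AX-XB=M\).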
Corollary 3: Jordan Canonical Form
Schur triangularization implies Jordan canonical form over \(\mathbb{C}\).
Proof. Given any complex square matrix \(A\), by Schur's theorem \(A\) is unitarily equivalent to an upper triangular matrix of the form
\[\begin{pmatrix}T_1&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&T_s\end{pmatrix},\quad T_i=\begin{pmatrix}\lambda_i&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_i\end{pmatrix},\]
where \(\lambda_1,\lambda_2,\cdots,\lambda_s\) are all the distinct eigenvalues of \(A\). By applying the removal rule inductively, we derive that the matrix above is similar to the block diagonal matrix
\[\begin{pmatrix}T_1&&\\ &\ddots&\\ &&T_s\end{pmatrix}.\]
This result implies that the space is the direct sum of the generalized eigenspaces of the operator.
Therefore it suffices to show that each \(\small\begin{pmatrix}\lambda_i&&\hspace{-1em}\large{*}\\ &\ddots&\\ &&\lambda_i\end{pmatrix}\) has a Jordan canonical form. We only need to show that if \(\Lambda\in M_{n\times n}(\mathbb{C})\) is strictly upper triangular, then it has a Jordan canonical form. Denote \(L_{\Lambda}:\mathbb{C}^n\to \mathbb{C}^n\) by \(T\). Clearly, \(T\) is nilpotent. Denote by \(k\) the nilpotency index of \(T\). Since \(k=1\) implies \(T=O\), we may assume that \(k\ge 2\). Then
\[\ker(T^1)\subset\ker(T^2)\subset\cdots\subset\ker(T^k)=\mathbb{C}^n.\]
Let \(\gamma_j\) be any basis for \(\ker(T^j)\ (j=1,\cdots,k-1)\). We construct a Jordan canonical basis for \(T\):
Step 1 Extend \(\gamma_{k-1}\) to a basis for \(\ker(T^k)\): \(\gamma_{k-1}\cup \beta_k\). (By the definition of \(k\), \(\beta_k\) is not empty.) Then \(\gamma_{k-2}\cup T^1\beta_{k}\) is linearly independent. Indeed, let \(\gamma_{k-1}=\{w_1,\cdots,w_p\},\gamma_{k-2}=\{w'_1,\cdots,w'_q\}\) and \(\beta_k=\{v_1,\cdots,v_m\}\), and suppose
\[\sum_{i=1}^{q}a_iw'_i+\sum_{j=1}^{m}b_jTv_j=0.\]
Applying \(T^{k-2}\) and using \(w'_i\in\ker(T^{k-2})\), we get \(T^{k-1}\big(\sum_{j=1}^{m}b_jv_j\big)=0\), so \(\sum_{j=1}^{m}b_jv_j\in\ker(T^{k-1})=\text{span}(\gamma_{k-1})\). Since \(\gamma_{k-1}\cup\beta_k\) is linearly independent, all \(b_j=0\); then all \(a_i=0\) because \(\gamma_{k-2}\) is linearly independent.
This argument also works in the following steps.
Step 2 Extend \(\gamma_{k-2}\cup T^1\beta_{k}\) to a basis for \(\ker(T^{k-1})\): \(\gamma_{k-2}\cup T^1\beta_k\cup \beta_{k-1}\). (It is possible that \(\beta_{k-1}=\varnothing\); similarly hereinafter.) Then \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\) is linearly independent.
Step 3 Extend \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\) to a basis for \(\ker(T^{k-2})\): \(\gamma_{k-3}\cup T^2\beta_k\cup T^1\beta_{k-1}\cup\beta_{k-2}\). Then \(\gamma_{k-4}\cup T^3\beta_k\cup T^2\beta_{k-1}\cup T^1\beta_{k-2}\) is linearly independent.
\(\cdots\)
Step k-1 Extend \(\gamma_1\cup T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\) to a basis for \(\ker(T^2)\): \(\gamma_1\cup T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\cup \beta_2\). Then \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\) is linearly independent.
Step k Extend \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\) to a basis for \(\ker(T^1)\): \(T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\).
Since \(\gamma_1\) is an arbitrary basis for \(\ker(T^1)\), by substituting \(\gamma_1=T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\) into Step k-1, we conclude that the union of
\[T^{k-1}\beta_k\cup T^{k-2}\beta_{k-1}\cup\cdots\cup T^1\beta_2\cup \beta_1\quad\text{and}\quad T^{k-2}\beta_{k}\cup T^{k-3}\beta_{k-1}\cup\cdots\cup T^1\beta_3\cup \beta_2\]
is a basis for \(\ker(T^2)\). Repeating this procedure inductively, we see that the union of
\[\beta_i\cup T\beta_i\cup\cdots\cup T^{i-1}\beta_i\quad (i=1,\cdots,k)\]
is a basis for \(\ker(T^k)=\mathbb{C}^n\). Moreover,
\[\dim\ker(T^{i})-\dim\ker(T^{i-1})=\#(\beta_i)+\#(\beta_{i+1})+\cdots+\#(\beta_k)\]
for all \(i\). (As a byproduct, we have
\[\dim\ker(T^{i+1})-\dim\ker(T^{i})\le \dim\ker(T^{i})-\dim\ker(T^{i-1})\]
for \(1\le i\le k-1\). If the equality holds, then \(\#(\beta_i)=0\) and so \(\beta_i\) is empty. Another observation is that in the ascending chain
\[\ker(T^1)\subset\ker(T^2)\subset\cdots\subset\ker(T^k)=\mathbb{C}^n,\]
every "\(\subset\)" is strict.)
Let \(\beta_i=\{v_{i,1},\cdots,v_{i,n_i}\}\ (i=1,\cdots,k)\). Then
\[\beta=\bigcup_{i=1}^{k}\bigcup_{j=1}^{n_i}\{T^{i-1}v_{i,j},\cdots,Tv_{i,j},v_{i,j}\}\]
is an ordered basis for \(\mathbb{C}^n\) such that
\[[T]_{\beta}=\bigoplus_{i=1}^{k}\underbrace{J_i\oplus\cdots\oplus J_i}_{n_i\ \text{copies}},\]
where
\[J_i=\begin{pmatrix}0&1&&\\&0&\ddots&\\&&\ddots&1\\&&&0\end{pmatrix}\in M_{i\times i}(\mathbb{C})\]
for each \(i\). Therefore, \(\beta\) is a Jordan canonical basis for \(T\). This completes the proof. \(\blacksquare\)
Remark 1 (Computational Aspect and Further Observations). In general, if \(T\) is a linear operator on a nonzero finite-dimensional complex vector space \(V\), then for any eigenvalue \(\lambda\) of \(T\) and any \(j\ge 1\), the number of Jordan blocks of order \(\ge j\) corresponding to \(\lambda\) equals
\[\dim\ker(T-\lambda I)^{j}-\dim\ker(T-\lambda I)^{j-1}.\]
The Jordan canonical form of \(T\) is then uniquely determined by these data up to a permutation of the Jordan blocks. Using the procedure in the proof above, we can construct a Jordan canonical basis \(\beta_{\lambda}\) for the restriction of \(T-\lambda I\) to the generalized eigenspace \(K_{\lambda}:=\bigcup_{i\ge 1}\ker(T-\lambda I)^i=\ker(T-\lambda I)^{m_{\lambda}}\), where \(m_{\lambda}\) is the algebraic multiplicity of \(\lambda\), i.e., the characteristic polynomial of \(T\) is
\[p(t)=\prod_{\lambda\in\text{Spec}(T)}(t-\lambda)^{m_{\lambda}}.\]
Recalling that \(V=\bigoplus_{\lambda} K_{\lambda}\), we conclude that \(\beta=\bigcup_{\lambda}\beta_{\lambda}\) is a Jordan canonical basis for \(T\). Moreover, the minimal polynomial of \(T\) is
\[m(t)=\prod_{\lambda\in\text{Spec}(T)}(t-\lambda)^{k_{\lambda}},\]
where \(k_{\lambda}\) is the nilpotency index of \((T-\lambda I)|_{K_{\lambda}}\), or equivalently, the largest order of the Jordan blocks \(J_{\bullet}(\lambda)\) appearing in the Jordan canonical form of \(T\). This observation allows us to see that, over the complex number field, an operator is diagonalizable iff its minimal polynomial has no repeated roots, and its minimal polynomial coincides with its characteristic polynomial iff its generalized eigenspaces are all cyclic subspaces.
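The block-counting formula is easy to test in exact arithmetic with SymPy. In this sketch the matrix is built from a known Jordan structure, \(J_3(2)\oplus J_1(2)\), and conjugated by an invertible matrix of my own choosing:

```python
import sympy as sp

# The number of Jordan blocks of order >= j for eigenvalue lambda equals
# dim ker(T - lambda I)^j - dim ker(T - lambda I)^(j-1).
J = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 1, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 2]])          # J_3(2) + J_1(2): two blocks
P = sp.Matrix([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1],
               [0, 0, 0, 1]])          # unitriangular, hence invertible
A = P * J * P.inv()                    # similar to J, exact arithmetic

n = 4
N = A - 2 * sp.eye(n)

def dim_ker(j):
    # exact rank over the rationals; N**0 is the identity
    return n - (N ** j).rank()

blocks_ge = [dim_ker(j) - dim_ker(j - 1) for j in (1, 2, 3)]
assert blocks_ge == [2, 1, 1]          # 2 blocks; 1 of order >= 2; 1 >= 3
```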
Remark 2 (Jordan Canonical Form over \(\mathbb{R}\)). Let \(V\) be a nonzero finite-dimensional real vector space and \(T\) a linear operator on \(V\). If \(\lambda\) is a nonreal eigenvalue of \(T_{\mathbb{C}}\), then so is \(\overline{\lambda}\), and \(v\mapsto \overline{v}\) induces a one-to-one correspondence between the Jordan blocks of \(T_{\mathbb{C}}\) corresponding to \(\lambda\) and \(\overline{\lambda}\). Suppose that the Jordan canonical form of \(T_{\mathbb{C}}\) is
\[\bigoplus_{i=1}^{m}J_{d_i}(t_i)\ \oplus\ \bigoplus_{j=1}^{n}\left(J_{c_j}(\lambda_j)\oplus J_{c_j}(\overline{\lambda}_j)\right),\]
where \(\{t_1,\cdots,t_m\}=\text{Spec}(T_{\mathbb{C}})\cap \mathbb{R},\ \{\lambda_1,\overline{\lambda}_1;\cdots;\lambda_n,\overline{\lambda}_n\}=\text{Spec}(T_{\mathbb{C}})\setminus \mathbb{R}\), and \(\beta=\{u_1,\cdots,u_m;v_1,\overline{v}_1;\cdots;v_n,\overline{v}_{n}\}\) is the corresponding Jordan canonical basis. Replacing each entry \(z=x+iy\ (x,y\in \mathbb{R})\) in \(J_{c_j}(\lambda_j)\) by the matrix \(\small\begin{pmatrix} x & y \\ -y & x \end{pmatrix}\), we obtain a real matrix \(J_{c_j}(\lambda_j)_{\mathbb{R}}\). Define \(\gamma:=\{u_1,\cdots,u_m;\Re v_1,\Im v_1;\cdots;\Re v_n,\Im v_n\}\). Then \(\gamma\) is a basis of \(V\), and
\[[T]_{\gamma}=\bigoplus_{i=1}^{m}J_{d_i}(t_i)\ \oplus\ \bigoplus_{j=1}^{n}J_{c_j}(\lambda_j)_{\mathbb{R}}.\]
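The basic mechanism behind the real form, namely that the \(2\times 2\) real block substituted for \(z=x+iy\) carries the spectrum \(\{z,\overline{z}\}\), can be checked directly (values \(x=3,y=2\) are an illustrative assumption):

```python
import numpy as np

# The real block [[x, y], [-y, x]] replacing z = x + iy has
# eigenvalues x + iy and x - iy.
x, y = 3.0, 2.0
R = np.array([[x, y], [-y, x]])
vals = np.sort_complex(np.linalg.eigvals(R))
assert np.allclose(vals, np.sort_complex(np.array([x - 1j * y, x + 1j * y])))
```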
Simultaneous Triangularization
Basic Observations
Problem 1. Let \(A,B\in M_{n\times n}(\mathbb{C})\). Show that if \(\text{rank}(AB-BA)\le 1\), then \(A,B\) are simultaneously triangularizable.
Problem 2. Let \(A,B\) and \(C\) be matrices in \(M_n(\mathbb{C})\) such that \(C=AB-BA, AC=CA,BC=CB\).
(1) Show that the eigenvalues of \(C\) are all zero.
(2) Let \(m_A(\lambda)\) and \(m_B(\lambda)\) be the minimal polynomials of \(A\) and \(B\), respectively, and \(k:=\min \{\deg m_A(\lambda), \deg m_B(\lambda), n-1\}\). Show that \(C^k=0\).
(3) Show that if \(n=2\), then \(C=O\).
(4) Show that there exists a common eigenvector of \(A,B\) and \(C\).
(5) Show that \(A,B\) and \(C\) are simultaneously triangularizable.
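A concrete instance of Problem 2 (an illustration, not a proof) is the Heisenberg triple \(A=E_{12},B=E_{23},C=[A,B]=E_{13}\) of strictly upper triangular matrices, which can be checked numerically:

```python
import numpy as np

# Heisenberg example: C = AB - BA commutes with both A and B,
# its eigenvalues are all zero, and C^2 = O (consistent with parts
# (1) and (2), where here k = min(2, 2, n-1) = 2 for n = 3).
A = np.zeros((3, 3)); A[0, 1] = 1          # E_12
B = np.zeros((3, 3)); B[1, 2] = 1          # E_23
C = A @ B - B @ A                          # equals E_13

assert np.allclose(A @ C, C @ A) and np.allclose(B @ C, C @ B)
assert np.allclose(np.linalg.eigvals(C), 0)     # part (1): Spec(C) = {0}
assert np.allclose(C @ C, np.zeros((3, 3)))     # C^2 = O
# All three matrices are strictly upper triangular, so they are already
# simultaneously triangularized, as part (5) predicts.
```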
TODO:
General Theory
THEOREM (Drazin-Dungey-Gruenberg, 1951). Matrices \(A_1,\cdots,A_m\) are simultaneously triangularizable iff the matrix \(p(A_1,\cdots,A_m)[A_i,A_j]\) is nilpotent for every polynomial \(p(x_1,\cdots,x_m)\) in noncommuting indeterminates and all \(i,j\).
Applications
Digressions
Theorem (Smiley, 1961). Suppose the matrices \(A\) and \(B\) are such that for a certain integer \(s>0\) the identity \(\text{ad}_A^sX=0\) implies \(\text{ad}_X^sB=0\). Then \(B\) can be expressed as a polynomial in \(A\).
Theorem (Fregus, 1966). If \(\text{tr}(A)=0\), then there exist matrices \(X\) and \(Y\) such that \(X\) is Hermitian, \(\text{tr}(Y)=0\), and \(A=[X,Y]\).
Theorem (Gibson, 1975). Let \(A\neq \lambda I\). Then \(A\) is similar to a matrix with the diagonal \((0,\cdots,0,\text{tr}(A))\).
Theorem (Gibson, 1975). Let \(\text{tr}(A)=0\) and \(\lambda_1,\cdots,\lambda_n,\mu_1,\cdots,\mu_n\) be given complex numbers such that \(\lambda_i\neq \lambda_j\) for \(i\neq j\). Then there exist complex matrices \(X\) and \(Y\) with eigenvalues \(\lambda_1,\cdots,\lambda_n\) and \(\mu_1,\cdots,\mu_n\), respectively, such that \(A=[X,Y]\).
Triangularization over a PID
TODO: