Hw1_for_compressed_sensing
First homework for compressed sensing
So far I've only read the first part of the problem set, which asks me to verify some properties of the \(\ell^p\) norm.
Update
1.1 For a function \(\| \cdot \|\) to be a norm, it needs to satisfy the triangle inequality: \(\forall x, y \in \mathbb{R}^n, \|x + y\| \leq \|x \| + \| y \|\).
For \(0<p<1\), take \(x_1 = 2\), \(x_i = 0\) for all other \(i\), and \(y_2 = 2\), \(y_i = 0\) for all other \(i\). Then
\(\|x+y\|_p = \left(2^p + 2^p\right)^{1/p} = 2^{1+1/p} > 4 = \|x\|_p + \|y\|_p,\)
which violates the triangle inequality, so \(\|\cdot\|_p\) is not a norm for \(p<1\).
Reference for question1: https://statisticaloddsandends.wordpress.com/2020/05/27/lp-norm-is-not-a-norm-when-p-1/
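A quick numerical sanity check of the counterexample (just an illustration in NumPy, with the vectors above written in two dimensions):

```python
# Check that the triangle inequality fails for 0 < p < 1
# with x = (2, 0) and y = (0, 2), as claimed above.
import numpy as np

def lp(v, p):
    """(sum_i |v_i|^p)^(1/p); only a norm for p >= 1."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

x = np.array([2.0, 0.0])
y = np.array([0.0, 2.0])
for p in (0.2, 0.5, 0.9):
    lhs, rhs = lp(x + y, p), lp(x, p) + lp(y, p)
    print(f"p={p}: ||x+y||_p = {lhs:.3f} > {rhs:.3f} = ||x||_p + ||y||_p")
```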
1.2 For \(1\le p<\infty\), we have
So
Similarly, we have
and
Take \(\alpha\in[0,1]\) and \(p\ge1\); then \(\alpha^p\le\alpha\) and \((1-\alpha)^p\le1-\alpha\) (strictly for \(\alpha\in(0,1)\) and \(p>1\)).
Hence we get \(\textit{inequality-1}\):
For \(p=\infty\),
which verifies \(\textit{inequality-2}\):
Combining the two inequalities above verifies the convexity of the norm.
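For reference, the whole argument can also be compressed into one line using only the triangle inequality and absolute homogeneity of the norm (this is the standard textbook argument, not necessarily the exact chain of inequalities written above):
\(\begin{aligned}
\|\alpha x+(1-\alpha) y\|_{p} \leq\|\alpha x\|_{p}+\|(1-\alpha) y\|_{p}=\alpha\|x\|_{p}+(1-\alpha)\|y\|_{p}, \quad \alpha \in[0,1],
\end{aligned}\)
which is exactly the definition of convexity and works verbatim for \(p=\infty\) as well.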
1.3 Since \(\|\cdot\|_p\) is not a norm when \(p<1\), we only consider the case \(p\ge1\). Reference for question 3: https://kamindo.files.wordpress.com/2009/08/1-paper_04-12-2013.pdf
Intuitively, if we assume \(\|x\|_p<\|x\|_q\) for \(1\le p<q\), then by the definition of the norm we have
Taking the logarithm of both sides,
where \(C\) is a constant. Since \(x\) is an arbitrary vector, the assumed inequality cannot hold.
The following is a rigorous proof: Let \(1\le p\le q<\infty.\) For every \(x\in\ell^p\), we have
\(\begin{aligned}
\|x\|_{q}^{q} & =\sum_{k}\left|x_{k}\right|^{q} \\
& =\sum_{k}\left|x_{k}\right|^{q-p}\left|x_{k}\right|^{p} \\
& \leq \sup _{k}\left|x_{k}\right|^{q-p} \sum_{k}\left|x_{k}\right|^{p} \\
& \leq\left[\sum_{k}\left|x_{k}\right|^{p}\right]^{\frac{q-p}{p}} \sum_{k}\left|x_{k}\right|^{p} \\
& =\left[\sum_{k}\left|x_{k}\right|^{p}\right]^{\frac{q}{p}} .
\end{aligned}\)
Taking the \(q\)-th root of both sides, we get \(\|x\|_{q} \leq\|x\|_{p}.\) For \(q=\infty\), the inequality is immediate from \(\|x\|_{\infty}=\sup_{k}\left|x_{k}\right|\le\left[\sum_{k}\left|x_{k}\right|^{p}\right]^{\frac{1}{p}}=\|x\|_{p}.\)
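A quick numerical check of this inequality on random vectors and random exponents (illustration only):

```python
# Check ||x||_q <= ||x||_p for random vectors x and random 1 <= p <= q.
import numpy as np

def lp(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(10)
    p, q = sorted(rng.uniform(1.0, 10.0, size=2))
    assert lp(x, q) <= lp(x, p) + 1e-12
print("||x||_q <= ||x||_p held in every trial")
```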
P2:
\(\operatorname{spark}(\mathbf{A})=\min _{\mathbf{d} \neq \mathbf{0}, \mathbf{A} \mathbf{d}=\mathbf{0}}\|\mathbf{d}\|_{0}\)
Since \(\operatorname{krank}(\mathbf{A})\) is the maximum value of \(k\) such that every \(k\) columns of \(\mathbf{A}\) are linearly independent, it can be written as
\(\operatorname{krank}(\mathbf{A})=\max\left\{k : \mathbf{A}\mathbf{d}\neq\mathbf{0}\ \text{for every}\ \mathbf{d}\ \text{with}\ 0<\|\mathbf{d}\|_{0}\le k\right\}.\)
Comparing the two formulas, we find that \(\operatorname{spark}(\mathbf{A})=\operatorname{krank}(\mathbf{A})+1\).
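A brute-force check of this relation on a toy matrix (column-subset enumeration is only feasible for tiny matrices, and the example matrix here is made up):

```python
# Compute spark(A) and krank(A) by enumerating column subsets,
# then check spark(A) = krank(A) + 1.
import itertools
import numpy as np

def spark(A):
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols]) < k:
                return k          # smallest linearly dependent column subset
    return n + 1                  # all columns independent (one common convention)

def krank(A):
    n = A.shape[1]
    k = 0
    for size in range(1, n + 1):
        if all(np.linalg.matrix_rank(A[:, cols]) == size
               for cols in itertools.combinations(range(n), size)):
            k = size
        else:
            break
    return k

A = np.array([[1., 0., 1., 2.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.]])
print(spark(A), krank(A) + 1)     # both print 3
```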
P3:
P4:
- \(f(x)=\|x\|_{1}=\left|x_{1}\right|+\cdots+\left|x_{n}\right|\)
Now consider the subgradient of the \(\textit{absolute value}\) function. For \(x_i<0\), the subgradient is unique: \(\partial|x_i|=\{-1\}.\) Similarly, for \(x_i>0\) we have \(\partial|x_i|=\{1\}.\) At \(x_i=0,\) the subdifferential is defined by the inequality \(|x|\ge g\,x\) for all \(x\), which holds exactly when \(g \in[-1,1].\) Collecting the coordinates gives the overall subdifferential: \(\partial\|x\|_{1}=\left\{g : g_{i}=\operatorname{sign}(x_{i})\ \text{if}\ x_{i}\neq 0,\ g_{i}\in[-1,1]\ \text{if}\ x_{i}=0\right\}.\)
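A small numerical check of the subgradient inequality for this choice of \(g\) (random test points; illustration only):

```python
# Verify that g with g_i = sign(x_i) (and g_i = 0, one valid choice in
# [-1, 1], where x_i = 0) satisfies ||y||_1 >= ||x||_1 + g^T (y - x) for all y.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-3.0, 0.0, 2.5])
g = np.sign(x)
for _ in range(1000):
    y = 5.0 * rng.standard_normal(x.size)
    assert np.abs(y).sum() >= np.abs(x).sum() + g @ (y - x) - 1e-12
print("subgradient inequality held for every random y")
```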
P5: I'm still stuck on the iteration steps.
Reference: http://faculty.bicmr.pku.edu.cn/~wenzw/optbook/pages/lasso_subgrad/l1_subgrad.html
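Here is a minimal sketch of what the iteration might look like, assuming the problem in P5 is the \(\ell_1\)-regularized least squares problem \(\min_x \tfrac12\|Ax-b\|_2^2+\mu\|x\|_1\) treated in the reference above (the data \(A\), \(b\), \(\mu\) and the step-size rule below are made up for illustration):

```python
# Subgradient method for 0.5*||Ax - b||_2^2 + mu*||x||_1 (sketch).
import numpy as np

def subgradient_method(A, b, mu, n_iters=5000):
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    for k in range(1, n_iters + 1):
        # A^T(Ax - b) is the gradient of the smooth term; sign(x) picks one
        # subgradient of ||x||_1 (0 at zero entries is a valid choice).
        g = A.T @ (A @ x - b) + mu * np.sign(x)
        x = x - g / (L * np.sqrt(k))         # diminishing step size
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
b = A @ x_true
x_hat = subgradient_method(A, b, mu=0.1)
print(np.linalg.norm(x_hat - x_true))
```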
Haha 😂 I've learned how to write \(\ell\) in LaTeX.