Paper Reading: Robust Continuous Clustering (RCC)

Paper reading notes on robust continuous clustering.
Here I summarize the RCC algorithm in more detail, in my own words.

\begin{algorithm}[H]
\caption{Construct Mutual-kNN for Large Dataset Algorithm}
\begin{algorithmic}[1]
\Require
X (array) 2d array of data of shape (nsamples, ndim)

      k (int) number of neighbors for each sample in X

      measure (string) distance metric, one of 'cosine' or 'euclidean'
    \Ensure
    A weighted set of mutual nearest-neighbour edges for every sample in the dataset

    \State Compute, for each sample $x_{i}$, the distances $Dist(x_{i},x_{j})$ to all other samples $\{x_{j}: j\neq i\}$
    \State If $j$ is one of $i$'s closest neighbours and $i$ is also one of $j$'s closest neighbours,
    the edge appears exactly once as $(i,j)$ with $i<j$
    \State Sort the distances from small to large (measure = euclidean) or from large to small (measure = cosine)
    \State Keep only the top 1\% of the candidate pairs in this ordering
    \State Build the mutual-kNN graph from the remaining pairs

  \end{algorithmic}
\end{algorithm}
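As a concrete illustration, here is a minimal Python/NumPy sketch of the mutual-kNN construction (the function name and details are my own, not the reference implementation; for truly large datasets the full distance matrix below would be replaced by an approximate nearest-neighbour search):

\begin{verbatim}
import numpy as np
from scipy.spatial.distance import cdist

def mutual_knn_edges(X, k=10, measure='euclidean'):
    """Return mutual-kNN edges (i, j) with i < j, plus their distances."""
    # Pairwise distances; for measure='cosine', cdist returns 1 - cosine
    # similarity, so ascending order still puts the most similar pairs first.
    D = cdist(X, X, metric=measure)
    np.fill_diagonal(D, np.inf)           # exclude self-neighbours
    knn = np.argsort(D, axis=1)[:, :k]    # k nearest neighbours of each sample

    edges, dists = [], []
    for i in range(X.shape[0]):
        for j in knn[i]:
            # keep the edge only if the relation is mutual; store it once (i < j)
            if i < j and i in knn[j]:
                edges.append((i, j))
                dists.append(D[i, j])
    return np.array(edges), np.array(dists)
\end{verbatim}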

\begin{algorithm}[H]
\caption{Robust continuous clustering Algorithm}
\begin{algorithmic}[1]
\Require
X (array) 2d array of data of shape (nsamples, ndim).

      w (array) weights for each edge, as computed by the mutual knn clustering.

      max-iter (int) maximum number of iterations to run the algorithm.

      inner-iter (int) number of inner iterations. 4 works well in most cases.

    \Ensure
    U (array) cluster representatives of shape (nsamples, nfeatures), and a cluster label for every sample in X

    \State Precompute the $L_{2}$ norms of the samples $x_{i}$
    \State set the weights by $$w_{p,q}=\frac{\sum_{i=1}^{n}N_{i}}{n\sqrt{N_{p}N_{q}}},$$ where $N_{i}$ is the number of edges incident to $x_{i}$ in graph E.
    \State initialize the representatives $U$ to the same values as $X$
    \State initialize $l_{p,q}$ to 1, that is, all connections are active ($l_{p,q}$ is a penalty term on connection $(p,q)$)
    \State compute $\delta$ and $\mu$, with $\mu=3r^{2}$ where $r$ is the maximal edge length in $E$
    \State Take the top 1\% of the closest neighbours as a heuristic
    \State Compute the matrix $A = D-R$ (here $D$ is the diagonal degree matrix and $R$ is the symmetric weight matrix):
        $$A=\sum_{(p,q)\in \epsilon}^{ }w_{p,q}l_{p,q}(e_{p}-e_{q})(e_{p}-e_{q})^{T}$$
    \State Calculate $\lambda=\frac{||X||_{2}}{||A||_{2}}$
    \State Start the optimization of the objective.
    \algstore{myalg}


  \end{algorithmic}
\end{algorithm}
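A small sketch of the initialization steps above (the edge weights $w_{p,q}$, the matrix $A$, and $\lambda$). The helper name \texttt{rcc\_init} is hypothetical, and the Frobenius norm is used for $||X||_{2}$ and $||A||_{2}$ as a simplification:

\begin{verbatim}
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def rcc_init(X, edges):
    """Edge weights w_pq, the matrix A = D - R, and the balance parameter lambda."""
    n = X.shape[0]
    i, j = edges[:, 0], edges[:, 1]

    # N_p = number of edges incident to sample p
    N = np.bincount(np.concatenate([i, j]), minlength=n)
    w = N.sum() / (n * np.sqrt(N[i] * N[j]))      # w_pq = (sum_i N_i)/(n*sqrt(N_p N_q))

    # A = sum_pq w_pq (e_p - e_q)(e_p - e_q)^T with all l_pq = 1 initially
    R = sp.coo_matrix((w, (i, j)), shape=(n, n))
    R = R + R.T                                    # symmetric off-diagonal part
    A = sp.diags(np.asarray(R.sum(axis=1)).ravel()) - R

    lam = np.linalg.norm(X) / spla.norm(A)         # lambda = ||X|| / ||A||
    return w, A, lam
\end{verbatim}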

\begin{algorithm}
\begin{algorithmic} [1] % enter the algorithmic environment
\algrestore{myalg}
\For{$i = 1$ to \textit{max-iter}}
\State Update \(l_{p,q}=\left(\frac{\mu}{\mu+||u_{p}-u_{q}||_{2}^{2}}\right)^{2}\)
\State Minimize the objective over \(U\): \(\arg\min_{U} \frac{1}{2}||X-U||_{F}^{2}+\frac{\lambda}{2}\sum_{(p,q)\in \epsilon}^{ }w_{p,q}l_{p,q}||U(e_{p}-e_{q})||_{2}^{2}\)
\State Update \(U\) by forming the linear system \(UM=X\), where \(M=I+\lambda \sum_{(p,q)\in \epsilon}^{ }w_{p,q}l_{p,q}(e_{p}-e_{q})(e_{p}-e_{q})^{T}\)
\State Solve the system for \(U\).
\State Check the termination conditions and modulate \(\mu\) if necessary:
\State the value of \(\mu\) is halved every 4 iterations until it drops below \(\frac{\delta}{2}\).
\EndFor
\State At the end of the run, assign the results to the class members.
\State Assign every sample in the dataset a cluster label, obtained from the connected components of the graph over the final representatives \(U\).
\end{algorithmic}
\end{algorithm}
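Putting the loop together, here is a minimal sketch of the alternating updates (again with hypothetical helper names; rows of U are samples here, so the linear system reads (I + lambda*A) U = X rather than UM = X in the column convention used above). The threshold used for the final labels is an assumption on my part; the reference code may use a different cutoff:

\begin{verbatim}
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu
from scipy.sparse.csgraph import connected_components

def rcc_optimize(X, edges, w, lam, mu, delta, max_iter=100):
    """Alternate between the line variables l_pq and the representatives U."""
    n = X.shape[0]
    i, j = edges[:, 0], edges[:, 1]
    U = X.copy()                          # representatives start at the data points
    I = sp.identity(n, format='csr')

    for it in range(max_iter):
        # closed-form update of the line process (Geman-McClure estimator)
        dist2 = np.sum((U[i] - U[j]) ** 2, axis=1)
        lpq = (mu / (mu + dist2)) ** 2

        # rebuild A with the current l_pq and solve (I + lam*A) U = X
        R = sp.coo_matrix((w * lpq, (i, j)), shape=(n, n))
        R = R + R.T
        A = sp.diags(np.asarray(R.sum(axis=1)).ravel()) - R
        U = splu((I + lam * A).tocsc()).solve(X)

        # graduated nonconvexity: halve mu every 4 iterations, bounded by delta/2
        if (it + 1) % 4 == 0:
            mu = max(mu / 2.0, delta / 2.0)
    return U

def rcc_labels(U, edges, delta):
    """Clusters = connected components over edges whose representatives stay close."""
    i, j = edges[:, 0], edges[:, 1]
    keep = np.sum((U[i] - U[j]) ** 2, axis=1) < delta   # cutoff value is an assumption
    G = sp.coo_matrix((np.ones(keep.sum()), (i[keep], j[keep])),
                      shape=(U.shape[0], U.shape[0]))
    _, labels = connected_components(G, directed=False)
    return labels
\end{verbatim}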

Notes:

\begin{itemize}
\item Laplacian matrix
\subitem In the mathematical field of graph theory, the Laplacian matrix, sometimes called admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph. The Laplacian matrix can be used to find many useful properties of a graph. Together with Kirchhoff's theorem, it can be used to calculate the number of spanning trees for a given graph. The sparsest cut of a graph can be approximated through the second smallest eigenvalue of its Laplacian by Cheeger's inequality. It can also be used to construct low dimensional embeddings, which can be useful for a variety of machine learning applications.
\subitem Given a simple graph G with n vertices, its Laplacian matrix \(L_{n\times n}\) is defined as:
$$L=D-A,$$ where D is the degree matrix and A is the adjacency matrix of the graph.
$$
L_{i,j}:=\left\{
\begin{array}{ll}
\deg(v_{i}), & \mbox{if } i=j \\
-1, & \mbox{if } i\neq j \mbox{ and } v_{i} \mbox{ is adjacent to } v_{j} \\
0, & \mbox{otherwise}
\end{array}
\right.
$$

For an (undirected) graph G and its Laplacian matrix L with eigenvalues $\lambda_{0}\leq \lambda_{1}\leq \cdots \leq \lambda_{n-1}$ (see also the NumPy sketch after this list):

\subitem L is symmetric.
\subitem L is positive semidefinite (that is, $\lambda_{i}\geq 0$ for all $i$). This can be verified via the incidence matrix, and also follows from the fact that the Laplacian is symmetric and diagonally dominant.


\begin{figure}[H]
\centering
\caption{Here is a simple example of a labeled, undirected graph and its Laplacian matrix.}
\includegraphics[width=10cm]{LM.png}
\end{figure}

\item Absolute-value norm $$\parallel x \parallel = |x|$$
    \subitem The absolute-value norm is a special case of the $L_{1}$ norm. The absolute value is a norm on the one-dimensional vector spaces formed by the real or complex numbers.
\item Euclidean norm

\subitem The intuitive notion of the length of a vector $x = (x_{1}, x_{2}, \ldots, x_{n})$ is captured by the following formula, called the $L_{2}$ norm:
    $${\displaystyle \left\|{\boldsymbol {x}}\right\|_{2}:={\sqrt {x_{1}^{2}+\cdots +x_{n}^{2}}}.} $$
\item Manhattan norm

$${\displaystyle \left\|{\boldsymbol {x}}\right\|_{1}:=\sum _{i=1}^{n}\left|x_{i}\right|.}$$
  \subitem Manhattan norm is also called the L1 norm. The name relates to the distance a taxi has to drive in a rectangular street grid to get from the origin to the point x.
\item p-norm

For a vector ${\displaystyle \mathbf {x} =(x_{1},\ldots ,x_{n})}$, the p-norm is

$${\displaystyle \left\|\mathbf {x} \right\|_{p}:={\bigg (}\sum _{i=1}^{n}\left|x_{i}\right|^{p}{\bigg )}^{1/p}.} $$

\item Maximum norm

$${\displaystyle \left\|\mathbf {x} \right\|_{\infty }:=\max \left(\left|x_{1}\right|,\ldots ,\left|x_{n}\right|\right).} $$

\end{itemize}
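A short NumPy sketch of the notes above: building the Laplacian $L = D - A$ for a small assumed example graph and evaluating the vector norms:

\begin{verbatim}
import numpy as np

# Adjacency matrix of a small undirected example graph (4 vertices, assumed)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

D = np.diag(A.sum(axis=1))        # degree matrix
L = D - A                         # graph Laplacian

print(np.linalg.eigvalsh(L))      # eigenvalues are all >= 0 (positive semidefinite)

# Vector norms on one example vector
x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, 1))       # Manhattan / L1 norm -> 7.0
print(np.linalg.norm(x, 2))       # Euclidean / L2 norm -> 5.0
print(np.linalg.norm(x, np.inf))  # maximum norm        -> 4.0
print(np.linalg.norm(x, 3))       # general p-norm with p = 3
\end{verbatim}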

\subsection{Motivation}
Clustering is a basic experimental procedure in data analysis.
It is used in almost all natural and social sciences and plays a central role in biology, astronomy, psychology, medicine, and chemistry.
Despite the importance and ubiquity of clustering, existing algorithms suffer from various shortcomings and no universal solution has emerged.
In particular, no single algorithm has displaced the k-means scheme and its variants,
despite the known drawbacks of such center-based methods, including sensitivity to initialization, limited effectiveness in high-dimensional spaces,
and the requirement that the number of clusters be set in advance.
The endurance of these methods is in part due to their simplicity and in part due to difficulties associated with some of the new techniques,
such as additional hyperparameters that need to be tuned, high computational cost, and varying effectiveness across domains.
Consequently, scientists who analyze large high-dimensional datasets with unknown distribution must maintain and apply multiple different clustering algorithms
in the hope that one will succeed.
\subsection{Contribution}

We propose a clustering algorithm that is fast, easy to use, and effective in high-dimensional spaces.
The algorithm uses standard numerical methods to optimize an explicit continuous objective and scales to massive datasets.
The number of clusters does not need to be known in advance.
A key feature of the proposed formulation is that it reduces clustering to the optimization of a continuous objective.
This allows clustering to be integrated into end-to-end feature learning pipelines.
We demonstrate this by extending RCC to perform joint clustering and dimensionality reduction.
The extended algorithm, called RCC-DR, learns an embedding of the data into a low-dimensional space in which it is clustered.
Embedding and clustering are performed jointly, by an algorithm that optimizes an explicit global objective.
We evaluate RCC and RCC-DR on a large set of data from various fields.
These include image datasets, document datasets, space-shuttle sensor readings,
and mouse protein expression levels.
Experiments show that our method is significantly better than the previous state of the art.
RCC-DR is particularly robust across datasets from different domains,
with an average rank three times better than that of the best prior algorithm.

\subsection{Summary}
First, the authors point out that clustering is a basic experimental procedure in data analysis.
It is used in almost all natural and social sciences and plays a central role in biology, astronomy, psychology, medicine, and chemistry.
But such center-based methods have known drawbacks, including sensitivity to initialization, limited effectiveness in high-dimensional spaces,
and the requirement that the number of clusters be set in advance.
The endurance of these methods is in part due to their simplicity and in part due to difficulties associated with some of the new techniques,
such as additional hyperparameters that need to be tuned, high computational cost, and varying effectiveness across domains.

Second, the authors present a clustering model that is an effective and fast cross-domain solution to the problem, and the algorithm also scales to massive datasets.
The number of clusters does not need to be known in advance.
The formulation reduces clustering to the optimization of a continuous objective, which can be handled by gradient-based methods.
This continuous, vectorized formulation is very convenient for machine learning. RCC-DR inherits the appealing properties of RCC: clustering
and dimensionality reduction are performed jointly by optimizing a clear continuous objective, the framework supports nonconvex robust estimators that can untangle mixed clusters, and
optimization is performed by efficient and scalable numerical methods.

Third, the authors run experiments on multiple datasets to show that the RCC and RCC-DR algorithms are superior to existing methods.
\subsection{My Idea}

Robust linear optimization finds an optimal solution in the presence of data perturbations. It differs from a general linear optimization problem in that its data are not fixed deterministic values but values that may vary within a given interval; if we can still find a solution that remains optimal for all such perturbed data, we say that the solution is robust.

Robust continuous clustering formulation is based on recent convex relaxations for clustering. However, the objective is deliberately not convex. We use redescending robust estimators that allow even heavily mixed clusters to be untangled by optimizing a single continuous objective. Despite the nonconvexity of the objective, the optimization can still be performed using standard linear least-squares solvers, which are highly efficient and scalable. Since the algorithm expresses clustering as optimization of a continuous objective based on robust estimation, we call it robust continuous clustering (RCC).
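For concreteness, the full objective optimized by RCC, as I read the paper, combines a data term with robustly penalized pairwise terms (the dual form of the Geman-McClure estimator):
$$
C(U,\{l_{p,q}\})=\frac{1}{2}\sum_{i=1}^{n}\|x_{i}-u_{i}\|_{2}^{2}
+\frac{\lambda}{2}\sum_{(p,q)\in \epsilon}w_{p,q}\Big(l_{p,q}\,\|u_{p}-u_{q}\|_{2}^{2}+\Psi(l_{p,q})\Big),
\qquad \Psi(l)=\mu\big(\sqrt{l}-1\big)^{2}.
$$
Minimizing over $l_{p,q}$ with $U$ fixed gives the closed-form update $l_{p,q}=\left(\frac{\mu}{\mu+\|u_{p}-u_{q}\|_{2}^{2}}\right)^{2}$ used in the algorithm above, and minimizing over $U$ with the $l_{p,q}$ fixed gives the linear system $UM=X$.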
