Intro to DBSCAN
DBSCAN
- Density-Based Spatial Clustering of Applications with Noise
- It can discover clusters of arbitrary shape.
- A cluster is defined as a maximal set of density-connected points.
- Two parameters
  - Eps: maximum radius of the neighbourhood
  - MinPts: minimum number of points in the Eps-neighbourhood of a point
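- Suppose we have a point q and the pre-determined parameters Eps and MinPts. The Eps-neighbourhood of q is $N_{Eps}(q) = \{p \in D \mid dist(p, q) \le Eps\}$, where D is the data set. If the number of points in this neighbourhood, $|N_{Eps}(q)|$, is at least MinPts, we say q is a core point.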
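The core-point test can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `eps_neighbourhood` and `is_core` and the sample points are made up for the example, Euclidean distance is assumed, and the neighbourhood counts q itself with an "at least MinPts" threshold.

```python
from math import dist  # Euclidean distance, Python 3.8+

def eps_neighbourhood(points, q, eps):
    """Indices of all points within distance eps of points[q] (q itself included)."""
    return [i for i, p in enumerate(points) if dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """points[q] is a core point if its Eps-neighbourhood holds at least min_pts points."""
    return len(eps_neighbourhood(points, q, eps)) >= min_pts

# Hypothetical toy data: three points close together and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
print(eps_neighbourhood(points, 0, eps=1.0))   # [0, 1, 2]
print(is_core(points, 0, eps=1.0, min_pts=3))  # True
print(is_core(points, 3, eps=1.0, min_pts=3))  # False
```

The linear scan costs O(n) per query; practical implementations usually replace it with a spatial index.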
- Three types of points
  - Core point: its neighbourhood is dense (at least MinPts points within Eps).
  - Border point: its neighbourhood is not dense (fewer than MinPts points), but it belongs to a cluster because it is directly density-reachable from a core point.
  - Noise/Outlier: not in any cluster, i.e. not density-reachable from any core point.
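To make the three point types concrete, the sketch below labels every point of a small, made-up data set. The function name `classify_points` and the sample coordinates are illustrative assumptions; Euclidean distance is assumed, and each neighbourhood includes the point itself.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    n = len(points)
    # Eps-neighbourhood of every point (brute force, includes the point itself).
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours[i]):
            labels.append("border")   # not dense, but within Eps of a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical data: a unit square, one point hanging off its side, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
print(classify_points(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2, 1) has only two points in its neighbourhood, so it is not core, but it lies within Eps of the core point (1, 1) and is therefore a border point.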
- Directly density-reachable: a point p is directly density-reachable from q if:
  - p belongs to the Eps-neighbourhood of q: $p \in N_{Eps}(q)$
  - q itself is a core point: $|N_{Eps}(q)| \ge MinPts$
- Density-reachable: a point p is density-reachable from a point q if there is a chain of points $p_1, \dots, p_n$ such that $p_1 = q$, $p_n = p$, and $p_{i+1}$ is directly density-reachable from $p_i$.
- Density-connected: a point p is density-connected to a point q if there is a point o such that both p and q are density-reachable from o. Even if p and q are both border points, they belong to the same cluster as long as there is a point o from which both are density-reachable.
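The three relations above can be checked directly on a small example. The following is a rough sketch under stated assumptions: the helper names (`region`, `directly_density_reachable`, `density_reachable_from`, `density_connected`) and the sample points are invented for illustration, distance is Euclidean, and neighbourhoods are found by brute force.

```python
from collections import deque
from math import dist

def region(points, i, eps):
    """Eps-neighbourhood of points[i] as a list of indices (i itself included)."""
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q: p is in q's neighbourhood and q is core."""
    nbrs = region(points, q, eps)
    return p in nbrs and len(nbrs) >= min_pts

def density_reachable_from(points, q, eps, min_pts):
    """Indices density-reachable from points[q]; empty if q is not a core point."""
    if len(region(points, q, eps)) < min_pts:
        return set()
    reached, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        nbrs = region(points, cur, eps)
        if len(nbrs) < min_pts:
            continue  # border point: reached, but a chain cannot continue through it
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point o has both p and q density-reachable from it."""
    return any({p, q} <= density_reachable_from(points, o, eps, min_pts)
               for o in range(len(points)))

# Hypothetical data: five points on a line (indices 0-4) and one far away (index 5).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
print(directly_density_reachable(points, 0, 1, eps=1.0, min_pts=3))  # True
print(directly_density_reachable(points, 1, 0, eps=1.0, min_pts=3))  # False
print(density_reachable_from(points, 1, eps=1.0, min_pts=3))         # {0, 1, 2, 3, 4}
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))           # True
print(density_connected(points, 0, 5, eps=1.0, min_pts=3))           # False
```

With these parameters the end points 0 and 4 are border points. Neither is density-reachable from the other, which shows that reachability is not symmetric, yet they are density-connected through the core points between them.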
Algorithm
- Arbitrarily select a point p.
- Retrieve all points density-reachable from p with respect to Eps and MinPts.
  - If p is a core point, a cluster is formed, and its border points are found along the way.
  - If p is not a core point, no points are density-reachable from p. DBSCAN provisionally marks p as noise (it may later be claimed as a border point of some cluster) and moves on to the next point.
- Continue the process until all the points have been processed.
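However, DBSCAN is sensitive to the choice of Eps and MinPts: an Eps that is too small labels most points as noise, while one that is too large merges distinct clusters.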
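The steps above can be sketched as a short, brute-force implementation. This is an illustrative sketch rather than a tuned implementation: the function name `dbscan`, the `NOISE` constant, and the sample points are assumptions made for the example, distance is Euclidean, and each neighbourhood query is a linear scan.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    n = len(points)
    labels = [None] * n            # None = not yet processed
    cluster = -1

    def region(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    for p in range(n):
        if labels[p] is not None:
            continue
        neighbours = region(p)
        if len(neighbours) < min_pts:
            labels[p] = NOISE      # provisional: may become a border point later
            continue
        cluster += 1               # p is a core point: start a new cluster
        labels[p] = cluster
        seeds = [j for j in neighbours if j != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster        # border point claimed by this cluster
            if labels[q] is not None:
                continue                   # already assigned, nothing to expand
            labels[q] = cluster
            q_neighbours = region(q)
            if len(q_neighbours) >= min_pts:   # q is core: keep expanding
                seeds.extend(q_neighbours)
    return labels

# Hypothetical data: two unit squares and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
          (10.0, 0.0)]
print(dbscan(points, eps=1.0, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(dbscan(points, eps=0.5, min_pts=3))  # [-1, -1, -1, -1, -1, -1, -1, -1, -1]
print(dbscan(points, eps=8.0, min_pts=3))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The last two calls illustrate the parameter sensitivity on this toy data: with Eps = 0.5 every point is noise, and with Eps = 8.0 both squares and the isolated point collapse into a single cluster.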