DBSCAN

This is the pseudocode for DBSCAN from the paper "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise".

I have also tried the pseudocode from the Wikipedia page, but I prefer this one.

ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
	seeds:=SetOfPoints.regionQuery(Point,Eps);
	IF seeds.size<MinPts THEN // no core point
		SetOfPoints.changeClId(Point,NOISE);
		RETURN False;
	ELSE	// all points in seeds are density-
			// reachable from Point
		SetOfPoints.changeClIds(seeds,ClId);
		seeds.delete(Point);
		WHILE seeds <> Empty DO
			currentP := seeds.first();
			result := SetOfPoints.regionQuery(currentP, Eps);
			IF result.size >= MinPts THEN
				FOR i FROM 1 TO result.size DO
					resultP := result.get(i);
					IF resultP.ClId IN {UNCLASSIFIED, NOISE} THEN
						IF resultP.ClId = UNCLASSIFIED THEN
							seeds.append(resultP);
						END IF;
						SetOfPoints.changeClId(resultP,ClId);
					END IF; // UNCLASSIFIED or NOISE
				END FOR;
			END IF; // result.size >= MinPts
			seeds.delete(currentP);
		END WHILE; // seeds <> Empty
		RETURN True;
	END IF
END; // ExpandCluster

DBSCAN (SetOfPoints, Eps, MinPts)
// SetOfPoints is UNCLASSIFIED
	ClusterId := nextId(NOISE);
	FOR i FROM 1 TO SetOfPoints.size DO
		Point := SetOfPoints.get(i);
		IF Point.ClId = UNCLASSIFIED THEN
			IF ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) THEN
				ClusterId := nextId(ClusterId)
			END IF
		END IF
	END FOR
END; // DBSCAN
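
For anyone who wants to run it, here is a rough Python sketch that follows the pseudocode above step by step. It is only one possible translation, not the paper's reference implementation. The names `region_query`, `expand_cluster` and `dbscan`, the label constants `UNCLASSIFIED = -1` and `NOISE = 0`, and the choice of Euclidean distance with a brute-force neighbour search are all my own assumptions.

```python
import math
from collections import deque

# Label values are assumptions for this sketch; cluster ids start at 1.
UNCLASSIFIED = -1
NOISE = 0


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself.

    Brute force with Euclidean distance (an assumption; needs Python 3.8+).
    """
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def expand_cluster(points, labels, idx, cluster_id, eps, min_pts):
    """Try to grow a cluster from points[idx]; return True if one was created."""
    seeds = region_query(points, idx, eps)
    if len(seeds) < min_pts:  # not a core point
        labels[idx] = NOISE
        return False

    # Every point in seeds is density-reachable from points[idx].
    for j in seeds:
        labels[j] = cluster_id
    queue = deque(j for j in seeds if j != idx)

    while queue:
        current = queue.popleft()
        result = region_query(points, current, eps)
        if len(result) >= min_pts:  # current is a core point too
            for j in result:
                if labels[j] in (UNCLASSIFIED, NOISE):
                    if labels[j] == UNCLASSIFIED:
                        queue.append(j)  # still needs to be expanded
                    labels[j] = cluster_id
    return True


def dbscan(points, eps, min_pts):
    """Return one label per point: NOISE (0) or a cluster id starting at 1."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = NOISE + 1
    for i in range(len(points)):
        if labels[i] == UNCLASSIFIED:
            if expand_cluster(points, labels, i, cluster_id, eps, min_pts):
                cluster_id += 1
    return labels
```

Calling `dbscan(points, eps, min_pts)` with a list of coordinate tuples returns a list of labels in the same order. Each `region_query` scans every point, so this sketch is quadratic in the number of points; a spatial index would be needed for large data.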

It clusters data by density (that is, by the distance between points) and takes noise into account, and it is not complicated to implement.

In my experience, the most important thing for getting a good result is choosing good values for Eps and MinPts. And if your data is complicated, the way you measure the distance between two data points has a large influence on the result.
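
One way to get a starting value for Eps is the sorted k-dist graph heuristic described in the DBSCAN paper: for each point, take the distance to its k-th nearest neighbour, sort those distances in descending order, and look for the point where the curve bends. The sketch below only computes the sorted distances. The function name `k_distances` and the use of Euclidean distance are my own assumptions, and picking the bend is left to the reader.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted descending.

    Brute force with Euclidean distance (an assumption); requires
    1 <= k <= len(points) - 1.
    """
    dists = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(others[k - 1])
    return sorted(dists, reverse=True)
```

Points on the high side of the bend are likely noise, and the distance at the bend is a candidate for Eps. Swapping `math.dist` for a different distance function changes the whole curve, which is one way to see how much the distance measure matters.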

 

Good luck.

posted on 2014-06-05 10:11  很遗憾我不是