网约车的车辆热点聚类1

 

类似Uber,需要处理处不同时间段的不同地区的订单热点区域,帮助进行网约车的及时调度处理

每个成交单Order中,都会有订单上车位置的起始经纬度:open_lat , open_lng

那么在这个时间段内,哪些地区是高密集订单区域,好进行及时的调度,所以需要得到不同地区的热力图

初期想法是基于经纬度做聚类操作,典型的聚类算法是K- means,一种基于层次的聚类操作:

 

但是注意,kmeans是不适合做车辆聚类的,因为未来哪些地方会有订单其实是位置的,而kmeans要求必须制定K值,这就相当于,我最终要分成多少个聚类,显示不合适;

因此,想到了基于密度的聚类,而且不需要制定类别数,还可以自动识别噪点的DBScan算法

dbscan算法的思路:

输入:样本集D=(x1,x2,...,xm),邻域参数(ϵ,MinPts), 样本距离度量方式

    输出: 簇划分C. 

    1)初始化核心对象集合Ω=∅, 初始化聚类簇数k=0,初始化未访问样本集合Γ = D,  簇划分C = ∅
    2) 对于j=1,2,...m, 按下面的步骤找出所有的核心对象:

      a) 通过距离度量方式,找到样本xj的ϵ-邻域子样本集Nϵ(xj)
      b) 如果子样本集样本个数满足|Nϵ(xj)|≥MinPts, 将样本xj加入核心对象样本集合:Ω=Ω∪{xj}
    3)如果核心对象集合Ω=∅,则算法结束,否则转入步骤4.

    4)在核心对象集合Ω中,随机选择一个核心对象o,初始化当前簇核心对象队列Ωcur={o}, 初始化类别序号k=k+1,初始化当前簇样本集合Ck={o}, 更新未访问样本集合Γ=Γ−{o}
    5)如果当前簇核心对象队列Ωcur=∅,则当前聚类簇Ck生成完毕, 更新簇划分C={C1,C2,...,Ck}, 更新核心对象集合Ω=Ω−Ck, 转入步骤3。

    6)在当前簇核心对象队列Ωcur中取出一个核心对象o′,通过邻域距离阈值ϵ找出所有的ϵ-邻域子样本集Nϵ(o′),令Δ=Nϵ(o′)∩Γ, 更新当前簇样本集合Ck=Ck∪Δ, 更新未访问样本集合Γ=Γ−Δ,  更新Ωcur=Ωcur∪(Δ∩Ω)−o′,转入步骤5.

    输出结果为: 簇划分C={C1,C2,...,Ck}

利用java代码实现dbscan:

package com.df.dbscan;
import java.util.ArrayList;

/**
 * Created by angel
 */
public class DBScan {
    private double radius;
    private int minPts;
    /**
     * @param radius 单位米
     * @param minPts 最小聚合数
     * */
    public DBScan(double radius,int minPts) {
        this.radius = radius;
        this.minPts = minPts;
    }

    public void process(ArrayList<Point> points) {
        int size = points.size();
        int idx = 0;
        int cluster = 1;
        while (idx<size) {
            Point p = points.get(idx++);
            //choose an unvisited point
            if (!p.getVisit()) {
                p.setVisit(true);//set visited
                ArrayList<Point> adjacentPoints = getAdjacentPoints(p, points);
                //set the point which adjacent points less than minPts noised
                if (adjacentPoints != null && adjacentPoints.size() < minPts) {
                    p.setNoised(true);
                } else {
                    p.setCluster(cluster);
                    for (int i = 0; i < adjacentPoints.size(); i++) {
                        Point adjacentPoint = adjacentPoints.get(i);
                        //only check unvisited point, cause only unvisited have the chance to add new adjacent points
                        if (!adjacentPoint.getVisit()) {
                            adjacentPoint.setVisit(true);
                            ArrayList<Point> adjacentAdjacentPoints = getAdjacentPoints(adjacentPoint, points);
                            //add point which adjacent points not less than minPts noised
                            if (adjacentAdjacentPoints != null && adjacentAdjacentPoints.size() >= minPts) {
                                //adjacentPoints.addAll(adjacentAdjacentPoints);
                                for (Point pp : adjacentAdjacentPoints){
                                    if (!adjacentPoints.contains(pp)){
                                        adjacentPoints.add(pp);
                                    }
                                }
                            }
                        }
                        //add point which doest not belong to any cluster
                        if (adjacentPoint.getCluster() == 0) {
                            adjacentPoint.setCluster(cluster);
                            //set point which marked noised before non-noised
                            if (adjacentPoint.getNoised()) {
                                adjacentPoint.setNoised(false);
                            }
                        }
                    }
                    cluster++;
                }
            }
            if (idx%1000==0) {
                System.out.println(idx);
            }
        }
    }

    private ArrayList<Point> getAdjacentPoints(Point centerPoint,ArrayList<Point> points) {
        ArrayList<Point> adjacentPoints = new ArrayList<Point>();
        for (Point p:points) {
            //include centerPoint itself
            double distance = centerPoint.GetDistance(p);
            if (distance<=radius) {
                adjacentPoints.add(p);
            }
        }
        return adjacentPoints;
    }

}
View Code

我的处理方式:

所以,我只需要将数据从Hbase中查询出来,在封装好具体的需要数据,就可以推送到算法中,最后识别出结果

//查询Hbase操作
 val result = Controll.rowEndFilter2(tableName, startDate, endDate)
//将查询出来的数据组装成算法需要的结构
import scala.collection.JavaConversions._
    for (map <- result) {
      val lon = map.get("open_lng")
      val lat = map.get("open_lat")
      val begin_address_code = map.get("begin_address_code")
      points.add(new Point(lat.toDouble, lon.toDouble,begin_address_code))
    }
//算法处理
val dbScan = new DBScan(radius, density)
dbScan.process(points)
//将java的list转成scala的list
val point_List: List[Point] = JavaConverters.asScalaIteratorConverter(points.iterator()).asScala.toList
//得到每一个族下的坐标系
val groupData: Map[Int, List[Point]] = point_List.groupBy(line => line.getCluster)
//在将结果进一步处理发送出去即可

 

posted @ 2019-04-27 20:52  niutao  阅读(1141)  评论(0编辑  收藏  举报