Custom multi-dimensional sorting in Spark
In Spark, the built-in sortByKey operates on key-value pairs: it sorts by the key, and each value follows its key.
To sort on multiple fields at once, we need to define a custom key class.
The example below sorts on three fields.
The class must extend Ordered[T] with Serializable. An instance of it is then used as the key (the first element) of each pair, and sortByKey sorts by it.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setAppName("thirdSort")
conf.setMaster("local")
val sc = new SparkContext(conf)
val lines = sc.textFile("d:/third.txt")
// Pack the three fields of each line into a ThirdOrderKey; keep the whole line as the value
val pairs = lines.map { line =>
  val fields = line.split(" ")
  (new ThirdOrderKey(fields(0).toInt, fields(1).toInt, fields(2).toInt), line)
}
// sortByKey(false) sorts descending by the custom key; map(_._2) drops the key and keeps only the value
pairs.sortByKey(false).map(_._2).collect().foreach(println)
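As an aside, when the key is just a few numeric fields, a custom class is not strictly required: Scala provides an implicit Ordering for tuples that compares field by field, left to right, and RDD.sortBy accepts a key function returning such a tuple. The sketch below demonstrates the idea on a plain Scala collection (the sample data is illustrative):

```scala
// Sort "a b c" records by their three numeric fields, descending,
// using Scala's built-in tuple Ordering instead of a custom key class.
val lines = List("2 8 5", "1 9 9", "2 8 7", "3 1 1")

val sorted = lines.sortBy { line =>
  val f = line.split(" ")
  (f(0).toInt, f(1).toInt, f(2).toInt)
}(Ordering[(Int, Int, Int)].reverse) // .reverse = descending, like sortByKey(false)
// expected order: 3 1 1, 2 8 7, 2 8 5, 1 9 9
```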
Below is the definition of ThirdOrderKey. There are two key points: extend Ordered, and implement the compare method.
class ThirdOrderKey(val first: Int, val second: Int, val third: Int)
    extends Ordered[ThirdOrderKey] with Serializable {
  // Compare field by field: first, then second, then third.
  // (Note: the subtraction trick can overflow for extreme Int values;
  //  Integer.compare(a, b) is the safer choice in general.)
  def compare(other: ThirdOrderKey): Int = {
    if (this.first - other.first != 0) this.first - other.first
    else if (this.second - other.second != 0) this.second - other.second
    else this.third - other.third
  }
}
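The comparator can be verified in isolation, without Spark, because Ordered[T] extends Comparable[T] and Scala derives an implicit Ordering from it, so sorted works on a local collection of keys. A minimal self-contained sketch (the toString override is added here only to make the results easy to inspect):

```scala
class ThirdOrderKey(val first: Int, val second: Int, val third: Int)
    extends Ordered[ThirdOrderKey] with Serializable {
  def compare(other: ThirdOrderKey): Int =
    if (this.first != other.first) this.first - other.first
    else if (this.second != other.second) this.second - other.second
    else this.third - other.third
  override def toString = s"$first $second $third"
}

val keys = List(
  new ThirdOrderKey(2, 8, 5),
  new ThirdOrderKey(1, 9, 9),
  new ThirdOrderKey(2, 8, 7)
)

// Ordered[T] yields an implicit Ordering, so sorted works directly.
val ascending  = keys.sorted                                   // 1 9 9, 2 8 5, 2 8 7
val descending = keys.sorted(Ordering[ThirdOrderKey].reverse)  // 2 8 7, 2 8 5, 1 9 9
```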