SparkSQL UDF两种注册方式:udf() 和 register()

调用sqlContext.udf.register()

此时注册的方法 只能在sql()中可见,对DataFrame API不可见

用法:sqlContext.udf.register("makeDt", makeDT(_:String,_:String,_:String))

示例:

def makeDT(date: String, time: String, tz: String) = s"$date $time $tz"
sqlContext.udf.register("makeDt", makeDT(_:String,_:String,_:String))
 
// Now we can use our function directly in SparkSQL.
sqlContext.sql("SELECT amount, makeDt(date, time, tz) from df").take(2)
// but not outside
df.select($"customer_id", makeDt($"date", $"time", $"tz"), $"amount").take(2) // fails

2)调用spark.sql.function.udf()方法

此时注册的方法,对外部可见

用法:valmakeDt = udf(makeDT(_:String,_:String,_:String))

示例:

import org.apache.spark.sql.functions.udf
val makeDt = udf(makeDT(_:String,_:String,_:String))
// now this works
df.select($"customer_id", makeDt($"date", $"time", $"tz"), $"amount").take(2)

 

posted @ 2018-07-21 17:45  大葱拌豆腐  阅读(9329)  评论(0编辑  收藏  举报