Registering custom functions (UDFs) with SQLContext / HiveContext

This post briefly shows two ways to register a user-defined function with a SQLContext or HiveContext.

The examples below use sqlContext in spark-shell. Note from the output that this sqlContext is actually a HiveContext instance, so the same calls work on a HiveContext as well:
scala> sc
res5: org.apache.spark.SparkContext = org.apache.spark.SparkContext@35d4035f

scala> sqlContext
res7: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@171b0d3

scala> val df = sc.parallelize(Seq(("张三", 25), ("李四", 30), ("赵六", 27))).toDF("name", "age")
df: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.registerTempTable("emp")

1) Registering a named function defined beforehand:

scala> def remainWorkYears(age: Int): Int = {
     |   60 - age
     | }
remainWorkYears: (age: Int)Int

scala> sqlContext.udf.register("remainWorkYears", remainWorkYears _)
res1: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,List())

scala> sqlContext.sql("select e.*, remainWorkYears(e.age) as remainedWorkYear from emp e").show
+----+---+----------------+
|name|age|remainedWorkYear|
+----+---+----------------+
|  张三| 25|              35|
|  李四| 30|              30|
|  赵六| 27|              33|
+----+---+----------------+

(Because sqlContext here is a HiveContext, the equivalent hiveContext.sql("select e.*, remainWorkYears(e.age) as remainedWorkYear from emp e").show produces the same result.)

2) Registering an anonymous function directly:

scala> sqlContext.udf.register("remainWorkYears_anonymous", (age: Int) => {
     |   60 - age
     | })
res3: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,List())

scala> sqlContext.sql("select e.*, remainWorkYears_anonymous(e.age) as remainedWorkYear from emp e").show
+----+---+----------------+
|name|age|remainedWorkYear|
+----+---+----------------+
|  张三| 25|              35|
|  李四| 30|              30|
|  赵六| 27|              33|
+----+---+----------------+
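The only real difference between the two styles is how the Scala function value passed to udf.register is produced: a named method has to be eta-expanded into a function value with a trailing `_`, while an anonymous function literal already is one. A minimal plain-Scala sketch of that distinction (no Spark required; the object name UdfStyles is made up for illustration):

```scala
object UdfStyles {
  // Style 1: a named method; `remainWorkYears _` eta-expands it
  // into a function value of type Int => Int
  def remainWorkYears(age: Int): Int = 60 - age
  val named: Int => Int = remainWorkYears _

  // Style 2: an anonymous function literal, which is already a value
  val anonymous: Int => Int = (age: Int) => 60 - age

  def main(args: Array[String]): Unit = {
    println(named(25))     // prints 35
    println(anonymous(30)) // prints 30
  }
}
```

Either value could be handed to udf.register; Spark infers the SQL return type (IntegerType above) from the Scala function's type.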