Spark pitfalls
Note: everything below was run in a local environment.
<spark-streaming>
1. When reading from Kafka in local mode, you hit this warning:
spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data,
otherwise Spark jobs will not get resources to process the received data.
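A minimal sketch of the fix, assuming a hypothetical app name: reserve at least one thread for the receiver and one for processing by using local[2] or higher.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: one thread runs the Kafka receiver, the other processes the
// received data. local[1] would starve the processing side and trigger
// the warning above.
val conf = new SparkConf()
  .setAppName("KafkaLocalDemo") // hypothetical app name
  .setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))
```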
2. When execution actually happens: requirement failed: No output operations registered, so nothing to execute
Evaluation is lazy: a job only runs when it contains an action (or, for streaming, an output operation). Actions include reduce(), collect(), count(), first(), take(),
saveAsTextFile(path), foreach(), countByKey(), etc.
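A sketch of the laziness rule on a DStream, assuming `lines` is an already-created input stream (e.g. from Kafka): transformations alone register nothing, so start() fails with "No output operations registered" until an output operation such as print() is added.

```scala
// Transformations are lazy: nothing is registered or executed yet.
val words  = lines.flatMap(_.split(" "))
val counts = words.map((_, 1)).reduceByKey(_ + _)

// Output operation: this is what registers work and triggers execution.
// Without it, ssc.start() throws "No output operations registered".
counts.print()

ssc.start()
ssc.awaitTermination()
```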
3. If no messages can be read from Kafka, the kafkaStream-related operations do nothing for that batch and processing moves straight on to the next step.
4. Two ways to read Kafka messages:
@1 Receiver-based Approach: created via
KafkaUtils.createStream(). You cannot control the parallelism of message processing; there is only one receiver.
@2 Direct Approach: created via
KafkaUtils.createDirectStream(). Benefit: Simplified Parallelism (parallel message processing out of the box).
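A sketch of both approaches side by side, assuming the spark-streaming-kafka 0.8 integration (where both createStream and createDirectStream exist) and local ZooKeeper/broker addresses and topic names, which are all placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// @1 Receiver-based: a single receiver pulls data via the high-level
// (ZooKeeper-based) consumer API; parallelism is fixed by that one receiver.
val receiverStream = KafkaUtils.createStream(
  ssc,
  "localhost:2181",         // ZooKeeper quorum (assumed local setup)
  "demo-group",             // consumer group id (placeholder)
  Map("demo-topic" -> 1))   // topic -> number of consumer threads

// @2 Direct: no receiver; Spark creates one RDD partition per Kafka
// partition, which is what "Simplified Parallelism" refers to.
val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,
  Map("metadata.broker.list" -> "localhost:9092"), // Kafka brokers (assumed)
  Set("demo-topic"))
```

With the direct approach you also get exactly-once-friendly offset handling, since offsets are tracked by Spark rather than ZooKeeper.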
That's all for today...