streaming优化:spark.default.parallelism调整处理并行度
官方是这么说的:
Cluster resources can be under-utilized if the number of parallel tasks used in any stage of the computation is not high enough. For example, for distributed reduce operations like reduceByKey and reduceByKeyAndWindow, the default number of parallel tasks is controlled by thespark.default.parallelism configuration property. You can pass the level of parallelism as an argument (see PairDStreamFunctions documentation), or set the spark.default.parallelism configuration property to change the default.
我理想:就是你可以调整spark.default.parallelism来修改默认并行度,或者在使用transformation,action方法时直接往方法传入并行度。