Summary: When I write PySpark code, I use Jupyter Notebook to test it before submitting a job to the cluster. In this post, I will show you how to install …
posted @ 2019-04-14 18:03 chenxiangzhen Views(2530) Comments(0) Recommended(0)
Summary: Feature engineering. Handling continuous values. 0. Binarizer (binarization)
Binarizer output with Threshold = 5.100000
+---+-------+-----------------+
| id|feature|binarized_feature|
+---+-------+-----------------+
|  0|    1.1|              0.0|
|  1|    8.5|              1.0|
| …
posted @ 2019-04-14 16:59 chenxiangzhen Views(421) Comments(1) Recommended(0)
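The binarization rule in the summary above can be illustrated without a Spark session. The following is a minimal plain-Python sketch, not the post's own code: the helper name `binarize` is hypothetical, and it assumes the same convention as `pyspark.ml.feature.Binarizer`, where values strictly greater than the threshold map to 1.0 and all others map to 0.0. Only the two rows visible in the summary are used as input.

```python
def binarize(values, threshold):
    """Return 1.0 for each value strictly greater than threshold, else 0.0.

    Hypothetical helper for illustration; it mirrors the thresholding rule
    only and does not use Spark.
    """
    return [1.0 if v > threshold else 0.0 for v in values]


# The two feature values visible in the summary, with threshold 5.1.
features = [1.1, 8.5]
binarized = binarize(features, threshold=5.1)

# Print id, feature, and binarized feature for each row.
for row_id, (feature, flag) in enumerate(zip(features, binarized)):
    print(row_id, feature, flag)
```

In PySpark itself the equivalent step would presumably be a `Binarizer` configured with `threshold`, `inputCol`, and `outputCol`, applied to a DataFrame through `transform`.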