Setting Up a Distributed Cluster with Python Dask
1. Installing the distributed version
1) conda: conda install dask distributed -c conda-forge
2) pip: pip install dask distributed --upgrade
3) From source:
git clone https://github.com/dask/distributed.git
cd distributed
python setup.py install
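After installing, it is worth confirming that every machine in the cluster has the same package versions, since mismatched versions between scheduler and workers are a common source of trouble. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Dask.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this on the scheduler machine and on each worker machine, then compare.
for name in ("dask", "distributed"):
    print(name, installed_version(name) or "not installed")
```

If any machine reports a different version or "not installed", repeat the installation step there before starting the cluster.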
2. Starting the scheduler (master node)
dask-scheduler
The console prints output like the following:
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://192.168.1.42:8786
distributed.scheduler - INFO - :8787
distributed.scheduler - INFO - Local Directory: C:\Users\User\AppData\Local\Temp\scheduler-gd9uk980
distributed.scheduler - INFO - -----------------------------------------------
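Before starting workers, it can help to confirm that the scheduler port is reachable from each worker machine, since a firewall can block it silently. The following is a minimal sketch using only the standard library; `is_reachable` is a hypothetical helper, and the host and port in the comment are the ones printed in the scheduler log above.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call from a worker machine (address taken from the scheduler log):
#   is_reachable("192.168.1.42", 8786)
```

A result of False suggests checking the firewall and the scheduler address before running dask-worker.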
3. Starting the worker nodes
dask-worker 192.168.1.42:8786
Once a worker has started successfully, the scheduler prints additional lines:
distributed.scheduler - INFO - Register tcp://192.168.1.184:45772
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.184:45772
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://192.168.1.183:43405
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.183:43405
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://192.168.1.188:38095
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.188:38095
distributed.core - INFO - Starting established connection
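With more than a few workers, reading the scheduler log by eye gets tedious. The sketch below pulls the registered worker addresses out of the log text. It assumes the "Register tcp://..." line format shown above, which may differ in other versions of distributed; `registered_workers` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   distributed.scheduler - INFO - Register tcp://192.168.1.184:45772
REGISTER_RE = re.compile(r"Register\s+(tcp://\S+)")


def registered_workers(log_text):
    """Return worker addresses in the order they were registered."""
    return REGISTER_RE.findall(log_text)


sample_log = """\
distributed.scheduler - INFO - Register tcp://192.168.1.184:45772
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.184:45772
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://192.168.1.183:43405
distributed.scheduler - INFO - Register tcp://192.168.1.188:38095
"""

workers = registered_workers(sample_log)
print(len(workers), "workers:", workers)
```

Only the "Register" lines are counted, so the "Starting worker compute stream" lines do not produce duplicates.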
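4. Official test code
"""Distributed Dask"""
import time

from dask.distributed import Client


def square(x):
    return x ** 2


def neg(x):
    return -x


# Synchronous client, so total.result() blocks and returns the value.
client = Client('192.168.1.42:8786')

ts = time.time()
A = client.map(square, range(10000))  # one task per input value
B = client.map(neg, A)                # chained on the futures in A
total = client.submit(sum, B)         # reduce on the cluster
print(total.result())
print('cost time :%s' % (time.time() - ts))

The original run reported:
cost time :3.793848991394043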
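To check that the cluster returns the right answer, the same computation can be run serially on one machine and compared with the value printed by total.result(). This is a minimal sketch, assuming square(x) returns x ** 2 and neg(x) returns -x as in the Dask quickstart; `expected_total` is a hypothetical helper name.

```python
def square(x):
    return x ** 2


def neg(x):
    return -x


def expected_total(n):
    """Serial reference for sum(neg(square(x)) for x in range(n))."""
    return sum(neg(square(x)) for x in range(n))


# Same input range as the cluster test above.
print(expected_total(10000))  # -333283335000
```

The serial version finishes almost instantly for this workload; with 10000 tiny tasks, most of the measured cluster time is likely scheduling and network overhead rather than computation.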
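5. References
Dask official site: https://dask.org/
Advantages: Dask handles distributed scheduling internally, so users do not have to write complex scheduling logic themselves. Distributed computation is available through simple method calls, and some models can be trained in parallel. Distributed algorithms it supports include XGBoost, LR (logistic regression), and some scikit-learn methods.
In one sentence: Dask is the Python counterpart to Spark, a distributed computing framework implemented in Python.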
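Author: 宇智波鼬_adb8
Link: https://www.jianshu.com/p/8ca5d70e0810?utm_campaign=haruki
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.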