Introduction to Setting Up a Dask Cluster (Part 1)

Introduction

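Dask essentially consists of two parts: dynamic task scheduling with cluster management, and high-level collection APIs such as DataFrame; the combination is roughly comparable to Spark plus pandas. Dask implements distributed scheduling internally, so users do not have to write complex scheduling logic themselves and can run distributed computations through a simple API. It also supports parallel processing for some models (for example distributed algorithms such as xgboost, LR, and sklearn). Dask focuses on data science; its API is very close to pandas but not fully compatible with it.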
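To make the point about not writing scheduling logic by hand more concrete, here is a minimal sketch using `dask.delayed`. The function names `inc` and `add` are made up for illustration, and the threaded scheduler is chosen only so the snippet runs on a single machine without any cluster.

```python
import dask


@dask.delayed
def inc(x):
    # A trivial task; calling it only records a node in the task graph.
    return x + 1


@dask.delayed
def add(a, b):
    # Depends on two upstream tasks, which Dask is free to run in parallel.
    return a + b


# Nothing has executed yet: `total` is a lazy task graph with three nodes.
total = add(inc(1), inc(2))

# compute() hands the graph to a scheduler; "threads" is the local thread pool.
result = total.compute(scheduler="threads")
print(result)
```

The same graph could be executed on a distributed cluster once a client is connected; the user code that builds the graph would stay the same.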

Cluster setup:

A Dask cluster has several roles: client, scheduler, and worker.

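  1. client: the user-facing entry point for interacting with the cluster
  2. scheduler: the master node (the cluster's registry and management point); it manages the tasks submitted by clients and dispatches them to worker nodes according to different strategies
  3. worker: a worker node, managed by the scheduler, that performs the actual computation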
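The three roles can be seen working together in a small client-side sketch. This is an illustration under assumptions rather than part of the original setup: `square` and `run_demo` are hypothetical names, and when no address is given the function starts a throwaway in-process cluster so the snippet can run anywhere. To target the cluster described in this article, pass the scheduler address `tcp://192.168.1.21:8786`.

```python
from dask.distributed import Client


def square(x):
    # The work that the scheduler will dispatch to a worker.
    return x * x


def run_demo(address=None):
    """Submit a few tasks through a client and return the combined result."""
    if address:
        # Connect to an already running scheduler, e.g. "tcp://192.168.1.21:8786".
        client = Client(address)
    else:
        # Assumption for the demo: a small in-process cluster, dashboard disabled.
        client = Client(processes=False, n_workers=1,
                        threads_per_worker=2, dashboard_address=None)
    try:
        futures = client.map(square, range(5))   # one task per input value
        total = client.submit(sum, futures)      # depends on all of the above
        return total.result()                    # blocks until the work is done
    finally:
        client.close()


if __name__ == "__main__":
    print(run_demo())
```

The client only describes the work and waits for results; deciding which worker runs each task is left to the scheduler.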
1. Master node (scheduler):
  1. scheduler: default port 8786
    a. Dependencies: dask, distributed
    b. Install: pip install dask distributed
    c. Start:

    dask-scheduler

distributed.scheduler - INFO - -----------------------------------------------
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:  tcp://192.168.1.21:8786
distributed.scheduler - INFO -   dashboard at:                     :8787
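  1. web UI: default port 8787
    a. Notice shown when opening the web page: a dependency (bokeh) must be installed
    b. Install: pip install "bokeh>=0.13.0"
    c. What the interface looks like: (screenshot of the dashboard)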
2. Worker node (worker):

a. Dependencies: dask, distributed
b. Install: pip install dask distributed
c. Start: using 192.168.1.22 as the example; 192.168.1.23 works the same way
> dask-worker 192.168.1.21:8786

distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.1.22:36803'
distributed.worker - INFO -       Start worker at:  tcp://192.168.1.22:37089
distributed.worker - INFO -          Listening to:  tcp://192.168.1.22:37089
distributed.worker - INFO -          dashboard at:        192.168.1.22:36988
distributed.worker - INFO - Waiting to connect to:   tcp://192.168.1.21:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         24
distributed.worker - INFO -                Memory:                   33.52 GB
distributed.worker - INFO -       Local Directory: /home/binger/dask-server/dask-worker-space/worker-ntrdwzqp
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:   tcp://192.168.1.21:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection

What changes on the master node:

distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:  tcp://192.168.1.21:8786
distributed.scheduler - INFO -   dashboard at:                     :8787
distributed.scheduler - INFO - Register worker <Worker 'tcp://192.168.1.22:37089', name: tcp://192.168.1.22:37089, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.22:37089
distributed.core - INFO - Starting established connection
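If a worker never shows up in the scheduler log like this, one quick thing to rule out is basic network reachability of the scheduler ports from the worker machine. The following is a small standard-library sketch for that check; `port_is_open` is a hypothetical helper name, and the host and ports are the ones used in this article.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8786 is the scheduler port and 8787 the dashboard port, as shown above.
    for name, port in (("scheduler", 8786), ("dashboard", 8787)):
        state = "reachable" if port_is_open("192.168.1.21", port) else "unreachable"
        print(f"{name} port {port}: {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the listener is a healthy Dask scheduler.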
3. dask-scheduler fails to start: ValueError: 'default' must be a list when 'multiple' is true.
Traceback (most recent call last):
  File "D:\Program Files\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Program Files\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\workspace\ceshi\venv\Scripts\dask-scheduler.exe\__main__.py", line 4, in <module>
  File "e:\workspace\ceshi\venv\lib\site-packages\distributed\cli\dask_scheduler.py", line 122, in <module>
    @click.version_option()
  File "e:\workspace\ceshi\venv\lib\site-packages\click\decorators.py", line 247, in decorator
    _param_memo(f, OptionClass(param_decls, **option_attrs))
  File "e:\workspace\ceshi\venv\lib\site-packages\click\core.py", line 2465, in __init__
    super().__init__(param_decls, type=type, multiple=multiple, **attrs)
  File "e:\workspace\ceshi\venv\lib\site-packages\click\core.py", line 2101, in __init__
    ) from None
ValueError: 'default' must be a list when 'multiple' is true.

Fix: change the click version to one below 8.0

pip install "click>=7,<8"
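To confirm which click version an environment actually has before or after pinning, a small check along these lines may help. It is only a sketch: the helper names are hypothetical, and it relies on `importlib.metadata`, which requires Python 3.8 or newer (the traceback above comes from Python 3.6, where this module is not available).

```python
from importlib import metadata


def click_major_is_supported(version_string):
    """Return True if the version falls in the pinned range >=7,<8."""
    major = int(version_string.split(".")[0])
    return 7 <= major < 8


def installed_click_version():
    """Return the installed click version string, or None if click is absent."""
    try:
        return metadata.version("click")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_click_version()
    if version is None:
        print("click is not installed")
    elif click_major_is_supported(version):
        print(f"click {version} is within the pinned range")
    else:
        print(f"click {version} is outside the pinned range >=7,<8")
```

Newer releases of distributed may work with click 8 and later; the range checked here simply mirrors the pin used in this article.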

posted @ 2022-06-07 12:55  binger0712