PyFlink: Basic Environment Setup and Running Jobs

Flink #pyflink #deployment

Setting up the local development environment

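Prerequisites

  • Java 8 or Java 11 is installed and working
  • Python or Miniconda (recommended) is available locally; the rest of this post uses conda to manage virtual environments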
java -version

openjdk version "11" 2018-09-25
OpenJDK Runtime Environment 18.9 (build 11+28)
OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)

  • Python environment

Install Python on the host, or use a conda virtual environment.

python --version

Python 3.10.8
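The two version checks above can also be scripted. The following is a minimal sketch, not part of the original setup: the helper names `parse_java_major` and `python_supported` are hypothetical, and the supported Python range (3.7 through 3.10 for PyFlink 1.17) is an assumption that should be checked against the Flink documentation for your version.

```python
import re
import sys


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    # Legacy numbering: Java 8 reports itself as "1.8.0_xxx".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def python_supported(version_info=sys.version_info) -> bool:
    """Assumption: PyFlink 1.17 targets Python 3.7 through 3.10."""
    return (3, 7) <= tuple(version_info[:2]) <= (3, 10)


java_major = parse_java_major('openjdk version "11" 2018-09-25')
print("Java OK" if java_major in (8, 11) else f"unexpected Java {java_major}")
print("Python OK" if python_supported((3, 10, 8)) else "unsupported Python")
```

Note that `java -version` writes to stderr, so when capturing it with `subprocess.run(["java", "-version"], capture_output=True, text=True)` the text to parse is in the `stderr` attribute.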

Installing and using conda

# Activate conda (the base environment)
source ~/Documents/install/miniconda/bin/activate

# Create the pyflink virtual environment
conda create --name py310_pyflink171_venv -y -q python=3.10.8
conda activate py310_pyflink171_venv
pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple
pip install apache-flink==1.17.1 --no-cache-dir  -i https://mirrors.aliyun.com/pypi/simple --use-pep517
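To confirm that the package landed in the active environment, the installed version can be read back through the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and the expected version string simply mirrors the `pip install` command above.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


expected = "1.17.1"  # the version pinned in the pip command above
found = installed_version("apache-flink")
if found is None:
    print("apache-flink is not installed in this interpreter")
elif found != expected:
    print(f"apache-flink {found} found, expected {expected}")
else:
    print(f"apache-flink {found} OK")
```

Run it with the interpreter of the activated conda environment; running it with the system Python would report on the wrong installation.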

Verifying the environment

Open https://nightlies.apache.org/flink/flink-docs-release-1.16/api/python/examples/table/word_count.html, copy the word_count code into a local file named word_count.py, then run it:

python word_count.py

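Console output like the following indicates success (this sample log was captured in an older Flink 1.16.1 environment, as the paths in it show):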

Executing word_count example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/Users/faron/Documents/others/envs/py368_pyflink1161_test_venv/lib/python3.6/site-packages/pyflink/lib/flink-dist-1.16.1.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
+I[To, 1]
+I[be,, 1]
+I[or, 1]
+I[not, 1]
+I[to, 1]
+I[be,--that, 1]
+I[is, 1]
+I[the, 1]
+I[question:--, 1]
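Each result line has the form `+I[word, count]`, where `+I` is Flink's row kind for an inserted row (`-U`, `+U` and `-D` mark update-before, update-after and delete). If you want to check the output from a script rather than by eye, a sketch like the following can separate result rows from JVM warnings. The name `parse_row` and the regular expression are illustrative assumptions that fit only this two-column output.

```python
import re

# Row kind, then "[word, count]"; the word itself may contain commas.
ROW = re.compile(r"^(\+I|-U|\+U|-D)\[(.*), (-?\d+)\]$")


def parse_row(line: str):
    """Split a printed row such as '+I[be,, 1]' into (kind, word, count)."""
    match = ROW.match(line.strip())
    if match is None:
        return None  # not a result row, for example a JVM warning
    kind, word, count = match.groups()
    return kind, word, int(count)


sample = [
    "WARNING: An illegal reflective access operation has occurred",
    "+I[To, 1]",
    "+I[be,, 1]",
    "+I[be,--that, 1]",
]
rows = [parsed for parsed in map(parse_row, sample) if parsed is not None]
print(rows)
```

The count is matched from the end of the line, so words that end in a comma (such as `be,`) are kept intact.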

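Submitting jobs to run on a server

All of the commands below specify a zipped virtual environment. If Python and apache-flink are already installed on the servers that host the Flink cluster, you do not need to specify the zipped environment.

  • Package the runtime environment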

    # Locate the envs directory under the miniconda install path, or the directory that holds the virtual environment
    # Package the py310_pyflink171_venv virtual environment
    cd ~/Documents/install/miniconda/envs
    zip -r py310_pyflink171_venv.zip py310_pyflink171_venv
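The same packaging step can be done from Python, which also makes it easy to read back the archive-relative interpreter path that the submit commands expect. This is a sketch under assumptions: `package_env` and `python_path_in_archive` are hypothetical helper names, and the interpreter is assumed to live at `bin/python3` inside the environment.

```python
import shutil
import zipfile
from pathlib import Path


def package_env(envs_dir: Path, env_name: str, out_dir: Path) -> Path:
    """Zip envs_dir/env_name so that env_name is the top-level directory in the archive."""
    archive = shutil.make_archive(
        str(out_dir / env_name),  # archive path without the .zip suffix
        "zip",
        root_dir=str(envs_dir),
        base_dir=env_name,
    )
    return Path(archive)


def python_path_in_archive(archive: Path) -> str:
    """Build the '<archive>.zip/<top-level dir>/bin/python3' path used by -pyexec."""
    with zipfile.ZipFile(archive) as zf:
        top_levels = {name.split("/", 1)[0] for name in zf.namelist()}
    if len(top_levels) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(top_levels)}")
    return f"{archive.name}/{top_levels.pop()}/bin/python3"
```

For example, `package_env(Path.home() / "Documents/install/miniconda/envs", "py310_pyflink171_venv", Path("/workplace"))` would write the archive to /workplace, and `python_path_in_archive` on the result would yield the two-level path used in the commands that follow.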
    

Submitting to a JobManager

  • Single-file submission

    ./flink run \
    --jobmanager localhost:8081 \
    -pyarch file:///workplace/py310_pyflink171_venv.zip \
    -pyexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
    -pyclientexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
    -py /workplace/src/word_count.py
    
  • Submission with a source directory and an entry module

    ./flink run \
    --jobmanager localhost:8081 \
    -pyarch file:///workplace/py310_pyflink171_venv.zip \
    -pyexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
    -pyclientexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
    -pyfs /workplace/src \
    -pym word_count
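The two variants differ only in how the entry point is given (`-py` for a single file, `-pyfs` plus `-pym` for a directory and module), and the three environment flags can be dropped when the cluster hosts already have Python and apache-flink. A hypothetical helper that assembles the argument list makes this explicit; `build_flink_run_args` and its parameter names are illustrative, and only flags that appear in the commands above are used.

```python
from typing import List, Optional


def build_flink_run_args(
    entry: str,
    jobmanager: str = "localhost:8081",
    archive_uri: Optional[str] = None,
    env_name: Optional[str] = None,
    source_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a `flink run` argument list for a PyFlink job."""
    args = ["./flink", "run", "--jobmanager", jobmanager]
    if archive_uri is not None:
        if env_name is None:
            raise ValueError("env_name is required when archive_uri is given")
        # Two-level path: archive file name, then the directory inside it.
        archive_name = archive_uri.rsplit("/", 1)[-1]
        python = f"{archive_name}/{env_name}/bin/python3"
        args += ["-pyarch", archive_uri, "-pyexec", python, "-pyclientexec", python]
    if source_dir is not None:
        # Directory submission: entry is a module name inside source_dir.
        args += ["-pyfs", source_dir, "-pym", entry]
    else:
        # Single-file submission: entry is a path to the script.
        args += ["-py", entry]
    return args


print(" ".join(build_flink_run_args(
    "word_count",
    archive_uri="file:///workplace/py310_pyflink171_venv.zip",
    env_name="py310_pyflink171_venv",
    source_dir="/workplace/src",
)))
```

The returned list can be passed to `subprocess.run(args, check=True)` from the Flink `bin` directory; building a list rather than a shell string avoids quoting problems with paths.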
    

Submitting to a YARN cluster

  • Submit and run

    • Python virtual environment on the local filesystem

      ./flink run -m yarn-cluster \
      -pyarch file:///workplace/py310_pyflink171_venv.zip \
      -pyexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
      -pyclientexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
      -py word_count.py
      
    • Python virtual environment on HDFS

      ./flink run  -m yarn-cluster \
      -pyarch hdfs://dae-ns/py_env/py310_pyflink171_venv.zip \
      -pyexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
      -pyclientexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
      -py word_count.py
      
    • With a source directory (src) shipped through yarn.ship-files

      ./bin/flink run-application -t yarn-application \
      -Dyarn.application.name=wordcount \
      -Dyarn.ship-files=/workplace/src \
      -pyarch src/py310_pyflink171_venv.zip \
      -pyclientexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
      -pyexec py310_pyflink171_venv.zip/py310_pyflink171_venv/bin/python3 \
      -pyfs src \
      -pym word_count
      

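Notes

  • When packaging a virtual environment, create it with conda or with virtualenv --always-copy (which copies files instead of symlinking them), so that the packaged environment is more complete
  • The environment path used at submission is py310_pyflink171_venv.zip/py310_pyflink171_venv. Note that it has two levels: the archive file name, then the top-level directory inside the archive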
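One rough way to gauge whether an environment depends on files outside itself before zipping it is to look for symlinks whose targets resolve outside the environment directory. The sketch below is a heuristic, not a full portability test, and `external_symlinks` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import List


def external_symlinks(env_dir: Path) -> List[Path]:
    """List symlinks under env_dir whose targets resolve outside env_dir."""
    root = env_dir.resolve()
    found = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so they are checked here too.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                target = path.resolve()
                if target != root and root not in target.parents:
                    found.append(path)
    return sorted(found)
```

Calling `external_symlinks(Path("py310_pyflink171_venv"))` from the envs directory and getting an empty list suggests that the environment does not link to files elsewhere on the machine; any paths it returns are worth checking before the archive is shipped to a cluster.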


posted @ 2023-08-17 16:40  faronzz