Dremio YARN test environment deployment

I have previously written a short note on how Dremio runs on YARN (the feature is built on the Twill framework: when a YARN-based engine is created, Dremio packages the executor, uploads it to HDFS, and then runs it through YARN scheduling). Below is a simple Docker-based environment that makes it convenient to study this setup.
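
Once the environment below is up and a YARN-based engine has been created in the Dremio UI, a rough way to see the packaged bundle is to browse HDFS. This is only a sketch: the exact staging path depends on the Twill/Dremio configuration, so it simply lists everything under /user rather than assuming a directory.

# the staged executor bundle lands somewhere under the submitting user's HDFS home
docker-compose exec namenode hdfs dfs -ls -R /user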

Environment

  • docker-compose
version: "3"
services:
  zk:
    image: zookeeper
    ports:
      - 2181:2181
  namenode:
    image: apache/hadoop:3
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - 9870:9870
    env_file:
      - ./config
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
    env_file:
      - ./config
  resourcemanager:
    image: apache/hadoop:3
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - 8088:8088
    env_file:
      - ./config
    volumes:
      - ./test.sh:/opt/test.sh
  nodemanager:
    image: apache/hadoop:3
    hostname: nodemanager
    command: ["yarn", "nodemanager"]
    env_file:
      - ./config
    ports:
      - 8042:8042
  nodemanagerv2:
    image: apache/hadoop:3
    hostname: nodemanagerv2
    command: ["yarn", "nodemanager"]
    env_file:
      - ./config
    ports:
      - 8043:8042
  mysql:
    image: mysql:5.6
    command: --character-set-server=utf8
    ports:
      - "3308:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=dalong
      - MYSQL_USER=boss
      - MYSQL_DATABASE=boss
      - MYSQL_PASSWORD=dalong
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "19001:19001"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
    command: server --console-address :19001 --quiet /data
  dremio:
    build: .
    environment:
      - name=value
    ports:
      - "9047:9047"
      - "31010:31010"
      - "9090:9090"
  pg:
    image: postgres:16.0
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=dalongdemo
  nessie:
    image: projectnessie/nessie:0.75.0-java
    environment:
      - NESSIE_VERSION_STORE_TYPE=JDBC
      - QUARKUS_DATASOURCE_USERNAME=postgres
      - QUARKUS_DATASOURCE_PASSWORD=dalongdemo
      - QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://pg:5432/postgres
    ports:
      - "19120:19120"
      - "19121:19121"
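
The dremio service is built from a local Dockerfile (build: .) that is not shown here; the repository linked in the references contains the actual one. As a minimal sketch, assuming the stock dremio/dremio-oss image plus a dremio.conf that points Dremio at the external ZooKeeper and keeps distributed storage on HDFS, it could look roughly like this (the image tag, the conf file path and the paths.dist location are assumptions to check against your Dremio version):

# Dockerfile (sketch)
FROM dremio/dremio-oss:24.1.0
COPY dremio.conf /opt/dremio/conf/dremio.conf

# dremio.conf (sketch)
zookeeper: "zk:2181"                       # use the cluster ZooKeeper instead of the embedded one
services.coordinator.master.embedded-zookeeper.enabled: false
paths.dist: "hdfs://namenode:8020/dremio"  # distributed storage reachable by the YARN executors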

A few notes:
The Hadoop/YARN deployment is based on the apache/hadoop 3 Docker image and is configured entirely through environment variables.
The Hadoop image can generate its configuration files from environment variables, which makes it easy to wire in the ZooKeeper-related settings (see the sketch after the config file below).
When adding the engine in Dremio, you mainly need to provide the resource manager and the namenode (HDFS) information.

  • config
 
CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
CORE-SITE.XML_ha.zookeeper.quorum=zk:2181
CORE-SITE.XML_ha.zookeeper.session-timeout.ms=10000
CORE-SITE.XML_hadoop.proxyuser.dremio.hosts=*
CORE-SITE.XML_hadoop.proxyuser.dremio.groups=*
CORE-SITE.XML_hadoop.proxyuser.dremio.users=*
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.resourcemanager.zk-address=zk:2181
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
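
For reference, the image's startup scripts turn each FILENAME_property=value environment variable above into a <property> entry in the matching Hadoop configuration file. A line such as CORE-SITE.XML_fs.defaultFS=hdfs://namenode therefore ends up roughly as follows in the generated core-site.xml (the exact file location and formatting depend on the image version):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode</value>
  </property>
  <!-- one property element per CORE-SITE.XML_* variable -->
</configuration>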

Some UI screenshots

nodemanager web UI (screenshot)

HDFS namenode web UI (screenshot)

Notes

After startup, Dremio registers executor node information in the ZooKeeper instance it depends on (the same one configured for the Hadoop cluster), which is how executor nodes are later selected. Note that the default memory requirement for an executor is currently 16 GB (hardcoded in the front end), so you need a Docker host with a fairly large amount of memory, otherwise the environment is hard to bring up. I still recommend deploying it yourself to better understand how Dremio on YARN works. I have pushed the related files to GitHub; see the references below.
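
To watch the scheduling and registration happen, a couple of checks can be run from the host. The yarn and zkCli.sh commands are standard; the Dremio znode layout in ZooKeeper varies by version, so the sketch starts at the ZooKeeper root instead of assuming a path.

# the executors provisioned through YARN show up as a running application
docker-compose exec resourcemanager yarn application -list

# Dremio registers coordinator/executor endpoints in ZooKeeper
docker-compose exec zk zkCli.sh ls /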

References

https://github.com/rongfengliang/dremio-yarn-docker-learning
https://docs.dremio.com/current/get-started/cluster-deployments/deployment-models/yarn-hadoop
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html#Configurations
https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest
https://github.com/apache/hadoop/tree/docker-hadoop-3
https://hub.docker.com/r/apache/hadoop
https://www.cnblogs.com/rongfengliang/p/15937838.html
https://www.cnblogs.com/rongfengliang/p/15938868.html
https://www.cnblogs.com/rongfengliang/p/17081114.html
https://www.cnblogs.com/rongfengliang/p/17092429.html
https://www.cnblogs.com/rongfengliang/p/17093049.html
https://www.cnblogs.com/rongfengliang/p/17091307.html
https://www.cnblogs.com/rongfengliang/p/17092529.html
