Deploying a Dremio YARN test environment
I previously wrote a short note on how Dremio runs on YARN (the implementation is based on the Apache Twill framework: when you create a YARN-based engine, Dremio packages the executor, uploads it to HDFS, and then runs it through the YARN scheduler). Below is a simple Docker-based environment that makes it convenient to study.
Environment
- docker-compose
```yaml
version: "3"
services:
  zk:
    image: zookeeper
    ports:
      - 2181:2181
  namenode:
    image: apache/hadoop:3
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - 9870:9870
    env_file:
      - ./config
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
    env_file:
      - ./config
  resourcemanager:
    image: apache/hadoop:3
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - 8088:8088
    env_file:
      - ./config
    volumes:
      - ./test.sh:/opt/test.sh
  nodemanager:
    image: apache/hadoop:3
    hostname: nodemanager
    command: ["yarn", "nodemanager"]
    env_file:
      - ./config
    ports:
      - 8042:8042
  nodemanagerv2:
    image: apache/hadoop:3
    hostname: nodemanagerv2
    command: ["yarn", "nodemanager"]
    env_file:
      - ./config
    ports:
      - 8043:8042
  mysql:
    image: mysql:5.6
    command: --character-set-server=utf8
    ports:
      - "3308:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=dalong
      - MYSQL_USER=boss
      - MYSQL_DATABASE=boss
      - MYSQL_PASSWORD=dalong
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "19001:19001"
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
    command: server --console-address :19001 --quiet /data
  dremio:
    build: .
    environment:
      - name=value
    ports:
      - "9047:9047"
      - "31010:31010"
      - "9090:9090"
  pg:
    image: postgres:16.0
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=dalongdemo
  nessie:
    image: projectnessie/nessie:0.75.0-java
    environment:
      - NESSIE_VERSION_STORE_TYPE=JDBC
      - QUARKUS_DATASOURCE_USERNAME=postgres
      - QUARKUS_DATASOURCE_PASSWORD=dalongdemo
      - QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://pg:5432/postgres
    ports:
      - "19120:19120"
      - "19121:19121"
```

Note: the Quarkus datasource settings must be passed as underscore-separated environment variables (`QUARKUS_DATASOURCE_USERNAME`), since Quarkus maps `quarkus.datasource.username` to that form; dotted environment variable names are not resolved.
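Once the stack is up (`docker-compose up -d`), one way to confirm that both NodeManagers registered with the ResourceManager is the RM's standard REST API (`/ws/v1/cluster/nodes`). A minimal sketch, assuming the `8088:8088` port mapping from the compose file above:

```python
import json
import urllib.request

# Hedged sketch: query the YARN ResourceManager REST API (the standard
# /ws/v1/cluster/nodes resource) to see which NodeManagers have registered.
# Assumes the 8088:8088 port mapping from the compose file above.

def running_nodes(cluster_json: dict) -> list:
    """Return the ids of the nodes the RM reports as RUNNING."""
    nodes = (cluster_json.get("nodes") or {}).get("node") or []
    return [n["id"] for n in nodes if n.get("state") == "RUNNING"]

def fetch_cluster_nodes(url="http://localhost:8088/ws/v1/cluster/nodes") -> dict:
    """Fetch the node list from the RM; call this once the stack is up."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Usage once `docker-compose up -d` has finished:
#   print(running_nodes(fetch_cluster_nodes()))
```

With both `nodemanager` and `nodemanagerv2` healthy, `running_nodes` should return two entries.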
A few notes:
- The Hadoop YARN deployment uses the apache/hadoop:3 Docker image and is configured entirely through environment variables.
- The Hadoop image can generate its configuration files from environment variables, which makes it easy to wire in the ZooKeeper settings.
- When adding an engine in Dremio, the key inputs are the ResourceManager address and the NameNode (HDFS) information.
- config
```
CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
CORE-SITE.XML_ha.zookeeper.quorum=zk:2181
CORE-SITE.XML_ha.zookeeper.session-timeout.ms=10000
CORE-SITE.XML_hadoop.proxyuser.dremio.hosts=*
CORE-SITE.XML_hadoop.proxyuser.dremio.groups=*
CORE-SITE.XML_hadoop.proxyuser.dremio.users=*
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.resourcemanager.zk-address=zk:2181
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
```
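The `FILE.XML_property=value` convention above is how the apache/hadoop image generates its `*-site.xml` files at startup. The transformation can be sketched as follows (my own illustrative re-implementation, not the image's actual startup script):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative re-implementation of the apache/hadoop image convention:
# an entry like CORE-SITE.XML_fs.defaultFS=hdfs://namenode becomes a
# <property> element in core-site.xml. Not the image's actual script.

def render_site_files(entries: list) -> dict:
    """Group FILE.XML_key=value lines and render each file as Hadoop XML."""
    grouped = defaultdict(list)
    for line in entries:
        prefix, value = line.split("=", 1)
        file_part, key = prefix.split("_", 1)  # e.g. CORE-SITE.XML / fs.defaultFS
        grouped[file_part.lower()].append((key, value))
    rendered = {}
    for filename, props in grouped.items():
        root = ET.Element("configuration")
        for key, value in props:
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = key
            ET.SubElement(prop, "value").text = value
        rendered[filename] = ET.tostring(root, encoding="unicode")
    return rendered
```

For example, `render_site_files(["HDFS-SITE.XML_dfs.replication=1"])` yields an `hdfs-site.xml` containing a `<property>` with name `dfs.replication` and value `1`.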
Some UI screenshots (omitted here): the NodeManager web UI and the HDFS web UI.
Notes
After startup, Dremio uses the same ZooKeeper instance that the HDFS/YARN cluster is configured with, and the executors register their node information there so that Dremio can later pick nodes for the engine. Note that Dremio's default memory requirement for an executor is 16 GB (currently hard-coded in the front end), so you need a Docker host with a fairly large amount of memory; otherwise the environment is hard to get running. I still recommend deploying it yourself to better understand how Dremio on YARN works. I have pushed the related files to GitHub for reference.
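Before digging into Dremio's registrations, it can help to confirm that ZooKeeper itself is healthy. A sketch using ZooKeeper's four-letter-word admin interface, assuming ZK is reachable on localhost:2181 as mapped above (`srvr` is the only command whitelisted by default in ZooKeeper 3.5+):

```python
import socket

# Sketch: talk to ZooKeeper's four-letter-word admin interface to check the
# instance that the cluster (and Dremio's executor registration) depends on.
# Assumes ZK is reachable on localhost:2181 as mapped in the compose file.

def zk_four_letter(host, port, cmd=b"srvr", timeout=3.0) -> str:
    """Send a four-letter command and return the raw text response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()

def parse_srvr(text: str) -> dict:
    """Turn the `srvr` response ('Key: value' per line) into a dict."""
    out = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            out[key.strip()] = value.strip()
    return out

# Usage once the stack is running:
#   print(parse_srvr(zk_four_letter("localhost", 2181)).get("Mode"))
```

A healthy single-node setup should report `Mode: standalone` in the parsed response.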
References
https://github.com/rongfengliang/dremio-yarn-docker-learning
https://docs.dremio.com/current/get-started/cluster-deployments/deployment-models/yarn-hadoop
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html#Configurations
https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest
https://github.com/apache/hadoop/tree/docker-hadoop-3
https://hub.docker.com/r/apache/hadoop
https://www.cnblogs.com/rongfengliang/p/15937838.html
https://www.cnblogs.com/rongfengliang/p/15938868.html
https://www.cnblogs.com/rongfengliang/p/17081114.html
https://www.cnblogs.com/rongfengliang/p/17092429.html
https://www.cnblogs.com/rongfengliang/p/17093049.html
https://www.cnblogs.com/rongfengliang/p/17091307.html
https://www.cnblogs.com/rongfengliang/p/17092529.html