Getting Started with Flink (Part 2)
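This continues from the previous post, Getting Started with Flink (Part 1): WordCount. Flink has three run modes: standalone, YARN, and K8S. I won't go through all of them here; this post is mainly about Flink on YARN.
A quick, no-frills, step-by-step walkthrough:
1. With both the HDFS and YARN clusters up and running, start Flink's yarn-session: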
hadoop@hadoop1:/opt/flink-1.10.1/bin$ ./yarn-session.sh -n 2 -s 2 -jm 1024 -tm 1024 -nm test -d
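Here is a brief rundown of the parameters above:
-n (--container): the number of TaskManagers.
-s (--slots): the number of slots per TaskManager. By default one slot maps to one core, and each TaskManager has 1 slot. It is sometimes worth running a few extra TaskManagers for redundancy.
-jm: JobManager memory (in MB).
-tm: memory per TaskManager (in MB).
-nm: the YARN appName (the name shown in the YARN UI).
-d: run detached, in the background.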
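To make the flag values concrete, here is a small sketch of the arithmetic behind the command above. The helper names are hypothetical, and it assumes the session really starts the requested number of TaskManagers; YARN may also round each container up to its own allocation size, so actual usage can be higher.

```shell
# Hypothetical helpers; the numbers mirror the yarn-session.sh flags used above.

# $1 = number of TaskManagers (-n), $2 = slots per TaskManager (-s)
session_slots() {
  echo $(( $1 * $2 ))
}

# $1 = JobManager MB (-jm), $2 = number of TaskManagers (-n), $3 = MB per TaskManager (-tm)
session_memory_mb() {
  echo $(( $1 + $2 * $3 ))
}

echo "total slots: $(session_slots 2 2)"                        # 2 * 2 = 4
echo "requested memory (MB): $(session_memory_mb 1024 2 1024)"  # 1024 + 2 * 1024 = 3072
```

With these values the session would offer 4 slots, which is enough for the job submitted in the next step with -p 2.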
2. Submit the Flink job. This is no different from standalone mode: you just launch it. Using the command line as an example:
hadoop@hadoop1:/opt/flink-1.10.1$ ./bin/flink run -c WordCount_Streaming \
-p 2 \
flinktestlearn-1.0-SNAPSHOT-jar-with-dependencies.jar \
--host localhost --port 8765
2020-09-02 16:17:21,853 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2020-09-02 16:17:21,853 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2020-09-02 16:17:22,478 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.1/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-09-02 16:17:22,478 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/flink-1.10.1/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-09-02 16:17:26,304 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop1/192.168.6.21:8032
2020-09-02 16:17:26,507 INFO org.apache.flink.yarn.YarnClusterDescriptor - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-09-02 16:17:26,517 WARN org.apache.flink.yarn.YarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2020-09-02 16:17:26,608 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found Web Interface hadoop2:39620 of application 'application_1596791033658_0003'.
Job has been submitted with JobID 909ef10669c8ed6fea50fabd947481ac
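If nothing goes wrong, the log output looks like the above. The value highlighted in red in the original, application_1596791033658_0003, is the YARN applicationID. And since this is WordCount, there has to be somewhere to see the results, so read on: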
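Open ResourceManager-host:8088, find your applicationID, and click into it (screenshot).
Next, click the Attempt ID. You will see two container IDs (screenshot): one is the JobManager and the other is the TaskManager. The TaskManager is where the data is actually written, so find its standard output log and the results are there (screenshot).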
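For readers who prefer the command line to clicking through the UI, the applicationID can also be pulled out of the submission log. This is only a sketch: the helper name extract_app_id is hypothetical, and the sample line is the "Found Web Interface" line from the log above.

```shell
# Hypothetical helper: print the first YARN application ID found in log text on stdin.
extract_app_id() {
  grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Sample input: the "Found Web Interface" line from the submission log above.
log_line="2020-09-02 16:17:26,608 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found Web Interface hadoop2:39620 of application 'application_1596791033658_0003'."

app_id=$(printf '%s\n' "$log_line" | extract_app_id)
echo "$app_id"
```

The extracted ID can then be passed to the YARN CLI, for example yarn application -status "$app_id", or yarn logs -applicationId "$app_id" if log aggregation is enabled on the cluster.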
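That is the simplest possible WordCount demo of Flink on YARN. As for problems along the way, of course there were some. Read on:
The following problem came up when starting the YARN session: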
1. Error: A JNI error has occurred, please check your installation and try again
hadoop@hadoop1:/opt/flink-1.10.1/bin$ ./yarn-session.sh -n 2 -s 2 -jm 1024 -tm 1024 -nm test -d
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
    at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more
hadoop@hadoop1:/opt/flink-1.10.1/bin$
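Download the package from the link (extraction code pb4y), drop it into $FLINK_HOME/lib, and restart yarn-session, and the problem goes away. As for why, I haven't looked into it yet. The stack trace does show that the class org.apache.hadoop.yarn.exceptions.YarnException could not be found, so the Hadoop/YARN classes were presumably missing from Flink's classpath and the package supplies them.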
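As a rough pre-flight check for this error, the sketch below tests whether Flink has some way of finding the Hadoop classes before yarn-session.sh is started. The function name, the *hadoop*.jar name pattern, and the fallback path are assumptions for illustration, not something the original setup prescribes.

```shell
# Hypothetical check: can Flink find the Hadoop/YARN classes?
# $1 = Flink home directory.
has_hadoop_classes() {
  # Case 1: HADOOP_CLASSPATH is set (Flink's startup scripts add it to the classpath).
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    return 0
  fi
  # Case 2: a Hadoop jar sits in lib/ (the name pattern is an assumption).
  for jar in "$1"/lib/*hadoop*.jar; do
    [ -e "$jar" ] && return 0
  done
  return 1
}

# The fallback path is the install location used in this post.
if has_hadoop_classes "${FLINK_HOME:-/opt/flink-1.10.1}"; then
  echo "Hadoop classes look reachable"
else
  echo "Hadoop classes missing: add a Hadoop jar to lib/ or set HADOOP_CLASSPATH"
fi
```

Instead of copying a jar, running export HADOOP_CLASSPATH=`hadoop classpath` before starting yarn-session.sh is the other common fix, assuming the hadoop command is on the PATH.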