Flink on YARN fails to start
After bringing up the Hadoop YARN cluster, I started a Flink session:
[root@node1 flink-1.6.1]# ./bin/yarn-session.sh -n 2 -jm 1024 -tm 1024
and it failed with the following error:
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
2019-10-09 13:58:51,805 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:811)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:420)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:608)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:811)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
	... 2 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1570641261952_0002 failed 1 times due to AM Container for appattempt_1570641261952_0002_000001 exited with exitCode: -103
For more detailed output, check application tracking page:http://node1:8088/cluster/app/application_1570641261952_0002Then, click on links to logs of each attempt.
Diagnostics: Container [pid=9658,containerID=container_1570641261952_0002_01_000001] is running beyond virtual memory limits. Current usage: 91.7 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1570641261952_0002_01_000001 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 9673 9658 9658 9658 (java) 194 135 2187259904 23171 /home/hadoop/apps/jdk1.8.0_144/bin/java -Xmx424m -Dlog.file=/home/hadoop/apps/hadoop-2.7.2/logs/userlogs/application_1570641261952_0002/container_1570641261952_0002_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint
	|- 9658 9657 9658 9658 (bash) 0 0 115900416 305 /bin/bash -c /home/hadoop/apps/jdk1.8.0_144/bin/java -Xmx424m -Dlog.file=/home/hadoop/apps/hadoop-2.7.2/logs/userlogs/application_1570641261952_0002/container_1570641261952_0002_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> /home/hadoop/apps/hadoop-2.7.2/logs/userlogs/application_1570641261952_0002/container_1570641261952_0002_01_000001/jobmanager.out 2> /home/hadoop/apps/hadoop-2.7.2/logs/userlogs/application_1570641261952_0002/container_1570641261952_0002_01_000001/jobmanager.err
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application. If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1570641261952_0002
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1072)
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:542)
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:413)
	... 7 more
The key line in all of that is:
Diagnostics: Container [pid=9658,containerID=container_1570641261952_0002_01_000001] is running beyond virtual memory limits. Current usage: 91.7 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used.
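So the container exceeded its memory limit. More precisely, it was the virtual memory check that failed: physical usage was only 91.7 MB of 1 GB, while virtual memory had reached the 2.1 GB cap.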
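The 2.1 GB cap follows from how YARN sizes the check: a container's virtual memory limit is its physical allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio`, whose Hadoop default is 2.1, so a 1 GB container gets 1 GB × 2.1 = 2.1 GB. The sketch below redoes that comparison with the `VMEM_USAGE(BYTES)` values from the process-tree dump. `check_vmem` is a made-up helper for illustration; it mirrors only the arithmetic, not the NodeManager's actual monitoring code.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring YARN's virtual-memory check for one container:
#   limit = container physical memory (MB) * yarn.nodemanager.vmem-pmem-ratio
# Args: <container MB> <ratio> <VMEM_USAGE bytes of each process in the tree...>
check_vmem() {
  local pmem_mb=$1 ratio=$2 used=0 v
  shift 2
  for v in "$@"; do
    used=$((used + v))  # bash integers are 64-bit, enough for byte counts
  done
  # awk does the floating-point part (the ratio), which bash arithmetic cannot
  awk -v p="$pmem_mb" -v r="$ratio" -v u="$used" 'BEGIN {
    limit = p * r * 1048576
    verdict = (u > limit) ? "KILL" : "OK"
    printf "used=%.1fMB limit=%.1fMB %s\n", u / 1048576, limit / 1048576, verdict
  }'
}

# Values from the process-tree dump: java 2187259904 bytes, bash 115900416 bytes
check_vmem 1024 2.1 2187259904 115900416   # used=2196.5MB limit=2150.4MB KILL
```

The two processes together reserve about 2196 MB of address space against a limit of about 2150 MB, so the NodeManager kills the container even though its resident memory is tiny.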
Method 1: reduce the memory you request, or increase the virtual memory available.
But if the Flink memory is set too low, you get a different error instead:
Caused by: org.apache.flink.util.FlinkException: Cannot fulfill the minimum memory requirements with the provided cluster specification. Please increase the memory of the cluster
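so you have to tune the values for your own environment.
After a few attempts I settled on 900 MB each for the JobManager (-jm) and the TaskManagers (-tm):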
[root@node1 flink-1.6.1]# ./bin/yarn-session.sh -n 2 -jm 900 -tm 900
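The `-Xmx424m` in the process-tree dump hints at why the value matters. Assuming Flink 1.6's documented defaults `containerized.heap-cutoff-ratio = 0.25` and `containerized.heap-cutoff-min = 600` (MB), Flink removes `max(container × ratio, min)` from the container size and gives the rest to the JVM heap. The sketch below (the helper name `jvm_heap_mb` is hypothetical) reproduces 424 MB for `-jm 1024`. A smaller `-jm` yields a smaller `-Xmx` and therefore less reserved virtual address space, which is plausibly why 900 fit under the limit here. It also shows the values cannot go arbitrarily low: below 600 MB nothing would be left for the heap.

```bash
#!/usr/bin/env bash
# Hypothetical helper: JVM heap that Flink 1.6 derives from a container size.
#   cutoff = max(container_mb * heap-cutoff-ratio, heap-cutoff-min)
#   heap   = container_mb - cutoff
# Assumed defaults: containerized.heap-cutoff-ratio=0.25, containerized.heap-cutoff-min=600
jvm_heap_mb() {
  local container_mb=$1 ratio=${2:-0.25} min_cutoff=${3:-600}
  awk -v c="$container_mb" -v r="$ratio" -v m="$min_cutoff" 'BEGIN {
    cutoff = c * r
    if (cutoff < m) cutoff = m   # the 600 MB floor dominates for small containers
    printf "%d\n", c - cutoff
  }'
}

jvm_heap_mb 1024   # 424, the -Xmx424m seen in the process-tree dump
jvm_heap_mb 900    # 300
```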
Method 2 (recommended): disable YARN's virtual memory check.
Edit etc/hadoop/yarn-site.xml on every node and add:
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
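Since the property has to be present on every NodeManager, it can help to check each node's file. The sketch below is a plain text match, not an XML parser. It assumes `<name>` is directly followed by `<value>`, as in the snippet above. `vmem_check_disabled` is a hypothetical helper name, and the Hadoop path in the usage comment is inferred from the log paths, so adjust it to your installation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the yarn-site.xml read from stdin
# sets yarn.nodemanager.vmem-check-enabled to false.
# Strips all whitespace first, so the property matches whether it is written
# on one line or several.
vmem_check_disabled() {
  tr -d ' \t\r\n' |
    grep -q '<name>yarn\.nodemanager\.vmem-check-enabled</name><value>false</value>'
}

# Example (path assumed from the log: /home/hadoop/apps/hadoop-2.7.2):
# ssh node1 cat /home/hadoop/apps/hadoop-2.7.2/etc/hadoop/yarn-site.xml | vmem_check_disabled && echo "node1 ok"
```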
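Then restart YARN so the NodeManagers pick up the new setting. After that, the session started successfully.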