Setting Up HWI and Monitoring Alert and Log Information
Setting up HWI
1. Download the Hive 2.0.0 source package from https://archive.apache.org/dist/hive/hive-2.0.0/
[hadoop@master ~]$ wget https://archive.apache.org/dist/hive/hive-2.0.0/apache-hive-2.0.0-src.tar.gz
[hadoop@master opt]$ ls
apache-hive-2.0.0-src.tar.gz hadoop-2.7.1.tar.gz jdk-8u152-linux-x64.tar.gz
[root@master opt]# ls
apache-hive-2.0.0-src.tar.gz hadoop-2.7.1.tar.gz jdk-8u152-linux-x64.tar.gz
[root@master opt]# tar xf apache-hive-2.0.0-src.tar.gz
[root@master opt]# ls
apache-hive-2.0.0-src hadoop-2.7.1.tar.gz
apache-hive-2.0.0-src.tar.gz jdk-8u152-linux-x64.tar.gz
[root@master opt]# cd apache-hive-2.0.0-src
[root@master apache-hive-2.0.0-src]# ls
accumulo-handler conf hcatalog llap-client orc service
ant contrib hplsql llap-common packaging shims
beeline data hwi llap-server pom.xml spark-client
bin dev-support itests llap-tez ql storage-api
checkstyle docs jdbc metastore README.txt testutils
cli findbugs lib NOTICE RELEASE_NOTES.txt
common hbase-handler LICENSE odbc serde
[root@master apache-hive-2.0.0-src]# cd hwi
[root@master hwi]# pwd
/opt/apache-hive-2.0.0-src/hwi
[root@master hwi]# ls
pom.xml src web
[root@master hwi]# jar -Mcf hive-hwi-2.0.0.war -C web .
[root@master hwi]# ls
hive-hwi-2.0.0.war pom.xml src web
[root@master hwi]# file hive-hwi-2.0.0.war
hive-hwi-2.0.0.war: Zip archive data, at least v1.0 to extract
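To double-check that the web resources were packaged, the contents of the war file can be listed (a quick sanity check; jar -tf only reads the archive):
[root@master hwi]# jar -tf hive-hwi-2.0.0.war | head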
[root@master hwi]# chown -R hadoop.hadoop /usr/local/src/
[root@master hwi]# su - hadoop
Last login: Fri May 5 23:19:49 EDT 2023 on pts/0
[hadoop@master ~]$ cd /usr/local/src
[hadoop@master src]$ ll
total 8
drwxr-xr-x. 7 hadoop hadoop 178 Apr 18 22:22 flume
drwxr-xr-x. 13 hadoop hadoop 196 Mar 16 05:32 hadoop
drwxr-xr-x. 9 hadoop hadoop 183 Apr 6 05:29 hbase
drwxr-xr-x. 9 hadoop hadoop 170 Mar 22 23:29 hive
drwxr-xr-x. 8 hadoop hadoop 255 Sep 14 2017 jdk
drwxr-xr-x. 9 hadoop hadoop 4096 Dec 18 2017 sqoop
drwxr-xr-x. 12 hadoop hadoop 4096 Mar 28 21:34 zookeeper
[hadoop@master src]$ cd hive
[hadoop@master hive]$ cd conf/
[hadoop@master conf]$ ll
total 432
-rw-r--r--. 1 hadoop hadoop 1596 Jan 21 2016 beeline-log4j2.properties.template
-rw-r--r--. 1 hadoop hadoop 207525 Feb 9 2016 hive-default.xml.template
-rw-r--r--. 1 hadoop hadoop 2378 Apr 22 2015 hive-env.sh.template
-rw-r--r--. 1 hadoop hadoop 2287 Jan 21 2016 hive-exec-log4j2.properties.template
-rw-r--r--. 1 hadoop hadoop 2758 Jan 21 2016 hive-log4j2.properties.template
-rw-r--r--. 1 hadoop hadoop 207462 Mar 22 23:27 hive-site.xml
-rw-r--r--. 1 hadoop hadoop 2049 Jan 21 2016 ivysettings.xml
-rw-r--r--. 1 hadoop hadoop 3885 Jan 21 2016 llap-daemon-log4j2.properties.template
[hadoop@master conf]$ vi hive-site.xml
# Modify
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-2.0.0.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>
</property>
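HWI serves on 0.0.0.0:9999 by default. If the bind address or port needs to change, the following optional properties can be added to the same file (shown with their default values; an optional sketch, not a required step):
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
</property>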
2. Download the Ant package from https://archive.apache.org/dist/ant/binaries/
[hadoop@master opt]$ ls
apache-ant-1.9.1-bin.tar.gz apache-hive-2.0.0-src.tar.gz jdk-8u152-linux-x64.tar.gz
apache-hive-2.0.0-src hadoop-2.7.1.tar.gz
[hadoop@master opt]$ tar xf apache-ant-1.9.1-bin.tar.gz -C /usr/local/src
[hadoop@master opt]$ cd /usr/local/src
[hadoop@master src]$ ls
apache-ant-1.9.1 flume hadoop hbase hive jdk sqoop zookeeper
[hadoop@master src]$ ll
total 8
drwxr-xr-x. 6 hadoop hadoop 174 May 15 2013 apache-ant-1.9.1
drwxr-xr-x. 7 hadoop hadoop 178 Apr 18 22:22 flume
drwxr-xr-x. 13 hadoop hadoop 196 Mar 16 05:32 hadoop
drwxr-xr-x. 9 hadoop hadoop 183 Apr 6 05:29 hbase
drwxr-xr-x. 9 hadoop hadoop 170 Mar 22 23:29 hive
drwxr-xr-x. 8 hadoop hadoop 255 Sep 14 2017 jdk
drwxr-xr-x. 9 hadoop hadoop 4096 Dec 18 2017 sqoop
drwxr-xr-x. 12 hadoop hadoop 4096 Mar 28 21:34 zookeeper
[hadoop@master src]$ mv apache-ant-1.9.1 ant
[hadoop@master src]$ ls
ant flume hadoop hbase hive jdk sqoop zookeeper
[hadoop@master src]$
3. Configure environment variables
[root@master hwi]# vi /etc/profile.d/ant.sh
# Add
export ANT_HOME=/usr/local/src/ant
export PATH=${ANT_HOME}/bin:$PATH
[root@master hwi]# su - hadoop
Last login: Tue May 9 04:39:37 EDT 2023 on pts/0
[hadoop@master ~]$ ant -version
Apache Ant(TM) version 1.9.1 compiled on May 15 2013
[hadoop@master ~]$
4. Copy files
[hadoop@master ~]$ cp /usr/local/src/jdk/lib/tools.jar /usr/local/src/hive/lib/
[hadoop@master ~]$ cp /usr/local/src/ant/lib/ant.jar /usr/local/src/hive/lib/
[hadoop@master ~]$ ll /usr/local/src/hive/lib/ant.jar
-rw-r--r--. 1 hadoop hadoop 1997485 May 9 05:16 /usr/local/src/hive/lib/ant.jar
[hadoop@master ~]$ ll /usr/local/src/hive/lib/tools.jar
-rw-r--r--. 1 hadoop hadoop 18290333 May 9 05:16 /usr/local/src/hive/lib/tools.jar
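The war file built earlier is still under /opt/apache-hive-2.0.0-src/hwi; for the hive.hwi.war.file path configured above to resolve, it has to be copied into Hive's lib directory, and the HWI service has to be running before the page can be reached. A minimal sketch (the copy step is implied by the configuration above; hive is assumed to be on the hadoop user's PATH):
[root@master hwi]# cp hive-hwi-2.0.0.war /usr/local/src/hive/lib/
[hadoop@master ~]$ hive --service hwi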
5. Visit http://master:9999/hwi/
Experiment 1: Viewing Big Data Platform Log Information
1. Task 1: View the big data platform host logs
The log files of the Linux operating system itself and of most server programs are kept under /var/log by default. Some programs share a single log file, some keep a log file of their own, and some large server programs write more than one log file and therefore create their own subdirectory under /var/log; this keeps the log directory structure clear and makes the files quick to locate. A fair number of log files are readable only by root, which protects the security of the information they contain.
Log in to the Linux host as the hadoop user, switch to the /var/log directory, and run ll to list every log file in it.
[hadoop@master ~]$ cd /var/log
[hadoop@master log]$ ll
total 3516
drwxr-xr-x. 2 root root 204 Mar 15 06:11 anaconda
drwx------. 2 root root 23 Mar 15 06:12 audit
-rw-------. 1 root root 0 May 9 05:15 boot.log
-rw-------. 1 root root 78764 Mar 22 09:09 boot.log-20230322
-rw-------. 1 root root 44295 Apr 12 05:06 boot.log-20230412
-rw-------. 1 root root 61033 May 9 05:15 boot.log-20230509
-rw-------. 1 root utmp 0 May 9 05:15 btmp
-rw-------. 1 root utmp 384 May 9 04:04 btmp-20230509
drwxr-xr-x. 2 chrony chrony 6 Apr 12 2018 chrony
-rw-------. 1 root root 298 May 9 05:15 cron
-rw-------. 1 root root 9807 Apr 12 05:06 cron-20230412
-rw-------. 1 root root 3109 May 9 05:15 cron-20230509
-rw-r--r--. 1 root root 123696 May 9 04:04 dmesg
-rw-r--r--. 1 root root 123578 May 5 23:32 dmesg.old
-rw-r--r--. 1 root root 0 Mar 15 06:12 firewalld
-rw-r--r--. 1 root root 193 Mar 15 06:08 grubby_prune_debug
-rw-r--r--. 1 root root 292292 May 9 05:13 lastlog
-rw-------. 1 root root 0 May 9 05:15 maillog
-rw-------. 1 root root 11704 Mar 22 11:02 maillog-20230412
-rw-------. 1 root root 0 Apr 12 05:06 maillog-20230509
-rw-------. 1 root root 234 May 9 05:15 messages
-rw-------. 1 root root 1912585 Apr 12 05:01 messages-20230412
-rw-------. 1 root root 934185 May 9 05:13 messages-20230509
-rw-r--r--. 1 mysql mysql 57275 May 9 04:04 mysqld.log
drwxr-xr-x. 2 root root 6 Mar 15 06:11 rhsm
-rw-------. 1 root root 0 May 9 05:15 secure
-rw-------. 1 root root 47284 Apr 12 04:48 secure-20230412
-rw-------. 1 root root 13842 May 9 05:13 secure-20230509
-rw-------. 1 root root 0 May 9 05:15 spooler
-rw-------. 1 root root 0 Mar 15 06:09 spooler-20230412
-rw-------. 1 root root 0 Apr 12 05:06 spooler-20230509
-rw-------. 1 root root 0 Mar 15 06:08 tallylog
drwxr-xr-x. 2 root root 23 Mar 15 06:12 tuned
-rw-r--r--. 1 root root 28261 May 9 04:04 vmware-vgauthsvc.log.0
-rw-r--r--. 1 root root 32585 May 9 04:03 vmware-vmsvc.log
-rw-rw-r--. 1 root utmp 51840 May 9 05:13 wtmp
-rw-------. 1 root root 1852 Mar 22 10:27 yum.log
The listing shows log files serving a variety of functions; the following steps examine them one by one.
1.1. Step 1: View the kernel and general message log (/var/log/messages).
The kernel and general message log aggregates the output of many processes. Switch to the root user and view the file with cat or tail.
[hadoop@master log]$ exit
logout
[root@master hwi]# cd /var/log
[root@master log]#
[root@master log]# cat messages
May 9 05:15:01 master rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="916" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 9 05:15:19 master chronyd[659]: Source 193.182.111.143 replaced with 143.107.229.211
Besides entries like the ones above, this file records user switches on the master host as well as service state changes, such as the firewall service entries "Started firewalld - dynamic firewall daemon" and "Stopped firewalld - dynamic firewall daemon.".
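To pull out just those firewalld state records, the file can be filtered with grep (a sketch; what it matches depends on this host's history):
[root@master log]# grep -i firewalld /var/log/messages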
1.2. Step 2: View the scheduled task log /var/log/cron.
This file records the creation and execution of crontab scheduled tasks. Run cat cron; the output looks like this:
[root@master log]# cat cron
May 9 05:15:01 master run-parts(/etc/cron.daily)[1521]: finished logrotate
May 9 05:15:01 master run-parts(/etc/cron.daily)[1509]: starting man-db.cron
May 9 05:15:01 master run-parts(/etc/cron.daily)[1532]: finished man-db.cron
May 9 05:15:01 master anacron[1277]: Job `cron.daily' terminated
1.3. Step 3: View the system boot log /var/log/dmesg.
This file records hardware device information as plain text; it can also be read with the dmesg command. The file is fairly long, so only part of the output is shown here:
[root@master log]# dmesg
.....
[ 5.195834] XFS (sda1): Mounting V5 Filesystem
[ 5.545742] XFS (sda1): Starting recovery (logdev: internal)
[ 5.548948] XFS (sda1): Ending recovery (logdev: internal)
[ 5.686390] type=1305 audit(1683619454.113:4): audit_pid=623 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[ 6.232387] NET: Registered protocol family 40
[ 6.726856] IPv6: ADDRCONF(NETDEV_UP): ens33: link is not ready
[ 6.729804] e1000: ens33 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 6.731762] IPv6: ADDRCONF(NETDEV_UP): ens33: link is not ready
[ 6.734428] IPv6: ADDRCONF(NETDEV_UP): ens33: link is not ready
[ 6.735223] IPv6: ADDRCONF(NETDEV_CHANGE): ens33: link becomes ready
[ 7.893332] floppy0: no floppy controllers found
[ 7.893364] work still pending
The output above shows the e1000 NIC (interface ens33) coming up and the interface acquiring an IPv6 address.
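To isolate the messages of a single device, the dmesg output can be filtered; the interface name ens33 is taken from the listing above (a sketch):
[root@master log]# dmesg | grep -i ens33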
1.4. Step 4: View the mail system log /var/log/maillog.
This log records the activity of every e-mail sent to or from the system. It can be used to see which delivery tool a user used and which system the data was sent to. Use cat /var/log/maillog or tail -f /var/log/maillog to inspect mail activity.
1.5. Step 5: View the user login logs.
These logs record users logging in to and out of the Linux operating system, including the user name, login terminal, login time, source host, and the processes in use. The following files hold login- and logout-related information:
1) /var/log/lastlog: the most recent login event of each user
2) /var/log/wtmp: user logins and logouts, plus system boot and shutdown events
3) /var/run/utmp: details of every user currently logged in
4) /var/log/secure: security events related to user authentication
(1) lastlog lists the most recent login of every user. It reads /var/log/lastlog and shows the login name, port, last login time, and so on.
[root@master log]# lastlog
# Username, port, source host, last login time
Username Port From Latest
root pts/1 192.168.100.1 Tue May 9 05:13:21 -0400 2023
bin **Never logged in**
daemon **Never logged in**
adm **Never logged in**
lp **Never logged in**
sync **Never logged in**
shutdown **Never logged in**
halt **Never logged in**
mail **Never logged in**
operator **Never logged in**
games **Never logged in**
ftp **Never logged in**
nobody **Never logged in**
systemd-network **Never logged in**
dbus **Never logged in**
polkitd **Never logged in**
sshd **Never logged in**
postfix **Never logged in**
chrony **Never logged in**
hadoop pts/0 Tue May 9 05:11:30 -0400 2023
mysql **Never logged in**
(2) last lists users who are or have been logged in to the system. By default it reads /var/log/wtmp. The output includes the user name, terminal, login source, start time, end time, and duration. Note that the last line of output gives the time of the earliest record in the wtmp file. The -f option can point the command at a different file, such as /var/log/btmp or /var/run/utmp. Run the last command:
[root@master log]# last
root pts/1 192.168.100.1 Tue May 9 05:13 still logged in
root pts/0 192.168.100.1 Tue May 9 04:21 still logged in
root tty1 Tue May 9 04:04 still logged in
reboot system boot 3.10.0-862.el7.x Tue May 9 04:04 - 05:36 (01:31)
reboot system boot 3.10.0-862.el7.x Fri May 5 23:32 - 05:36 (3+06:03)
root pts/0 192.168.100.1 Fri May 5 23:19 - crash (00:13)
reboot system boot 3.10.0-862.el7.x Fri May 5 23:18 - 05:36 (3+06:17)
root pts/0 192.168.100.1 Tue Apr 18 22:30 - crash (17+00:47)
reboot system boot 3.10.0-862.el7.x Tue Apr 18 22:30 - 05:36 (20+07:05)
root pts/1 192.168.100.1 Tue Apr 18 22:27 - crash (00:02)
root pts/0 192.168.100.1 Tue Apr 18 21:36 - crash (00:53)
root tty1 Tue Apr 18 21:33 - crash (00:56)
reboot system boot 3.10.0-862.el7.x Tue Apr 18 21:33 - 05:36 (20+08:02)
root pts/1 192.168.100.1 Wed Apr 12 08:24 - crash (6+13:08)
root pts/0 192.168.100.1 Wed Apr 12 08:00 - crash (6+13:33)
reboot system boot 3.10.0-862.el7.x Wed Apr 12 07:59 - 05:36 (26+21:36)
root pts/1 192.168.100.1 Wed Apr 12 06:19 - crash (01:40)
root pts/0 192.168.100.1 Wed Apr 12 05:26 - crash (02:33)
reboot system boot 3.10.0-862.el7.x Wed Apr 12 05:24 - 05:36 (27+00:12)
root pts/0 192.168.100.1 Wed Apr 12 04:45 - crash (00:38)
root tty1 Wed Apr 12 04:43 - crash (00:40)
reboot system boot 3.10.0-862.el7.x Wed Apr 12 04:42 - 05:36 (27+00:53)
root pts/0 192.168.100.1 Tue Apr 11 22:40 - crash (06:02)
root tty1 Tue Apr 11 22:38 - crash (06:03)
reboot system boot 3.10.0-862.el7.x Tue Apr 11 22:37 - 05:36 (27+06:58)
root pts/0 192.168.100.1 Thu Apr 6 05:15 - crash (5+17:21)
reboot system boot 3.10.0-862.el7.x Thu Apr 6 05:15 - 05:36 (33+00:20)
root pts/0 192.168.100.1 Tue Mar 28 21:21 - crash (8+07:54)
root tty1 Tue Mar 28 21:19 - crash (8+07:55)
reboot system boot 3.10.0-862.el7.x Tue Mar 28 21:19 - 05:36 (41+08:16)
root pts/1 192.168.100.1 Wed Mar 22 23:58 - crash (5+21:21)
root tty1 Wed Mar 22 22:53 - crash (5+22:25)
root pts/0 192.168.100.1 Wed Mar 22 22:53 - crash (5+22:25)
reboot system boot 3.10.0-862.el7.x Wed Mar 22 22:51 - 05:36 (47+06:44)
root pts/0 192.168.100.1 Wed Mar 22 09:36 - 11:01 (01:24)
root pts/0 192.168.100.1 Wed Mar 22 08:56 - 09:08 (00:12)
root tty1 Wed Mar 22 08:54 - crash (13:57)
reboot system boot 3.10.0-862.el7.x Wed Mar 22 08:47 - 05:36 (47+20:48)
root tty1 Tue Mar 21 21:57 - crash (10:50)
reboot system boot 3.10.0-862.el7.x Tue Mar 21 21:56 - 05:36 (48+07:39)
hadoop pts/2 master Fri Mar 17 01:41 - 01:41 (00:00)
hadoop pts/2 master Fri Mar 17 01:40 - 01:40 (00:00)
hadoop pts/1 master Fri Mar 17 01:38 - crash (4+20:18)
root pts/1 192.168.100.1 Fri Mar 17 01:35 - 01:35 (00:00)
root pts/1 192.168.100.1 Fri Mar 17 01:22 - 01:23 (00:01)
root pts/0 192.168.100.1 Fri Mar 17 01:13 - crash (4+20:43)
root pts/0 192.168.100.1 Fri Mar 17 00:59 - 01:12 (00:13)
reboot system boot 3.10.0-862.el7.x Fri Mar 17 00:58 - 05:36 (53+04:37)
root pts/0 192.168.100.1 Thu Mar 16 05:15 - crash (19:42)
reboot system boot 3.10.0-862.el7.x Thu Mar 16 05:15 - 05:36 (54+00:20)
root pts/0 192.168.100.1 Thu Mar 16 05:10 - down (00:04)
reboot system boot 3.10.0-862.el7.x Thu Mar 16 05:10 - 05:15 (00:04)
root pts/0 192.168.100.1 Thu Mar 16 03:34 - down (01:35)
root tty1 Thu Mar 16 03:33 - 05:09 (01:36)
reboot system boot 3.10.0-862.el7.x Thu Mar 16 03:33 - 05:09 (01:36)
root pts/0 192.168.100.1 Wed Mar 15 00:32 - crash (1+03:01)
reboot system boot 3.10.0-862.el7.x Wed Mar 15 00:32 - 05:09 (1+04:37)
root pts/0 192.168.100.1 Tue Mar 14 22:18 - crash (02:13)
reboot system boot 3.10.0-862.el7.x Tue Mar 14 22:18 - 05:09 (1+06:51)
root tty1 Wed Mar 15 06:12 - crash (-7:-54)
reboot system boot 3.10.0-862.el7.x Wed Mar 15 06:12 - 05:09 (22:57)
wtmp begins Wed Mar 15 06:12:14 2023
Use last -f /var/run/utmp to view the utmp file:
[root@master log]# last -f /var/run/utmp
root pts/1 192.168.100.1 Tue May 9 05:13 still logged in
root pts/0 192.168.100.1 Tue May 9 04:21 still logged in
root tty1 Tue May 9 04:04 still logged in
reboot system boot 3.10.0-862.el7.x Tue May 9 04:04 - 05:36 (01:32)
utmp begins Tue May 9 04:04:12 2023
(3) lastb lists failed login attempts. It works exactly like last, except that it reads /var/log/btmp by default.
[root@master log]# lastb
btmp begins Tue May 9 05:15:01 2023
The output contains no entries, meaning no failed logins have been recorded since the current btmp file began; earlier failures were rotated into btmp-20230509.
(4) SSH login behavior can be reviewed through the Linux security log file /var/log/secure; reading it requires root privileges.
Switch to the root user and run cat /var/log/secure to review login activity on the server:
[root@master log]# cat /var/log/secure
May 9 05:29:00 master su: pam_unix(su-l:session): session closed for user hadoop
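To focus on authentication failures specifically, the file can be filtered for the standard sshd failure message (a sketch; whether anything matches depends on the host's login history):
[root@master log]# grep "Failed password" /var/log/secure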
2. Task 2: View log information for Hadoop MapReduce jobs
First, information has to be logged inside the MapReduce job itself, for example with log4j or by writing to the standard streams with System.out.println() or System.err.println(). Hadoop then provides a user interface for viewing those logs.
Every Mapper and Reducer in Hadoop has the following three types of logs:
(1) stdout: the output of System.out.println() is directed to this file.
(2) stderr: the output of System.err.println() is directed to this file.
(3) syslog: log4j output is directed to this file. The stack traces of all unhandled exceptions raised during job execution also appear in syslog.
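On each worker node these three files also exist on local disk, one set per container, under the NodeManager log directory. A sketch of where to look, assuming the default yarn.nodemanager.log-dirs under the Hadoop log directory (the application and container IDs are placeholders; each container directory holds stdout, stderr, and syslog):
[hadoop@master ~]$ ls /usr/local/src/hadoop/logs/userlogs/application_1681291712687_0001/container_*/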
Enter http://master:19888/jobhistory in the browser address bar to display summary information about jobs.
Note: the jobhistory process must be started first.
Start it as the hadoop user:
[hadoop@master ~]$ cd /usr/local/src/hadoop/sbin
[hadoop@master sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/src/hadoop/logs/mapred-hadoop-historyserver-master.out
[hadoop@master sbin]$
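A quick way to confirm the history server is up is to look for its JVM with jps (jps lists the Java processes of the current user):
[hadoop@master sbin]$ jps | grep JobHistoryServer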
Click the link for Job ID job_1597629049628_0012 and the following page appears.
It shows additional information about the job, including its execution status, start and stop times, and basics such as the queue it ran in. We can see how many Mappers and Reducers were used to execute the job. Note the average execution time of each phase: Map, Shuffle, Sort, and Reduce. From here we can follow the links to the Mapper and Reducer details.
Click the "1" link under Maps to view the detailed Mapper log.
We can now inspect the log of a specific Mapper instance. Click Logs to view the information shown in Figure 15-4.
Three kinds of logs are visible in the figure:
(1) stdout, the standard output;
(2) stderr, the standard error;
(3) syslog, the system log.
If log aggregation is not enabled, a message stating that the logs are unavailable is shown instead.
Add the following configuration to yarn-site.xml to enable log aggregation:
[hadoop@master sbin]$ cd /usr/local/src/hadoop/etc/hadoop
[hadoop@master hadoop]$ vi yarn-site.xml
# Add
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
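For the setting to take effect YARN has to be restarted, after which the logs of finished applications are written to HDFS. A sketch, assuming Hadoop's sbin and bin directories are on the PATH and the default remote log directory /tmp/logs:
[hadoop@master hadoop]$ stop-yarn.sh && start-yarn.sh
[hadoop@master hadoop]$ hdfs dfs -ls /tmp/logs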
3. Task 3: View Hadoop logs through the user interface
By default the logs can be reached at http://master:19888. As noted earlier, whether or not the logs are aggregated is transparent to the user. If they are aggregated, the Job History Server retrieves them from HDFS; if not, it fetches them by sending a request to the NodeManager of each individual node.
While a job is running, the logs that will later be viewable through the NodeManager web interface can be viewed through the Application Master web interface, which in turn is reachable via the "RUNNING" jobs link on the left of the ResourceManager web interface. By default the ResourceManager web interface is available at http://master:8088
As the figure shows, no jobs are running at the moment, so the list of running jobs is empty.
Click the "FINISHED" menu item on the left to display the jobs that have finished running.
We can also view log information through Hadoop's own user interface: open http://master:50070 in a browser and click Utilities --> Logs.
The figure shows Hadoop's list of log files, covering the NameNode, SecondaryNameNode, HistoryServer, NodeManager, ResourceManager, and so on. Click a log file to view its contents, for example hadoop-hadoop-namenode-master.out.2.
4. Task 4: View Hadoop logs from the command line
The list of Hadoop log files can also be obtained interactively on the command line.
When a log file reaches a certain size it is rolled over into a new file. Rolled files are named in the form "XXX.log.number", and the larger the trailing number, the older the log. By default only 20 log files are retained. Log in as the hadoop user, switch to the /usr/local/src/hadoop/logs directory, and run ll to list the logs.
[hadoop@master ~]$ cd /usr/local/src/hadoop/logs
[hadoop@master logs]$ ll
total 3580
-rw-rw-r--. 1 hadoop hadoop 231622 Mar 22 10:03 hadoop-hadoop-datanode-master.log.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 16 05:11 hadoop-hadoop-datanode-master.out.1.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 0 Mar 15 00:33 hadoop-hadoop-datanode-master.out.2.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 15 00:32 hadoop-hadoop-datanode-master.out.3.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 14 23:32 hadoop-hadoop-datanode-master.out.4.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 22 09:40 hadoop-hadoop-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 2143513 May 9 06:08 hadoop-hadoop-namenode-master.log
-rw-rw-r--. 1 hadoop hadoop 716 May 9 05:48 hadoop-hadoop-namenode-master.out
-rw-rw-r--. 1 hadoop hadoop 716 May 5 23:20 hadoop-hadoop-namenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop 716 Apr 18 22:31 hadoop-hadoop-namenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop 716 Apr 18 22:28 hadoop-hadoop-namenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 08:01 hadoop-hadoop-namenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 05:28 hadoop-hadoop-namenode-master.out.5
-rw-rw-r--. 1 hadoop hadoop 516399 May 9 06:08 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r--. 1 hadoop hadoop 55880 May 9 06:08 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r--. 1 hadoop hadoop 716 May 5 23:20 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop 716 Apr 18 22:31 hadoop-hadoop-secondarynamenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop 0 Apr 18 22:28 hadoop-hadoop-secondarynamenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 08:02 hadoop-hadoop-secondarynamenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 05:28 hadoop-hadoop-secondarynamenode-master.out.5
-rw-r--r--. 1 hadoop hadoop 28619 Mar 15 00:34 hadoop-root-datanode-master.log.COMPLETED
-rw-r--r--. 1 hadoop hadoop 714 Mar 15 00:32 hadoop-root-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 61443 May 9 06:08 mapred-hadoop-historyserver-master.log
-rw-rw-r--. 1 hadoop hadoop 0 May 9 05:51 mapred-hadoop-historyserver-master.out
-rw-rw-r--. 1 hadoop hadoop 0 May 9 05:41 mapred-hadoop-historyserver-master.out.1
-rw-rw-r--. 1 hadoop hadoop 0 Apr 18 22:28 SecurityAuth-hadoop.audit
-rw-rw-r--. 1 hadoop hadoop 0 Mar 14 23:31 SecurityAuth-hadoop.audit.COMPLETED
-rw-r--r--. 1 hadoop hadoop 0 Mar 15 00:32 SecurityAuth-root.audit.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 497484 May 9 05:59 yarn-hadoop-resourcemanager-master.log
-rw-rw-r--. 1 hadoop hadoop 2078 May 9 05:56 yarn-hadoop-resourcemanager-master.out
-rw-rw-r--. 1 hadoop hadoop 700 May 5 23:20 yarn-hadoop-resourcemanager-master.out.1
-rw-rw-r--. 1 hadoop hadoop 700 Apr 18 22:31 yarn-hadoop-resourcemanager-master.out.2
-rw-rw-r--. 1 hadoop hadoop 700 Apr 12 08:02 yarn-hadoop-resourcemanager-master.out.3
-rw-rw-r--. 1 hadoop hadoop 700 Apr 12 05:28 yarn-hadoop-resourcemanager-master.out.4
-rw-rw-r--. 1 hadoop hadoop 700 Apr 12 04:48 yarn-hadoop-resourcemanager-master.out.5
-rw-rw-r--. 1 hadoop hadoop 2078 Mar 17 01:52 yarn-hadoop-resourcemanager-master.out.5.COMPLETED
From the listing we can read off each log file's size and the Hadoop component it belongs to. The yarn-hadoop-resourcemanager-master.out file has been rolled into five numbered files, and the larger the trailing number, the older the file, consistent with Hadoop's log rotation scheme.
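Besides browsing this directory, the aggregated logs of a finished application can be fetched with the YARN command-line client (a sketch, assuming log aggregation is enabled; the application ID is one that appears in the ResourceManager log later in this section):
[hadoop@master logs]$ yarn logs -applicationId application_1681291712687_0001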
5. Task 5: View HBase logs
HBase provides a web user interface for viewing log files. Open http://master:60010 in a browser to display the HBase web home page, as shown in Figure 15-11.
Click the "Local Logs" menu to open the list of HBase logs.
Click one of the links to view the corresponding log, for example hbase-hadoop-master-master.log.
6. Task 6: View Hive logs
Hive's logs are stored in /tmp/hadoop. On the command line, switch to that directory and run ll to list the Hive logs:
[root@master ~]# cd /tmp/hadoop
[root@master hadoop]# ll
total 200
-rw-rw-r--. 1 hadoop hadoop 200414 May 9 05:23 hive.log
-rw-rw-r--. 1 hadoop hadoop 2038 May 9 05:19 stderr
Use cat to view the hive.log file:
[root@master hadoop]# cat hive.log
2023-05-09T05:13:20,615 INFO [main]: hwi.HWIServer (HWIServer.java:main(131)) - HWI is starting up
2023-05-09T05:13:23,275 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Logging to org.apache.logging.slf4j.Log4jLogger@738dc9b via org.mortbay.log.Slf4jLog
2023-05-09T05:13:23,384 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26
2023-05-09T05:13:23,558 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Extract /usr/local/src/hive/lib/hive-hwi-2.0.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if/webapp
2023-05-09T05:13:25,120 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Started [SocketConnector@0.0.0.0:9999, null]
2023-05-09T05:19:24,145 INFO [main]: hwi.HWIServer (HWIServer.java:main(131)) - HWI is starting up
2023-05-09T05:19:24,801 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Logging to org.apache.logging.slf4j.Log4jLogger@4145bad8 via org.mortbay.log.Slf4jLog
2023-05-09T05:19:24,983 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26
2023-05-09T05:19:25,176 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Extract /usr/local/src/hive/lib/hive-hwi-2.0.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if/webapp
2023-05-09T05:19:26,310 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Started [SocketConnector@0.0.0.0:9999, null]
2023-05-09T05:21:24,720 ERROR [867988177@qtp-1094523823-0]: mortbay.log (Slf4jLog.java:warn(87)) - /hwi/
org.apache.tools.ant.BuildException: The following error occurred while executing this line:
jar:file:/usr/local/src/hive/lib/ant-1.9.1.jar!/org/apache/tools/ant/antlib.xml:37: Could not create task or type of type: componentdef.
Ant could not find the task or a class this task relies upon.
This is common and has a number of causes; the usual
solutions are to read the manual pages then download and
install needed JAR files, or fix the build file:
- You have misspelt 'componentdef'.
Fix: check your spelling.
- The task needs an external JAR file to execute
and this is not found at the right place in the classpath.
Fix: check the documentation for dependencies.
Fix: declare the task.
- The task is an Ant optional task and the JAR file and/or libraries
implementing the functionality were not found at the time you
yourself built your installation of Ant from the Ant sources.
Fix: Look in the ANT_HOME/lib for the 'ant-' JAR corresponding to the
task and make sure it contains more than merely a META-INF/MANIFEST.MF.
If all it contains is the manifest, then rebuild Ant with the needed
libraries present in ${ant.home}/lib/optional/ , or alternatively,
download a pre-built release version from apache.org
- The build file was written for a later version of Ant
Fix: upgrade to at least the latest release version of Ant
- The task is not an Ant core or optional task
and needs to be declared using <taskdef>.
- You are attempting to use a task defined using
<presetdef> or <macrodef> but have spelt wrong or not
defined it at the point of use
Remember that for JAR files to be visible to Ant tasks implemented
in ANT_HOME/lib, the files must be in the same directory or on the
classpath
.....
Experiment 2: Viewing Big Data Platform Alert Information
1. Task 1: View host alert information of the big data platform
Hosts are essential infrastructure of a big data platform, comprising hardware resources (CPU, memory, storage, and so on) and the operating system (Linux). Linux manages the hardware and schedules CPU, memory, and storage on demand, so examining the alert entries in the relevant logs reveals the state of those resources and helps operators locate and resolve problems quickly.
Linux stores its log files in the /var/log directory. We can use the log management tool journalctl to view alert information on a Linux host. journalctl, introduced with systemd in CentOS 7, queries the systemd journal, which collects largely the same messages that rsyslog writes to files such as /var/log/messages.
Switch to /var/log and run journalctl -p err..alert to query system error alerts; the output looks like this:
[root@master hadoop]# cd /var/log
[root@master log]# journalctl -p err..alert
-- Logs begin at Tue 2023-05-09 04:04:10 EDT, end at Tue 2023-05-09 06:15:29 EDT. --
May 09 04:04:10 localhost.localdomain kernel: Detected CPU family 6 model 141 stepping 1
May 09 04:04:10 localhost.localdomain kernel: Warning: Intel Processor - this hardware has
May 09 04:04:10 localhost.localdomain kernel: sd 2:0:0:0: [sda] Assuming drive cache: writ
May 09 04:04:13 master kernel: piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled
May 09 04:04:16 master systemd[1]: Failed to start Postfix Mail Transport Agent.
May 09 05:00:35 master sshd[1240]: pam_systemd(sshd:session): Failed to release
By reviewing and analyzing the host's alert information we can troubleshoot the various services in a targeted way.
journalctl can also query alert information by process ID. For example, to query error-level messages from an sshd process with PID 13067, run journalctl _PID=13067 -p err; the result is shown below.
[root@master log]# journalctl _PID=13067 -p err
-- No entries --
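A query on a single PID only matches messages from that exact process. To cover a service across restarts and child processes, journalctl can filter by systemd unit instead (a sketch):
[root@master log]# journalctl -u sshd -p err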
2. Task 2: View Hadoop alert information
Hadoop's logs live mainly in the /usr/local/src/hadoop/logs directory, and the log files contain the status and alert information of the various Hadoop components. Switch to /usr/local/src/hadoop/logs; the file listing is as follows:
[root@master log]# cd /usr/local/src/hadoop/logs
[root@master logs]# ll
total 3908
-rw-rw-r--. 1 hadoop hadoop 231622 Mar 22 10:03 hadoop-hadoop-datanode-master.log.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 16 05:11 hadoop-hadoop-datanode-master.out.1.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 0 Mar 15 00:33 hadoop-hadoop-datanode-master.out.2.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 15 00:32 hadoop-hadoop-datanode-master.out.3.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 14 23:32 hadoop-hadoop-datanode-master.out.4.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 716 Mar 22 09:40 hadoop-hadoop-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 2190848 May 9 06:19 hadoop-hadoop-namenode-master.log
-rw-rw-r--. 1 hadoop hadoop 716 May 9 05:48 hadoop-hadoop-namenode-master.out
-rw-rw-r--. 1 hadoop hadoop 716 May 5 23:20 hadoop-hadoop-namenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop 716 Apr 18 22:31 hadoop-hadoop-namenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop 716 Apr 18 22:28 hadoop-hadoop-namenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 08:01 hadoop-hadoop-namenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 05:28 hadoop-hadoop-namenode-master.out.5
-rw-rw-r--. 1 hadoop hadoop 549245 May 9 06:19 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r--. 1 hadoop hadoop 87472 May 9 06:19 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r--. 1 hadoop hadoop 716 May 5 23:20 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop 716 Apr 18 22:31 hadoop-hadoop-secondarynamenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop 0 Apr 18 22:28 hadoop-hadoop-secondarynamenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 08:02 hadoop-hadoop-secondarynamenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop 716 Apr 12 05:28 hadoop-hadoop-secondarynamenode-master.out.5
-rw-r--r--. 1 hadoop hadoop 28619 Mar 15 00:34 hadoop-root-datanode-master.log.COMPLETED
-rw-r--r--. 1 hadoop hadoop 714 Mar 15 00:32 hadoop-root-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 73762 May 9 06:19 mapred-hadoop-historyserver-master.log
-rw-rw-r--. 1 hadoop hadoop 0 May 9 05:51 mapred-hadoop-historyserver-master.out
-rw-rw-r--. 1 hadoop hadoop 0 May 9 05:41 mapred-hadoop-historyserver-master.out.1
-rw-rw-r--. 1 hadoop hadoop 0 Apr 18 22:28 SecurityAuth-hadoop.audit
-rw-rw-r--. 1 hadoop hadoop 0 Mar 14 23:31 SecurityAuth-hadoop.audit.COMPLETED
-rw-r--r--. 1 hadoop hadoop 0 Mar 15 00:32 SecurityAuth-root.audit.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 497484 May 9 05:59 yarn-hadoop-resourcemanager-master.log
-rw-rw-r--. 1 hadoop hadoop 2078 May 9 05:56 yarn-hadoop-resourcemanager-master.out
-rw-rw-r--. 1 hadoop hadoop 700 May 5 23:20 yarn-hadoop-resourcemanager-master.out.1
-rw-rw-r--. 1 hadoop hadoop 700 Apr 18 22:31 yarn-hadoop-resourcemanager-master.out.2
-rw-rw-r--. 1 hadoop hadoop 700 Apr 12 08:02 yarn-hadoop-resourcemanager-master.out.3
-rw-rw-r--. 1 hadoop hadoop 700 Apr 12 05:28 yarn-hadoop-resourcemanager-master.out.4
-rw-rw-r--. 1 hadoop hadoop 700 Apr 12 04:48 yarn-hadoop-resourcemanager-master.out.5
-rw-rw-r--. 1 hadoop hadoop 2078 Mar 17 01:52 yarn-hadoop-resourcemanager-master.out.5.COMPLETED
We can display only the lines of a log file that contain the information of interest. For example, to follow the latest 1000 lines of the ResourceManager log for lines containing the keyword "info", run tail -1000f yarn-hadoop-resourcemanager-master.log | grep info; the result is shown below.
[root@master logs]# cd /usr/local/src/hadoop/logs
[root@master logs]# tail -1000f yarn-hadoop-resourcemanager-master.log | grep info
2023-04-12 05:36:59,076 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681291712687_0001
2023-04-12 05:37:17,137 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681291712687_0001
2023-04-12 06:31:28,677 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681291712687_0002
2023-04-12 06:32:11,797 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681291712687_0002
2023-04-12 08:12:40,355 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681300935029_0001
2023-04-12 08:12:49,968 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681300935029_0001
2023-04-12 08:24:48,988 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681300935029_0002
2023-04-12 08:24:58,572 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681300935029_0002
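Filtering for "info" matches routine INFO entries; for genuine alerts it is usually more useful to filter on the WARN and ERROR levels (a sketch; tail -n is used instead of -f so the command returns immediately):
[root@master logs]# tail -n 1000 yarn-hadoop-resourcemanager-master.log | grep -E "WARN|ERROR"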
3. Task 3: View HBase alert information
1. Step 1: Change the log alert level
HBase's web user interface can query and set log alert levels. Open http://master:60010/logLevel in a browser.
To query a log's alert level, enter the log name and click the "Get Log Level" button; the level is displayed. For example, query the alert level of the log hbase-hadoop-master-master.log.
The result shows that the alert level of hbase-hadoop-master-master.log is INFO. To change it to WARN, fill in the second form with Log: hbase-hadoop-master-master.log and Level: WARN, then click the "Set Log Level" button.
The result shows that the alert level of hbase-hadoop-master-master.log has been changed to WARN.
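The same servlet also accepts its parameters directly in the URL, so the level can be changed without the form, for example from a script (a sketch of the equivalent request):
[root@master ~]# curl "http://master:60010/logLevel?log=hbase-hadoop-master-master.log&level=WARN"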
2. Step 2: Query log alert information
HBase's log files are stored in the /usr/local/src/hbase/logs directory. Switch to that directory and view the "INFO" alert entries of the hbase-hadoop-master-master.log file with tail -100f hbase-hadoop-master-master.log | grep INFO.
To view the "WARN" level alerts in hbase-hadoop-master-master.log, run tail -100f hbase-hadoop-master-master.log | grep WARN; the result is shown below.
[root@master logs]# tail -100f hbase-hadoop-master-master.log |grep WARN
2023-04-06 05:29:33,995 WARN [master:16000.activeMasterManager] wal.WALProcedureStore: Log directory not found: File hdfs://master:9000/hbase/MasterProcWALs does not exist.
2023-05-05 23:21:59,615 WARN [master/master/192.168.100.10:16000-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:16,378 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:16,481 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:16,623 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:17,727 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:17,829 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:17,971 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:19,077 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:19,178 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:19,292 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:20,394 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:20,496 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:20,599 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:21,701 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:21,802 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:21,904 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:23,007 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:23,108 WARN [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:23,219 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:24,323 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:24,557 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:25,928 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:26,314 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:27,729 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:28,775 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:30,773 WARN [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:32,060 WARN [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
4. Task 4: View Hive alert information
Hive's log files are stored in the /tmp/hadoop directory. Switch to that directory and filter hive.log for INFO-level entries:
[root@master hadoop]# cd /tmp/hadoop
[root@master hadoop]# tail -1000f hive.log |grep INFO
2023-05-09T11:05:27,968 INFO [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@37d3e140]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(194)) - Detected pause in JVM or host machine (eg GC): pause of approximately 4923ms
2023-05-09T15:25:31,520 INFO [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@37d3e140]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(194)) - Detected pause in JVM or host machine (eg GC): pause of approximately 4439ms
stderr (standard error) is the standard I/O stream referenced through the predefined file pointer stderr; it refers to the same file as the one referenced by the file descriptor STDERR_FILENO. Filter it for ERROR entries:
[root@master hadoop]# tail -1000f stderr |grep ERROR
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.