Monitoring Alert Information on the Big Data Platform
Big Data Platform Monitoring Interfaces and Reports
Viewing Big Data Platform Status via the Web Interface
If hostname mapping has not been configured, replace the hostname in the URL with the node's IP address.
URL: http://master:8088/cluster/nodes
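If name resolution is the issue, the hostname-to-IP mapping can instead be added on the machine running the browser; a sketch for a Linux client (the IP is the master address used later in this section, and the client prompt is hypothetical):
[root@client ~]# echo "192.168.88.10 master" >> /etc/hosts
From a shell on the cluster itself, the same node list is also available with yarn node -list -all.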
Viewing Hadoop Status via the Web Interface
Hadoop's running status:
Menu functions:
1) Overview: shows the Hadoop start time, version, NameNode journal status, NameNode storage status, and related information;
2) Datanodes: shows information about running and stopped DataNodes;
3) Datanode Volume Failures: shows DataNodes with failed volumes;
4) Snapshot: shows snapshot creation and deletion information;
5) Startup Progress: shows NameNode startup progress;
6) Browse the file system: browses the files and directories in HDFS;
7) Logs: shows the NameNode, resource manager, and other daemon logs.
Detailed Hadoop summary information:
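Much of this summary is also available without the browser; as a sketch, hdfs dfsadmin -report prints capacity, usage, and per-DataNode status from the command line:
[hadoop@master ~]$ hdfs dfsadmin -report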
Monitoring Big Data Platform Resource Status via the Web Interface
Monitoring YARN Status via the Web Interface
Viewing MapReduce Job Logs in Hadoop
Viewing MapReduce logs requires the JobHistory server to be running first.
To start it (shown here as root; it can equally be run as the hadoop user):
[root@master ~]# cd /usr/local/src/hadoop/sbin
[root@master sbin]# ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/src/hadoop/logs/mapred-root-historyserver-master.out
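The script returns immediately, so it is worth confirming that the daemon actually came up; a sketch (the PID will differ):
[root@master sbin]# jps | grep JobHistoryServer
3150 JobHistoryServer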
Since no job is currently running, the web page is empty. To generate one, run the WordCount example, which counts the frequency of each word in a data file:
[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ ls
bin dfs etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share tmp
[hadoop@master hadoop]$ hdfs dfs -put ~/data.txt /input
put: `/input/data.txt': File exists  # the file is already there; if it is not, upload it first
[hadoop@master hadoop]$ hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 3 hadoop supergroup 46 2023-04-19 10:49 /input/data.txt
# if an /output directory already exists under /, delete it first
[hadoop@master hadoop]$ hdfs dfs -rm -r -f /output
23/05/08 17:41:04 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
# run the word count
[hadoop@master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
23/05/08 18:12:12 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.88.10:8032
23/05/08 18:12:13 INFO input.FileInputFormat: Total input paths to process : 1
23/05/08 18:12:13 INFO mapreduce.JobSubmitter: number of splits:1
23/05/08 18:12:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1683539521530_0005
23/05/08 18:12:13 INFO impl.YarnClientImpl: Submitted application application_1683539521530_0005
23/05/08 18:12:13 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1683539521530_0005/
23/05/08 18:12:13 INFO mapreduce.Job: Running job: job_1683539521530_0005
23/05/08 18:12:20 INFO mapreduce.Job: Job job_1683539521530_0005 running in uber mode : false
23/05/08 18:12:20 INFO mapreduce.Job: map 0% reduce 0%
23/05/08 18:12:26 INFO mapreduce.Job: map 100% reduce 0%
23/05/08 18:12:32 INFO mapreduce.Job: map 100% reduce 100%
23/05/08 18:12:32 INFO mapreduce.Job: Job job_1683539521530_0005 completed successfully
23/05/08 18:12:33 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=63
FILE: Number of bytes written=231009
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=144
HDFS: Number of bytes written=41
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
........
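Once the job completes, the word counts land in part files under /output; a sketch of inspecting them (the part file name assumes a single reducer):
[hadoop@master hadoop]$ hdfs dfs -cat /output/part-r-00000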
Viewing the job in a browser
Monitoring HDFS Status via the Web Interface
Opening folders and downloading files in HDFS works as follows.
Opening a folder
Enter a directory name such as hbase in the address bar and click the "Go!" button (or press Enter) to open the corresponding folder:
Downloading a file from HDFS
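The same file can be fetched from the command line; a sketch using the data file uploaded earlier (the local destination path is arbitrary):
[hadoop@master ~]$ hdfs dfs -get /input/data.txt ~/data-copy.txt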
Monitoring HBase Status via the Web Interface
The web user interface addresses are master:60010, slave1:60010, and slave2:60010.
Before visiting them, HBase must be started, and starting HBase requires ZooKeeper to be running first; otherwise HBase will not come up.
# start ZooKeeper first
[hadoop@master ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
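Before continuing, it can help to confirm ZooKeeper's state; a sketch (the reported Mode depends on the ensemble and may be standalone, leader, or follower):
[hadoop@master ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower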
# start HBase
[hadoop@master ~]$ start-hbase.sh
starting master, logging to /usr/local/src/hbase/logs/hbase-hadoop-master-master.out
slave2: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave2.out
slave1: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave1.out
[hadoop@master ~]$ jps
2437 NameNode
2789 ResourceManager
4518 HMaster
2631 SecondaryNameNode
4639 Jps
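jps on the master shows HMaster; the RegionServers run on the slaves. A quick cluster-wide check can be done from the HBase shell; a sketch (the server counts and load figure depend on the cluster):
[hadoop@master ~]$ echo "status" | hbase shell
2 servers, 0 dead, 1.0000 average load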
HBase web UI home page
The HBase home page is at http://master:60010.
The main HBase UI menus are Table Details, Local Logs, Log Level, Debug Dump, Metrics Dump, and HBase Configuration.
To view table information in HBase, click Table Details in the menu bar to list all tables.
Under Tables, click System Tables to view the system tables, which mainly hold metadata and namespace information.
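The same table inventory is available from the HBase shell; a sketch (the table names depend on what has been created):
[hadoop@master ~]$ echo "list" | hbase shell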
Monitoring Alert and Log Information on the Big Data Platform
Viewing Host Logs on the Big Data Platform
[hadoop@master ~]$ cd /var/log/
[hadoop@master log]$ ll
total 2212
drwxr-xr-x. 2 root root 204 Mar 1 09:48 anaconda
drwx------. 2 root root 23 Mar 1 09:49 audit
-rw-------. 1 root root 8943 May 8 17:48 boot.log
-rw------- 1 root root 70827 Mar 15 17:13 boot.log-20230315
-rw------- 1 root root 17941 Mar 25 18:21 boot.log-20230325
-rw------- 1 root root 8553 Mar 29 10:36 boot.log-20230329
-rw------- 1 root root 17758 Apr 12 14:06 boot.log-20230412
-rw------- 1 root root 8699 Apr 19 10:24 boot.log-20230419
-rw------- 1 root root 25902 May 6 10:34 boot.log-20230506
-rw------- 1 root root 8901 May 8 17:33 boot.log-20230508
-rw------- 1 root utmp 0 May 6 10:34 btmp
-rw------- 1 root utmp 0 Apr 12 14:06 btmp-20230506
-rw------- 1 root root 2479 May 8 20:01 cron
-rw------- 1 root root 1588 Mar 25 18:21 cron-20230325
-rw------- 1 root root 3170 Apr 12 14:06 cron-20230412
-rw------- 1 root root 1464 Apr 19 10:24 cron-20230419
-rw------- 1 root root 1919 May 6 10:34 cron-20230506
-rw-r--r-- 1 root root 122829 May 8 17:48 dmesg
-rw-r--r-- 1 root root 122808 May 8 16:40 dmesg.old
-rw-r--r--. 1 root root 0 Mar 1 09:49 firewalld
-rw-r--r--. 1 root root 193 Mar 1 09:45 grubby_prune_debug
-rw-r--r--. 1 root root 292292 May 8 20:30 lastlog
-rw------- 1 root root 0 May 6 10:34 maillog
-rw------- 1 root root 190 Mar 25 17:17 maillog-20230325
-rw------- 1 root root 0 Mar 25 18:21 maillog-20230412
-rw------- 1 root root 0 Apr 12 14:06 maillog-20230419
-rw------- 1 root root 0 Apr 19 10:24 maillog-20230506
-rw------- 1 root root 272392 May 8 20:30 messages
-rw------- 1 root root 268597 Mar 25 18:16 messages-20230325
-rw------- 1 root root 402646 Apr 12 14:01 messages-20230412
-rw------- 1 root root 134516 Apr 19 10:24 messages-20230419
-rw------- 1 root root 393753 May 6 10:20 messages-20230506
-rw-r--r-- 1 mysql mysql 89848 May 8 19:40 mysqld.log
drwxr-xr-x. 2 root root 6 Mar 1 09:48 rhsm
-rw------- 1 root root 11316 May 8 20:30 secure
-rw------- 1 root root 4476 Mar 25 18:16 secure-20230325
-rw------- 1 root root 15924 Apr 12 13:54 secure-20230412
-rw------- 1 root root 5308 Apr 19 10:24 secure-20230419
-rw------- 1 root root 5328 May 6 10:18 secure-20230506
-rw------- 1 root root 0 May 6 10:34 spooler
-rw------- 1 root root 0 Mar 15 17:13 spooler-20230325
-rw------- 1 root root 0 Mar 25 18:21 spooler-20230412
-rw------- 1 root root 0 Apr 12 14:06 spooler-20230419
-rw------- 1 root root 0 Apr 19 10:24 spooler-20230506
-rw-------. 1 root root 0 Mar 1 09:45 tallylog
drwxr-xr-x. 2 root root 23 Mar 1 09:49 tuned
-rw-r--r--. 1 root root 29439 May 8 17:48 vmware-vgauthsvc.log.0
-rw-r--r--. 1 root root 45878 May 8 19:40 vmware-vmsvc.log
-rw-rw-r--. 1 root utmp 57600 May 8 20:25 wtmp
-rw-------. 1 root root 2332 May 6 10:59 yum.log
Viewing the kernel and system message log (/var/log/messages)
The kernel and system message log aggregates entries from the kernel and many system daemons. Switch to the root user and view it with cat or tail; here we use head:
[root@master ~]# head -n 5 /var/log/messages
May 6 10:34:01 master rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="882" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 6 10:35:36 master su: (to hadoop) root on pts/0
May 6 10:51:05 master su: (to root) root on pts/0
May 6 10:59:17 master yum[2309]: Installed: 2:vim-filesystem-7.4.629-8.el7_9.x86_64
May 6 10:59:18 master yum[2309]: Installed: 2:vim-common-7.4.629-8.el7_9.x86_64
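For live monitoring rather than a snapshot, tail -f follows the file as new messages arrive (interrupt with Ctrl+C):
[root@master ~]# tail -f /var/log/messages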
Viewing the scheduled task log /var/log/cron
This file records the creation and execution of crontab scheduled tasks:
[root@master ~]# head -n 5 /var/log/cron
May 6 10:34:01 master run-parts(/etc/cron.daily)[2092]: finished logrotate
May 6 10:34:01 master run-parts(/etc/cron.daily)[2080]: starting man-db.cron
May 6 10:34:03 master run-parts(/etc/cron.daily)[2199]: finished man-db.cron
May 6 10:34:03 master anacron[1148]: Job `cron.daily' terminated
May 6 10:54:01 master anacron[1148]: Job `cron.weekly' started
Viewing the system boot log /var/log/dmesg
This plain-text file records hardware device information and can also be viewed with the dmesg command. Since the file is fairly long, use head to view the first 5 lines:
[root@master ~]# head -n 5 /var/log/dmesg
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.10.0-862.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Fri Apr 20 16:44:24 UTC 2018
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
Viewing the mail system log /var/log/maillog
This log records every e-mail sent to or from the system. It can be used to see which sending tool a user used and which system the data was sent to:
[root@master ~]# head -n 5 /var/log/maillog
Viewing user login logs
These logs record users logging into and out of the Linux system, including the user name, login terminal, login time, source host, and the processes in use.
The following files keep login and logout information:
1) /var/log/lastlog: the most recent login event for each user
2) /var/log/wtmp: user logins and logouts, plus system boot and shutdown events
3) /var/run/utmp: details of each user currently logged in
4) /var/log/secure: security events related to user authentication
lastlog lists the most recent login of every user
lastlog reads /var/log/lastlog and reports the login name, port, last login time, and so on:
[root@master ~]# cd /var/log/
[root@master log]# lastlog
Username Port From Latest
root pts/0 192.168.88.1 Tue May 9 08:36:07 +0800 2023
bin **Never logged in**
daemon **Never logged in**
adm **Never logged in**
lp **Never logged in**
sync **Never logged in**
shutdown **Never logged in**
halt **Never logged in**
mail **Never logged in**
operator **Never logged in**
games **Never logged in**
ftp **Never logged in**
nobody **Never logged in**
systemd-network **Never logged in**
dbus **Never logged in**
polkitd **Never logged in**
sshd **Never logged in**
postfix **Never logged in**
hadoop pts/1 Mon May 8 20:30:43 +0800 2023
mysql **Never logged in**
ntp **Never logged in**
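To query a single account instead of the full list, lastlog accepts -u; a sketch:
[root@master log]# lastlog -u hadoop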
last lists users who are or have been logged into the system
By default it reads /var/log/wtmp. The output includes the user name, terminal, login source, start time, end time, and duration. Note that the final line of the output shows when the wtmp file began recording. A different file can be read with last -f, for example /var/log/btmp or /var/run/utmp:
[root@master log]# last
root pts/0 192.168.88.1 Tue May 9 08:36 still logged in
reboot system boot 3.10.0-862.el7.x Tue May 9 08:33 - 08:43 (00:09)
root pts/1 192.168.88.1 Mon May 8 20:25 - 20:41 (00:15)
root pts/1 192.168.88.1 Mon May 8 19:42 - 20:25 (00:42)
root pts/0 192.168.88.1 Mon May 8 17:49 - crash (14:44)
reboot system boot 3.10.0-862.el7.x Mon May 8 17:48 - 08:43 (14:55)
root pts/1 192.168.88.1 Mon May 8 17:04 - down (00:43)
root pts/0 192.168.88.1 Mon May 8 16:48 - 17:47 (00:59)
reboot system boot 3.10.0-862.el7.x Mon May 8 16:39 - 17:48 (01:08)
root pts/1 192.168.88.1 Sat May 6 11:31 - 11:34 (00:02)
root pts/0 192.168.88.1 Sat May 6 11:29 - 11:34 (00:04)
root pts/0 192.168.88.1 Sat May 6 10:00 - 11:29 (01:29)
reboot system boot 3.10.0-862.el7.x Sat May 6 09:59 - 11:34 (01:35)
root pts/0 192.168.88.1 Tue May 2 13:18 - crash (3+20:41)
reboot system boot 3.10.0-862.el7.x Tue May 2 13:17 - 11:34 (3+22:16)
root tty1 Wed Apr 19 11:04 - 11:04 (00:00)
reboot system boot 3.10.0-862.el7.x Wed Apr 19 11:03 - 11:04 (00:00)
root pts/0 192.168.88.1 Wed Apr 19 09:47 - 11:01 (01:13)
reboot system boot 3.10.0-862.el7.x Wed Apr 19 09:44 - 11:02 (01:17)
root pts/0 192.168.88.1 Wed Apr 12 13:54 - 14:53 (00:58)
root pts/0 192.168.88.1 Wed Apr 12 13:16 - 13:54 (00:38)
reboot system boot 3.10.0-862.el7.x Wed Apr 12 13:15 - 11:02 (6+21:47)
root pts/0 192.168.88.1 Thu Apr 6 22:28 - 23:02 (00:34)
reboot system boot 3.10.0-862.el7.x Thu Apr 6 22:16 - 23:03 (00:46)
root pts/0 192.168.88.1 Wed Mar 29 09:24 - 10:43 (01:19)
reboot system boot 3.10.0-862.el7.x Wed Mar 29 09:22 - 23:03 (8+13:40)
root pts/0 192.168.88.1 Sat Mar 25 18:51 - 18:53 (00:02)
root pts/0 192.168.88.1 Sat Mar 25 17:27 - 18:51 (01:23)
reboot system boot 3.10.0-862.el7.x Sat Mar 25 17:26 - 23:03 (12+05:36)
root pts/0 192.168.88.1 Sat Mar 25 17:17 - 17:25 (00:07)
reboot system boot 3.10.0-862.el7.x Sat Mar 25 17:17 - 23:03 (12+05:45)
hadoop pts/1 master Wed Mar 15 16:08 - 16:08 (00:00)
root pts/0 192.168.88.1 Wed Mar 15 16:05 - 17:35 (01:29)
reboot system boot 3.10.0-862.el7.x Wed Mar 15 16:04 - 17:36 (01:31)
root pts/0 192.168.88.1 Wed Mar 15 15:56 - down (00:06)
reboot system boot 3.10.0-862.el7.x Wed Mar 15 15:55 - 16:03 (00:07)
root pts/0 192.168.88.1 Wed Mar 15 10:50 - 10:57 (00:06)
reboot system boot 3.10.0-862.el7.x Wed Mar 15 10:49 - 10:57 (00:07)
root pts/0 192.168.88.1 Sun Mar 5 10:26 - 11:52 (01:25)
reboot system boot 3.10.0-862.el7.x Sun Mar 5 10:25 - 10:57 (10+00:32)
root pts/0 192.168.88.1 Sun Mar 5 09:24 - crash (01:00)
reboot system boot 3.10.0-862.el7.x Sun Mar 5 09:22 - 10:57 (10+01:34)
root pts/0 192.168.88.1 Thu Mar 2 17:30 - 17:34 (00:04)
reboot system boot 3.10.0-862.el7.x Thu Mar 2 17:25 - 17:34 (00:09)
root pts/0 192.168.88.1 Wed Mar 1 09:59 - crash (1+07:25)
reboot system boot 3.10.0-862.el7.x Wed Mar 1 09:57 - 17:34 (1+07:36)
root pts/0 192.168.88.1 Wed Mar 1 09:56 - down (00:00)
root tty1 Wed Mar 1 09:52 - 09:57 (00:04)
reboot system boot 3.10.0-862.el7.x Wed Mar 1 09:49 - 09:57 (00:07)
wtmp begins Wed Mar 1 09:49:55 2023
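Since wtmp keeps growing, the output can be limited to the most recent entries; a sketch:
[root@master log]# last -n 5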
Use last -f /var/run/utmp to read the utmp file:
[root@master log]# last -f /var/run/utmp
root pts/0 192.168.88.1 Tue May 9 08:36 still logged in
reboot system boot 3.10.0-862.el7.x Tue May 9 08:33 - 08:45 (00:11)
utmp begins Tue May 9 08:33:54 2023
lastb lists failed login attempts
lastb works exactly like last, except that it reads /var/log/btmp by default:
[root@master log]# lastb
btmp begins Sat May 6 10:34:01 2023
SSH login activity can be inspected through the Linux security log /var/log/secure; reading this file requires root privileges:
[root@master log]# cat /var/log/secure
May 6 10:35:36 master su: pam_unix(su-l:session): session opened for user hadoop by root(uid=0)
May 6 10:51:05 master su: pam_unix(su-l:session): session opened for user root by root(uid=1000)
May 6 11:02:08 master su: pam_unix(su-l:session): session opened for user hadoop by root(uid=0)
May 6 11:29:37 master su: pam_unix(su-l:session): session closed for user hadoop
May 6 11:29:37 master sshd[1112]: pam_unix(sshd:session): session closed for user root
May 6 11:29:37 master su: pam_unix(su-l:session): session closed for user root
May 6 11:29:37 master su: pam_unix(su-l:session): session closed for user hadoop
May 6 11:29:37 master su: pam_unix(su-l:session): session closed for user root
May 6 11:29:37 master su: pam_unix(su-l:session): session closed for user hadoop
May 6 11:29:42 master sshd[2462]: Accepted password for root from 192.168.88.1 port 3913 ssh2
May 6 11:29:42 master sshd[2462]: pam_unix(sshd:session): session opened for user root by (uid=0)
..................
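To pick out failed SSH logins specifically, grep for sshd's standard failure message; a sketch:
[root@master log]# grep "Failed password" /var/log/secure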
Viewing Log Information for Hadoop MapReduce Jobs
Every Mapper and Reducer in Hadoop produces three types of logs:
1) stdout: output from System.out.println() is directed to this file
2) stderr: output from System.err.println() is directed to this file
3) syslog: log4j output is directed to this file; stack traces of any unhandled exceptions raised during job execution also appear here (the on-disk location of these files is sketched below)
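On each NodeManager these files live on local disk while the job runs, under the directory configured by yarn.nodemanager.log-dirs; a sketch of the typical layout (the application and container IDs are placeholders, and the base path assumes this cluster's log directory):
/usr/local/src/hadoop/logs/userlogs/application_<id>/container_<id>/{stdout,stderr,syslog}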
Enter http://master:19888/jobhistory in the browser address bar to display a summary of jobs; note that the JobHistory server must be started first:
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
192.168.88.30: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
192.168.88.20: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
192.168.88.20: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
192.168.88.30: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
[hadoop@master ~]$ jps
1312 NameNode
1509 SecondaryNameNode
1670 ResourceManager
1931 Jps
[hadoop@master ~]$ cd /usr/local/src/hadoop/sbin/
[hadoop@master sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/src/hadoop/logs/mapred-hadoop-historyserver-master.out
Click the Job ID.
Click the "1" link under Maps to view the detailed Mapper log:
Now the log of a specific Mapper instance can be viewed by clicking Logs:
The page above shows an error because log aggregation has not been enabled. Add the following configuration to yarn-site.xml to turn it on:
[hadoop@master ~]$ cd /usr/local/src/hadoop/etc/hadoop
[hadoop@master hadoop]$ vi yarn-site.xml
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
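After saving the property, restart YARN so the setting takes effect, then re-run the job; aggregated logs can afterwards be fetched by application ID. A sketch, using the application ID from the earlier WordCount run:
[hadoop@master hadoop]$ stop-yarn.sh && start-yarn.sh
[hadoop@master hadoop]$ yarn logs -applicationId application_1683539521530_0005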
Viewing Hadoop Logs via the Web Interface
Click the FINISHED menu on the left to display jobs that have finished running.
To view log information through the Hadoop web interface, browse to http://master:50070 and click Utilities --> Logs.
Viewing Hadoop Logs from the Command Line
[hadoop@master ~]$ cd /usr/local/src/hadoop/logs
[hadoop@master logs]$ ll
total 3844
-rw-rw-r-- 1 hadoop hadoop 1556048 May 9 10:22 hadoop-hadoop-namenode-master.log
-rw-rw-r-- 1 hadoop hadoop 716 May 9 10:16 hadoop-hadoop-namenode-master.out
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:56 hadoop-hadoop-namenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop 716 Apr 12 14:51 hadoop-hadoop-namenode-master.out.1.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:51 hadoop-hadoop-namenode-master.out.2
-rw-rw-r-- 1 hadoop hadoop 716 Apr 12 13:25 hadoop-hadoop-namenode-master.out.2.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:39 hadoop-hadoop-namenode-master.out.3
-rw-rw-r-- 1 hadoop hadoop 716 Apr 6 22:41 hadoop-hadoop-namenode-master.out.3.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:22 hadoop-hadoop-namenode-master.out.4
-rw-rw-r-- 1 hadoop hadoop 716 Mar 25 18:41 hadoop-hadoop-namenode-master.out.4.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:16 hadoop-hadoop-namenode-master.out.5
-rw-rw-r-- 1 hadoop hadoop 716 Mar 25 18:30 hadoop-hadoop-namenode-master.out.5.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 4965 Apr 19 10:56 hadoop-hadoop-namenode-master.out.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 421330 May 9 10:17 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r-- 1 hadoop hadoop 191397 Apr 19 11:02 hadoop-hadoop-secondarynamenode-master.log.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 10:16 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:56 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop 716 Apr 12 14:51 hadoop-hadoop-secondarynamenode-master.out.1.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:51 hadoop-hadoop-secondarynamenode-master.out.2
-rw-rw-r-- 1 hadoop hadoop 716 Apr 12 13:25 hadoop-hadoop-secondarynamenode-master.out.2.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:39 hadoop-hadoop-secondarynamenode-master.out.3
-rw-rw-r-- 1 hadoop hadoop 716 Apr 6 22:42 hadoop-hadoop-secondarynamenode-master.out.3.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:22 hadoop-hadoop-secondarynamenode-master.out.4
-rw-rw-r-- 1 hadoop hadoop 716 Mar 25 18:42 hadoop-hadoop-secondarynamenode-master.out.4.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:16 hadoop-hadoop-secondarynamenode-master.out.5
-rw-rw-r-- 1 hadoop hadoop 716 Mar 25 18:30 hadoop-hadoop-secondarynamenode-master.out.5.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 Apr 19 10:24 hadoop-hadoop-secondarynamenode-master.out.COMPLETED
...............
Viewing HBase Logs
HBase provides a web interface for viewing its log files. Browse to http://master:60010 to open the HBase main page.
Click the "Local Logs" menu to open the list of HBase logs.
Viewing Hive Logs
Hive stores its logs in /tmp/hadoop. At the command line, switch to that directory and run ll to list the Hive log files:
[hadoop@master ~]$ cd /tmp/hadoop
[hadoop@master hadoop]$ ll
total 312
-rw-rw-r-- 1 hadoop hadoop 314209 May 8 17:03 hive.log
-rw-rw-r-- 1 hadoop hadoop 1019 May 8 17:00 stderr
[hadoop@master hadoop]$ head -5 hive.log
2023-05-08T17:00:19,657 INFO [main]: hwi.HWIServer (HWIServer.java:main(131)) - HWI is starting up
2023-05-08T17:00:21,325 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Logging to org.apache.logging.slf4j.Log4jLogger@4145bad8 via org.mortbay.log.Slf4jLog
2023-05-08T17:00:21,358 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26
2023-05-08T17:00:21,465 WARN [main]: mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if, using /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if_4307487751968339939
2023-05-08T17:00:21,466 INFO [main]: mortbay.log (Slf4jLog.java:info(67)) - Extract /usr/local/src/hive/lib/hive-hwi-2.0.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if_4307487751968339939/webapp
Viewing Alert Information on the Big Data Platform
Viewing host alert information
Linux log files are stored under /var/log. The journalctl log management tool can be used to view alert information on a Linux host. journalctl ships with systemd on CentOS 7 and reads from the systemd journal, much of whose content also appears in the messages file:
[root@master ~]# cd /var/log/
[root@master log]# journalctl -p err..alert
-- Logs begin at Tue 2023-05-09 08:33:49 CST, end at Tue 2023-05-09 10:34:16 CST. --
May 09 08:33:49 localhost.localdomain kernel: Detected CPU family 17h model 104
May 09 08:33:49 localhost.localdomain kernel: Warning: AMD Processor - this hardware has not undergone up
May 09 08:33:49 localhost.localdomain kernel: sd 2:0:0:0: [sda] Assuming drive cache: write through
May 09 08:34:02 master kernel: piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled!
May 09 08:34:16 master systemd[1]: Failed to start Postfix Mail Transport Agent.
journalctl can also query the alert information of a specific process by its PID:
[root@master log]# journalctl _PID=[PID] -p err
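Filtering by systemd unit is often more convenient than by PID; a sketch:
[root@master log]# journalctl -u sshd -p err --since today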
Viewing Hadoop alert information
[root@master log]# cd /usr/local/src/hadoop/logs/
[root@master logs]# ll
total 3924
-rw-rw-r-- 1 hadoop hadoop 1593552 May 9 10:37 hadoop-hadoop-namenode-master.log
-rw-rw-r-- 1 hadoop hadoop 716 May 9 10:16 hadoop-hadoop-namenode-master.out
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:56 hadoop-hadoop-namenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop 716 Apr 12 14:51 hadoop-hadoop-namenode-master.out.1.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:51 hadoop-hadoop-namenode-master.out.2
-rw-rw-r-- 1 hadoop hadoop 716 Apr 12 13:25 hadoop-hadoop-namenode-master.out.2.COMPLETED
-rw-rw-r-- 1 hadoop hadoop 716 May 9 09:39 hadoop-hadoop-namenode-master.out.3
-rw-rw-r-- 1 hadoop hadoop 716 Apr 6 22:41 hadoop-hadoop-namenode-master.out.3.COMPLETED
.....................
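To surface alerts buried in these files, grep for the WARN and ERROR levels; a sketch against the NameNode log:
[root@master logs]# grep -E 'WARN|ERROR' hadoop-hadoop-namenode-master.log | tail -5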
Viewing HBase alert information
The HBase web UI provides the ability to query and set log alert levels. Browse to http://master:60010/logLevel.
To query the alert level of a log, enter its name and click the "Get Log Level" button; the level is displayed. Querying hadoop-hadoop-namenode-master.log, for example, shows a level of INFO. To raise it to WARN, enter Log: hadoop-hadoop-namenode-master.log and Level: WARN in the second form and click "Set Log Level"; once set, query the log again to confirm the change.
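The same servlet can also be driven from the command line with Hadoop's daemonlog tool; a sketch (whether it accepts HBase's endpoint depends on the deployment, and the logger name here is a placeholder):
[hadoop@master ~]$ hadoop daemonlog -getlevel master:60010 org.apache.hadoop.hbase.master.HMaster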
Querying alert information in the logs
[root@master logs]# cd /usr/local/src/hbase/logs
[root@master logs]# tail -10f hbase-hadoop-master-master.log |grep INFO
2023-05-09 10:31:51,706 INFO [master,16000,1683599183843_ChoreService_1] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=OPEN, ts=1683599215578, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=PENDING_CLOSE, ts=1683599511705, server=slave1,16020,1683599199173}
2023-05-09 10:31:51,804 INFO [AM.ZK.Worker-pool2-t15] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=PENDING_CLOSE, ts=1683599511705, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=CLOSED, ts=1683599511804, server=slave1,16020,1683599199173}
2023-05-09 10:31:51,804 INFO [AM.ZK.Worker-pool2-t15] master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for region {ENCODED => 9bffc61846e344cb34473dbb877e9e92, NAME => 'scores,,1683548174984.9bffc61846e344cb34473dbb877e9e92.', STARTKEY => '', ENDKEY => ''}
2023-05-09 10:31:51,804 INFO [AM.ZK.Worker-pool2-t15] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=CLOSED, ts=1683599511804, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=OFFLINE, ts=1683599511804, server=slave1,16020,1683599199173}
2023-05-09 10:31:51,817 INFO [AM.ZK.Worker-pool2-t15] master.AssignmentManager: Assigning scores,,1683548174984.9bffc61846e344cb34473dbb877e9e92. to slave2,16020,1683599198956
2023-05-09 10:31:51,817 INFO [AM.ZK.Worker-pool2-t15] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=OFFLINE, ts=1683599511804, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=PENDING_OPEN, ts=1683599511817, server=slave2,16020,1683599198956}
2023-05-09 10:31:52,020 INFO [AM.ZK.Worker-pool2-t17] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=PENDING_OPEN, ts=1683599511817, server=slave2,16020,1683599198956} to {9bffc61846e344cb34473dbb877e9e92 state=OPENING, ts=1683599512020, server=slave2,16020,1683599198956}
2023-05-09 10:31:52,678 INFO [AM.ZK.Worker-pool2-t18] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=OPENING, ts=1683599512020, server=slave2,16020,1683599198956} to {9bffc61846e344cb34473dbb877e9e92 state=OPEN, ts=1683599512677, server=slave2,16020,1683599198956}
2023-05-09 10:31:52,709 INFO [AM.ZK.Worker-pool2-t20] master.RegionStates: Offlined 9bffc61846e344cb34473dbb877e9e92 from slave1,16020,1683599199173
Viewing Hive alert information
Hive's log files are stored under /tmp/hadoop; switch to that directory and inspect the logs:
[root@master logs]# cd /tmp/hadoop
[root@master hadoop]# tail -10f hive.log |grep INFO
2023-05-08T17:03:38,040 INFO [724085490@qtp-1094523823-2]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
2023-05-08T17:03:38,389 INFO [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(586)) - Added admin role in metastore
2023-05-08T17:03:38,392 INFO [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(595)) - Added public role in metastore
2023-05-08T17:03:38,458 INFO [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers_core(635)) - No user is added in admin role, since config is empty
2023-05-08T17:03:38,586 INFO [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 0: get_all_databases
2023-05-08T17:03:38,755 INFO [724085490@qtp-1094523823-2]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop ip=unknown-ip-addr cmd=get_all_databases
2023-05-08T17:03:38,765 INFO [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 0: Shutting down the object store...
2023-05-08T17:03:38,765 INFO [724085490@qtp-1094523823-2]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop ip=unknown-ip-addr cmd=Shutting down the object store...
2023-05-08T17:03:38,766 INFO [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 0: Metastore shutdown complete.
2023-05-08T17:03:38,766 INFO [724085490@qtp-1094523823-2]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop ip=unknown-ip-addr cmd=Metastore shutdown complete.
[root@master hadoop]# tail -10f stderr |grep ERROR
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
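The final line reports that log4j2 has no configuration file. In Hive 2.0 a template ships in the conf directory; copying it into place usually resolves the message (a sketch, assuming the Hive install path used throughout this section):
[hadoop@master ~]$ cd /usr/local/src/hive/conf
[hadoop@master conf]$ cp hive-log4j2.properties.template hive-log4j2.properties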