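Chapter 14 · Kibana in Depth: System Monitoring with Timelion Charts

What is Timelion?

Timelion makes it easy to answer questions such as:

1) How many pages does each unique user view over time?
2) How does traffic this Friday differ from traffic last Friday?
3) How many visitors from Japan came to my site today?
4) What is the 10-day moving average of the S&P 500?
5) What is the cumulative sum of all search requests received in the past two years?

Timelion is Kibana's time series visualization tool. A time series visualization analyzes data in chronological order, and Timelion draws it as a two-dimensional chart with time on the x-axis.

What is the advantage over a plain bar or line visualization? Timelion takes a different approach: instead of building the chart in a visual editor, you define it by chaining functions together in a Timelion-specific syntax. That syntax enables features that classic point-series charts do not offer, such as plotting data from different indices or data sources in a single chart.

This chapter uses Metricbeat as the data source, so before using Timelion we need to download and install Metricbeat.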

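Metricbeat: Introduction and Deployment

Introducing Metricbeat

Metricbeat periodically collects operating system and service metrics (CPU, memory, disk, I/O, read/write throughput, processes, and so on) and ships them to the output you specify, such as Elasticsearch, Logstash, or Redis, so that you can monitor your servers.

Deploying and Configuring Metricbeat

Our Elasticsearch and Kibana are version 5.x, so we download a matching 5.x release of Metricbeat.

# Download the RPM package
[root@elkstack04 ~]# wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.3.3-x86_64.rpm

# Alternatively, download the tar.gz archive (prebuilt binaries, not needed if you install the RPM)
[root@elkstack04 ~]# wget https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-5.3.3-linux-x86_64.tar.gz

# Install Metricbeat from the RPM
[root@elkstack04 ~]# yum localinstall -y metricbeat-5.3.3-x86_64.rpm

# Edit the configuration file
[root@elkstack04 ~]# vim /etc/metricbeat/metricbeat.yml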
#==========================  Modules configuration ============================
metricbeat.modules:

#------------------------------- System Module -------------------------------
- module: system
  metricsets:
    # CPU stats
    - cpu

    # System Load stats
    - load

    # Per CPU core stats
    #- core

    # IO stats
    #- diskio

    # Per filesystem stats
    - filesystem

    # File system summary stats
    - fsstat

    # Memory stats
    - memory

    # Network stats
    - network

    # Per process stats
    - process

    # Sockets (linux only)
    #- socket
  enabled: true
  period: 1m
  processes: ['.*']
#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["10.0.0.51:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"
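
# To load the dashboards, enable dashboard loading in the Metricbeat settings. When it is enabled,
# Metricbeat uses the Kibana API to load the sample dashboards. Loading is attempted only when
# Metricbeat starts.
# The setup.* options are top-level settings, so they must not be indented under output.elasticsearch.
# Option names changed between Metricbeat 5.x and 6.x; check the reference for your version.
# Kibana address
setup.kibana.host: "10.0.0.54:5601"
# Load the default dashboards
setup.dashboards.enabled: true
# Do not overwrite the index template if one already exists
setup.template.overwrite: false
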
# Start Metricbeat (CentOS 6)
[root@elkstack04 ~]# /etc/init.d/metricbeat start
# Start Metricbeat (CentOS 7)
[root@elkstack04 ~]# systemctl start metricbeat

# Check that Metricbeat is running (the search returns documents from the metricbeat-* indices)
[root@elkstack04 ~]# curl -XGET 'http://10.0.0.51:9200/metricbeat-*/_search?pretty'
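
If you would rather check a single number than read through the full search response, the same check can be scripted. The following is a minimal sketch, assuming the Elasticsearch address used in this chapter and the standard _count endpoint; the helper names count_url and doc_count are made up for this example.

```python
import json
import urllib.request


def count_url(es_host: str, index_pattern: str = "metricbeat-*") -> str:
    """Build the URL of the _count endpoint for an index pattern."""
    return f"http://{es_host}/{index_pattern}/_count"


def doc_count(es_host: str, index_pattern: str = "metricbeat-*") -> int:
    """Return how many documents match the pattern (needs network access to Elasticsearch)."""
    with urllib.request.urlopen(count_url(es_host, index_pattern), timeout=5) as resp:
        return json.load(resp)["count"]


# Example with the address from this chapter (left commented out because it needs the cluster):
# print(doc_count("10.0.0.51:9200"))
```

A count that keeps growing between two calls suggests that Metricbeat is still shipping data.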

Open http://10.0.0.51:9100/ in a browser to view the Metricbeat index.

Then open Kibana at http://10.0.0.54:5601/ and add the metricbeat-* index pattern.

Once the index pattern is created, the Metricbeat data appears in Discover.

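Using Timelion with Metricbeat

Creating a time series visualization

This section uses the time series data from Metricbeat to walk you through some of the functions Timelion provides.

The first visualization compares the real-time percentage of CPU time spent in user space with the same metric offset by one hour. It needs two Timelion expressions: one for the real-time average of system.cpu.user.pct, and one for that average offset by one hour.

To start, define index, timefield, and metric in the first expression by entering the following in the Timelion query bar: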

.es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct')

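Now add a second series with the data from the previous hour for comparison. To do so, add an offset parameter to the .es() function; offset shifts the retrieved series by a date expression. In this example the data should be shifted by one hour, so the date expression is -1h. Separate the two series with a comma and enter the following expression in the Timelion query bar: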

.es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct'), .es(offset=-1h,index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct')
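
To make the effect of offset=-1h concrete, here is a minimal sketch of the idea rather than Timelion's actual implementation. It assumes that the offset series is read from one hour earlier and then drawn at the current time so both lines share the same x-axis. The function name shift_for_offset and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def shift_for_offset(points, offset):
    """Move each (timestamp, value) pair so data recorded at t + offset is drawn at t.

    With offset = -1 hour, a value recorded at 09:00 is drawn at 10:00.
    """
    return [(t - offset, v) for t, v in points]


# Hypothetical CPU samples recorded one hour before the visible time range.
recorded = [
    (datetime(2024, 1, 1, 9, 0), 0.10),
    (datetime(2024, 1, 1, 9, 1), 0.12),
]
for t, v in shift_for_offset(recorded, timedelta(hours=-1)):
    print(t.isoformat(), v)
```

The shifted points line up with the live series, which is what lets the chart compare the current hour with the previous one.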

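The two series are hard to tell apart, so give them custom labels. You can append the .label() function to any expression to add a custom label. Enter the following expression in the Timelion query bar: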

.es(offset=-1h,index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('last hour'), .es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('current hour')

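Save the whole Timelion sheet as "Metricbeat Example". As a best practice, save any significant change you make to this sheet as you work through the tutorial.

Customizing and formatting the visualization

Timelion offers many customization options; with the available functions you can personalize almost every aspect of a chart. In this section you will make the following changes:

1) Add a title
2) Change the series type
3) Change the color and opacity of a series
4) Modify the legend

The previous section produced a chart with two series. Let's continue customizing this visualization.

Before making any other changes, append the .title() function to the end of the expression to add a meaningful title, which makes the purpose of the visualization clearer to unfamiliar users. For this example, add .title('CPU usage') to the original series, using the following expression in the Timelion query bar: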

.es(offset=-1h,index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('last hour'), .es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('current hour').title('CPU usage')

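To differentiate the last-hour series further, change its chart type to an area chart. Use the .lines() function, which customizes line charts; its fill and width parameters set the fill level and the line width. In this example, add .lines(fill=1,width=0.5) to set the fill level to 1 and the border width to 0.5, using the following expression in the Timelion query bar: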

.es(offset=-1h,index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('last hour').lines(fill=1,width=0.5), .es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('current hour').title('CPU usage')

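Now color the series so that the current-hour series stands out a little more than the last-hour one. The .color() function changes the color of any series and accepts standard color names, hexadecimal values, or a color schema for grouped series. For this example, use .color(gray) for the last hour and .color(#1E90FF) for the current hour. Enter the following expression in the Timelion query bar: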

.es(offset=-1h,index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('last hour').lines(fill=1,width=0.5).color(gray), .es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('current hour').title('CPU usage').color(#1E90FF)

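Last but not least, adjust the legend so that it takes up as little space as possible. The .legend() function sets the position and style of the legend. In this example, append .legend(columns=2, position=nw) to the original series to lay the legend out in two columns in the northwest corner of the visualization: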

.es(offset=-1h,index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('last hour').lines(fill=1,width=0.5).color(gray), .es(index=metricbeat-*, timefield='@timestamp', metric='avg:system.cpu.user.pct').label('current hour').title('CPU usage').color(#1E90FF).legend(columns=2, position=nw)
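
Expressions like this one repeat the same .es() arguments several times, so it can be convenient to generate them instead of typing them. The following sketch builds the expression as a plain string. The helpers es and chain are hypothetical names for this example and are not part of Kibana, and the output is assumed to be pasted into the Timelion query bar by hand.

```python
def es(metric, index="metricbeat-*", timefield="@timestamp", offset=None):
    """Return a Timelion .es() call as a string."""
    args = []
    if offset is not None:
        args.append(f"offset={offset}")
    args.append(f"index={index}")
    args.append(f"timefield='{timefield}'")
    args.append(f"metric='{metric}'")
    return ".es(" + ", ".join(args) + ")"


def chain(expr, *calls):
    """Append chained calls such as "label('x')" to an expression."""
    return expr + "".join("." + call for call in calls)


metric = "avg:system.cpu.user.pct"
last_hour = chain(
    es(metric, offset="-1h"),
    "label('last hour')", "lines(fill=1,width=0.5)", "color(gray)",
)
current_hour = chain(
    es(metric),
    "label('current hour')", "title('CPU usage')", "color(#1E90FF)",
    "legend(columns=2, position=nw)",
)
expression = ", ".join([last_hour, current_hour])
print(expression)
```

Apart from whitespace, the printed string should match the expression above, and changing the metric or index in one place updates both series.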

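Save the sheet, then create a new visualization.

Using mathematical functions

The previous two sections covered how to create and style a Timelion visualization; this section explores the mathematical functions Timelion provides. You will keep using the Metricbeat data to build a new Timelion visualization for inbound and outbound network traffic. To start, add a new Timelion visualization to the sheet.

In the top menu, click Add to add a second visualization. When it is added to the sheet, the query bar is reset to the default .es(*) expression, because the query bar is tied to the visualization currently selected on the sheet.

To start tracking inbound and outbound network traffic, the first expression computes the maximum of system.network.in.bytes. Enter the following expression in the Timelion query bar: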

.es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.in.bytes)

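Network traffic is more informative when plotted as a rate of change. The .derivative() function does exactly that: it plots how the value changes over time. Append .derivative() to the end of the expression to update your visualization:

.es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.in.bytes).derivative()

Now for the outbound traffic. Add a similar calculation for system.network.out.bytes. Because outbound traffic leaves your machine, it makes sense to show this metric as a negative number. The .multiply() function multiplies a series by a number, or by the result of another series or series list. For this example, use .multiply(-1) to turn the outbound traffic negative, using the following expression to update your visualization: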

.es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.in.bytes).derivative(), .es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.out.bytes).derivative().multiply(-1)

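To make this visualization easier to read, convert the series from bytes to megabytes. Timelion's .divide() function accepts the same input as .multiply() and divides the series by the given divisor. Use the following expression to update your visualization:

.es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.in.bytes).derivative().divide(1048576), .es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.out.bytes).derivative().multiply(-1).divide(1048576)

Finally, tidy up the visualization with the formatting functions from the previous section, .title(), .label(), .color(), .lines(), and .legend(). Use the following expression to update your visualization: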

.es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.in.bytes).derivative().divide(1048576).lines(fill=2, width=1).color(green).label("Inbound traffic").title("Network traffic (MB/s)"), .es(index=metricbeat*, timefield=@timestamp, metric=max:system.network.out.bytes).derivative().multiply(-1).divide(1048576).lines(fill=2, width=1).color(blue).label("Outbound traffic").legend(columns=2, position=nw)
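
The arithmetic behind the outbound series can be sketched in a few lines. This is an illustration under stated assumptions, not Timelion's code: it assumes .derivative() is a plain difference between consecutive buckets, and the sample counter values are invented. The divisor 1048576 is 1024 * 1024, the number of bytes in one megabyte as used in the expression.

```python
def derivative(values):
    """Difference between each value and the previous one; the first point has no predecessor."""
    if not values:
        return []
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def multiply(values, factor):
    """Multiply every non-empty point by a constant."""
    return [None if v is None else v * factor for v in values]


def divide(values, divisor):
    """Divide every non-empty point by a constant."""
    return [None if v is None else v / divisor for v in values]


# Hypothetical samples of the cumulative system.network.out.bytes counter, one per bucket.
out_bytes = [0, 1048576, 3145728, 3145728]
outbound_mb = divide(multiply(derivative(out_bytes), -1), 1048576)
print(outbound_mb)  # [None, -1.0, -2.0, 0.0]
```

Under that assumption the values are megabytes per bucket, so the "MB/s" in the chart title is exact only when each bucket covers one second.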

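Save the sheet, then add a new visualization for the next chart.

Using conditional logic and tracking trends

In this section you will learn how to modify time series data with conditional logic and how to create a trend with a moving average, which helps you spot outliers and patterns over time.

You will keep using the Metricbeat data and add another visualization that monitors memory consumption. To start, plot the maximum of system.memory.actual.used.bytes with the following expression: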

.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes')

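Next, create two thresholds to monitor the amount of memory in use. In this tutorial the warning threshold is 234 MB and the serious threshold is 235 MB. Whenever the maximum memory in use exceeds one of these thresholds, the series is colored accordingly.

If these thresholds are too high or too low for your machine, adjust them accordingly.

To configure the two thresholds, use Timelion's conditional logic. The .if() function compares each point to a number; where the condition is true the styling is adjusted, and where it is false the default styling is used. Timelion provides the following six comparison operators:

Operator	Meaning
eq	equal to
ne	not equal to
lt	less than
gt	greater than
lte	less than or equal to
gte	greater than or equal to

Because there are two thresholds, it makes sense to style them differently. Using the gt operator, color the warning series yellow with .color('#FFCC11') and the serious series red with .color('red'). Enter the following expression in the Timelion query bar to apply the conditional logic and the threshold styling: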

.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').if(gt,234000000,.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'),null).label('warning').color('#FFCC11'), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').if(gt,235000000,.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'),null).label('serious').color('red')
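
The .if(gt, threshold, series, null) pattern used above keeps a point only where it exceeds the threshold and leaves a gap elsewhere, which is why the colored series appear only on top of the peaks. Here is a minimal sketch of that selection logic; the function name timelion_if and the sample values are hypothetical, and the operator table mirrors the six operators listed above.

```python
import operator

# The six comparison operators, mapped to Python's equivalents.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gt": operator.gt,
    "lte": operator.le,
    "gte": operator.ge,
}


def timelion_if(values, op, threshold, then_values, else_value=None):
    """Per point: take the 'then' value where the comparison holds, otherwise else_value."""
    compare = OPERATORS[op]
    return [
        then if v is not None and compare(v, threshold) else else_value
        for v, then in zip(values, then_values)
    ]


# Hypothetical samples of system.memory.actual.used.bytes.
used = [233_500_000, 234_200_000, 235_400_000]
warning = timelion_if(used, "gt", 234_000_000, used)
serious = timelion_if(used, "gt", 235_000_000, used)
print(warning)  # [None, 234200000, 235400000]
print(serious)  # [None, None, 235400000]
```

Because the serious series is listed after the warning series in the expression, it is assumed to be drawn on top wherever both thresholds are exceeded.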

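Now that the thresholds make outliers easy to identify, create a new series that shows the underlying trend. Timelion's .mvavg() function computes a moving average over a given window, which is especially useful for noisy time series. For this tutorial, use .mvavg(10) to create a moving average with a window of 10 data points. Use the following expression to add the moving average of the maximum memory usage: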

.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').if(gt,234000000,.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'),null).label('warning').color('#FFCC11'), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').if(gt,235000000,.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'),null).label('serious').color('red'), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').mvavg(10)
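
For readers who have not used moving averages before, the idea behind .mvavg(10) can be sketched as follows. This is a simplified trailing average that assumes each output point is the mean of the current point and the points before it. Timelion's exact window alignment may differ, and the function name moving_average and the sample values are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; points without a full window behind them stay empty."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical memory readings in MB, with a window of 3 to keep the example short.
samples = [230, 233, 236, 233, 230]
print(moving_average(samples, 3))  # [None, None, 233.0, 234.0, 233.0]
```

The averaged series moves less than the raw readings, which is what makes the trend easier to see on a noisy chart.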

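Now that you have thresholds and a moving average, format the visualization so that it is easier to read. As in the previous section, use the .color(), .label(), .lines(), .title(), and .legend() functions to update the visualization: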

.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').label('max memory').title('Memory consumption over time'), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').if(gt,234000000,.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'),null).label('warning').color('#FFCC11').lines(width=5), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').if(gt,235000000,.es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes'),null).label('serious').color('red').lines(width=5), .es(index=metricbeat-*, timefield='@timestamp', metric='max:system.memory.actual.used.bytes').mvavg(10).label('mvavg').lines(width=2).color(#5E5E5E).legend(columns=4, position=nw)