Quickly deploying InfluxDB + Telegraf + Grafana with Docker to push host and Docker container metrics and visualize them
Overview
1. InfluxDB
InfluxDB is an open-source distributed time-series, events, and metrics database written in Go, with no external dependencies.
2. Telegraf
Telegraf is a plugin-driven server agent for collecting and reporting metrics, and is the first piece of the TICK Stack.
Telegraf plugins can pull a wide range of metrics directly from the system they run on, fetch metrics from third-party APIs, and even listen for metrics via statsd and Kafka consumer services. It also has output plugins that send metrics to many other data stores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and more.
3. Grafana
Grafana is a cross-platform, open-source metrics analysis and visualization tool: it queries the collected data, presents it visually, and can send timely notifications.
Simple architecture
The deployment below can span hosts. For example, Telegraf runs on the customer's machines, InfluxDB runs on your company's public network (with a whitelist so only the Telegraf servers can reach it), and Grafana reads from InfluxDB.
I. Environment preparation
1. Install docker and docker-compose; there are plenty of guides online, so this is skipped here.
2. Create the directories the environment needs (each deployment step below is done inside the corresponding directory):
mkdir influxdb telegraf grafana
II. Deploy InfluxDB
1. Prepare the compose file
version: "3.3" services: influxdb: image: influxdb:1.6.3 container_name: influxdb hostname: influxdb restart: always ports: - "20000:8086" #外部端口自定义 volumes: - ./data:/var/lib/influxdb environment: - TZ=Asia/Shanghai - INFLUXDB_HTTP_AUTH_ENABLED=true #开启账号密码登录数据库 - INFLUXDB_DB=telegraf #定义数据库名 - INFLUXDB_ADMIN_USER=admin #定义数据库账号 - INFLUXDB_ADMIN_PASSWORD=aaaa1111 #定义数据库密码 deploy: resources: limits: memory: 4g
III. Deploy Telegraf
1. Prepare the configuration file telegraf.conf
[global_tags]
  instance = "10.10.10.10"    # IP of this host

[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = "10.10.10.10"    # IP of this host; this is what shows up in Grafana
  omit_hostname = false

#[[outputs.http]]             # This output pushes to a Prometheus remote-write endpoint; we use InfluxDB here, so it stays commented out
#  url = "http://10.10.10.10:9090/api/v1/write"
#  data_format = "prometheusremotewrite"
#  [outputs.http.headers]
#    Content-Type = "application/x-protobuf"
#    Content-Encoding = "snappy"
#    X-Prometheus-Remote-Write-Version = "0.1.0"

[[outputs.influxdb]]          # Push metrics to the database
  urls = ["http://111.111.111.111:20000"]   # InfluxDB IP and port; when crossing networks, use InfluxDB's public IP and port
  database = "telegraf"       # Database name
  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"
  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"
  username = "admin"          # InfluxDB username
  password = "aaaa1111"       # InfluxDB password
  ## Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512

[[inputs.docker]]             # Collect Docker metrics
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  container_name_include = []
  container_name_exclude = []
  timeout = "5s"
  docker_label_include = []
  docker_label_exclude = []
  perdevice = true
  total = false
  [inputs.docker.tags]
    env = "kehu-admin"        # Tags the collected data with environment info so it is easy to filter in Grafana later.
                              # Add it to basically every input; extra variables can be added. Here it is "customer-service" ("kehu-admin").

# The inputs below collect hardware metrics. If something you want is not collected,
# search online or check the official plugin list.

[[inputs.cpu]]                # Collect CPU metrics
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]
  [inputs.cpu.tags]
    env = "kehu-admin"

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]
  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]
  [inputs.disk.tags]
    env = "kehu-admin"

# Read metrics about disk IO by device
[[inputs.diskio]]
  [inputs.diskio.tags]
    env = "kehu-admin"
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  [inputs.kernel.tags]
    env = "kehu-admin"

# Read metrics about memory usage
[[inputs.mem]]
  [inputs.mem.tags]
    env = "kehu-admin"

# Get the number of processes and group them by status
[[inputs.processes]]
  [inputs.processes.tags]
    env = "kehu-admin"

# Read metrics about swap memory usage
[[inputs.swap]]
  [inputs.swap.tags]
    env = "kehu-admin"

# Read metrics about system load & uptime
[[inputs.system]]
  [inputs.system.tags]
    env = "kehu-admin"

# Read metrics about network interface usage
[[inputs.nstat]]
  [inputs.nstat.tags]
    env = "kehu-admin"
  # collect data only about specific interfaces
  # interfaces = ["eth0"]

[[inputs.netstat]]
  [inputs.netstat.tags]
    env = "kehu-admin"

[[inputs.interrupts]]
  [inputs.interrupts.tags]
    env = "kehu-admin"

[[inputs.linux_sysctl_fs]]
  [inputs.linux_sysctl_fs.tags]
    env = "kehu-admin"
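Before wiring the file into compose, it can be dry-run: Telegraf's --test flag gathers every input once and prints the metrics to stdout instead of writing them to InfluxDB. A sketch, run from the telegraf directory; as with the compose file in the next step, the container user may need the host's docker group id to read the socket:

# Gather each input once and print the metrics instead of sending them
# (add --user telegraf:<docker group id> if the docker input cannot read the socket)
docker run --rm \
  -v "$PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  telegraf telegraf --config /etc/telegraf/telegraf.conf --test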
2. Prepare the compose file
version: "3.3" services: telegraf: image: telegraf container_name: telegraf restart: always environment: HOST_PROC: /rootfs/proc HOST_SYS: /rootfs/sys HOST_ETC: /rootfs/etc user: telegraf:994 #/etc/group 看下docker组的id ,需要修改 volumes: - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro #指定上面的配置文件 - /var/run/docker.sock:/var/run/docker.sock #收集docker 下面收集系统 - /sys:/rootfs/sys:ro - /proc:/rootfs/proc:ro - /etc:/rootfs/etc:ro deploy: resources: limits: cpus: '0.5' memory: 512M
IV. Deploy Grafana
1. Prepare the compose file
version: "3.3"
services:
  # Web UI
  grafana:
    image: grafana/grafana:9.5.18
    restart: "always"
    ports:
      - 10000:3000
    container_name: "grafana"
    volumes:
      - "./grafana/grafana.ini:/etc/grafana/grafana.ini"   # Copy the config file out yourself: start a Grafana with docker run, docker cp it out, then remove that container (see the commands below)
      - "./grafana/grafana-storage:/var/lib/grafana"
      - "/etc/localtime:/etc/localtime:ro"
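The grafana.ini mount expects the file to already exist on the host. One way to obtain it is the throwaway-container approach described in the comment above (a sketch, run from the directory holding the compose file; the grafana-tmp name is arbitrary, and 472 is the uid/gid the official Grafana image runs as):

mkdir -p grafana/grafana-storage
docker run -d --name grafana-tmp grafana/grafana:9.5.18            # temporary container, only needed for the config file
docker cp grafana-tmp:/etc/grafana/grafana.ini ./grafana/grafana.ini
docker rm -f grafana-tmp                                            # discard the temporary container
chown -R 472:472 grafana/grafana-storage                            # uid/gid Grafana runs as inside the image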
V. Start the services
docker-compose -f *****.yml up -d    # run once for each yml file
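For example, assuming each compose file was saved as docker-compose.yml inside the directory created in step I (the file names are up to you):

(cd influxdb && docker-compose up -d)   # start InfluxDB first so Telegraf has somewhere to write
(cd telegraf && docker-compose up -d)
(cd grafana  && docker-compose up -d)
docker ps                               # confirm all three containers are running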
VI. Configure Grafana
1. Open Grafana in a browser at 10.10.10.10:10000 (use whatever port you mapped above).
2. Switch the UI language to Chinese if you like.
3. Add the data source and point it at the InfluxDB database. The Grafana and InfluxDB servers must be able to reach each other over the network.
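As an alternative to clicking through the UI, the data source can also be provisioned from a file that Grafana reads at startup. A sketch mirroring the InfluxDB settings above; it assumes you add an extra volume in the Grafana compose file mounting this file into /etc/grafana/provisioning/datasources/:

# grafana/provisioning/datasources/influxdb.yml  (hypothetical path)
apiVersion: 1
datasources:
  - name: InfluxDB-telegraf
    type: influxdb
    access: proxy
    url: http://111.111.111.111:20000
    database: telegraf
    user: admin
    secureJsonData:
      password: aaaa1111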
4. Import a dashboard
# Just download a monitoring dashboard template from the official site; I use dashboard 10578.
Dashboard link:
After importing, the dashboard will not show everything right away. Set its variables to exactly the values configured in telegraf earlier (the env tag and so on) and the panels will fill in.
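For example, if the dashboard has an environment or host variable, an InfluxQL template-variable query over the env tag defined in telegraf.conf will list the values to choose from (variable names differ per dashboard; this is only the query sketch):

-- list every value of the env tag across all measurements
SHOW TAG VALUES WITH KEY = "env"
-- or scoped to a single measurement, e.g. cpu
SHOW TAG VALUES FROM "cpu" WITH KEY = "env"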
VII. Alerting
Alerting would have to go through Grafana here and needs further research on your own. For internal-network monitoring I recommend Node_Exporter and cAdvisor instead, with a whitelist on the exposed ports; configuring alerting through alert rules is more convenient that way.