Quickly deploying InfluxDB + Telegraf + Grafana with Docker to push host and Docker container metrics and visualize them
Overview
1. InfluxDB
InfluxDB is an open-source distributed time-series, events, and metrics database written in Go, with no external dependencies.
2. Telegraf
Telegraf is a plugin-driven server agent for collecting and reporting metrics, and is the first piece of the TICK Stack.
Telegraf plugins can pull a wide range of metrics directly from the system they run on, fetch metrics from third-party APIs, and even listen for metrics via statsd and Kafka consumer services. It also has output plugins that send metrics to many other data stores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and more.
3. Grafana
Grafana is a cross-platform, open-source metrics analysis and visualization tool: it queries the collected data, presents it visually, and can send timely notifications.
Simple architecture
The deployment below can span hosts. For example, Telegraf runs on the customer's machines, InfluxDB runs on your company's public network (with a whitelist so only the Telegraf servers can reach it), and Grafana reads from InfluxDB.
I. Environment preparation
1. Install docker and docker-compose; there are plenty of guides online, so this is skipped here.
2. Create the directories the environment needs (each deployment step below is done inside the corresponding directory):
mkdir influxdb telegraf grafana
II. Deploy InfluxDB
1. Prepare the compose file
version: "3.3" services: influxdb: image: influxdb:1.6.3 container_name: influxdb hostname: influxdb restart: always ports: - "20000:8086" #外部端口自定义 volumes: - ./data:/var/lib/influxdb environment: - TZ=Asia/Shanghai - INFLUXDB_HTTP_AUTH_ENABLED=true #开启账号密码登录数据库 - INFLUXDB_DB=telegraf #定义数据库名 - INFLUXDB_ADMIN_USER=admin #定义数据库账号 - INFLUXDB_ADMIN_PASSWORD=aaaa1111 #定义数据库密码 deploy: resources: limits: memory: 4g
III. Deploy Telegraf
1. Prepare the configuration file telegraf.conf
[global_tags]
  instance = "10.10.10.10"    # IP of this host

[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = "10.10.10.10"    # IP of this host; this is what shows up in Grafana
  omit_hostname = false

#[[outputs.http]]             # This output pushes to a Prometheus remote-write endpoint; we use InfluxDB here, so it stays commented out
#  url = "http://10.10.10.10:9090/api/v1/write"
#  data_format = "prometheusremotewrite"
#  [outputs.http.headers]
#    Content-Type = "application/x-protobuf"
#    Content-Encoding = "snappy"
#    X-Prometheus-Remote-Write-Version = "0.1.0"

[[outputs.influxdb]]          # Push metrics to the database
  urls = ["http://111.111.111.111:20000"]   # InfluxDB IP and port; when crossing networks, use InfluxDB's public IP and port
  database = "telegraf"       # Database name
  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"
  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"
  username = "admin"          # InfluxDB username
  password = "aaaa1111"       # InfluxDB password
  ## Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512

[[inputs.docker]]             # Collect Docker metrics
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  container_name_include = []
  container_name_exclude = []
  timeout = "5s"
  docker_label_include = []
  docker_label_exclude = []
  perdevice = true
  total = false
  [inputs.docker.tags]
    env = "kehu-admin"        # Tags the collected data with environment info so it is easy to filter in Grafana later.
                              # Add it to basically every input; extra variables can be added. Here it is "customer-service" ("kehu-admin").

# The inputs below collect hardware metrics. If something you want is not collected,
# search online or check the official plugin list.

[[inputs.cpu]]                # Collect CPU metrics
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]
  [inputs.cpu.tags]
    env = "kehu-admin"

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]
  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]
  [inputs.disk.tags]
    env = "kehu-admin"

# Read metrics about disk IO by device
[[inputs.diskio]]
  [inputs.diskio.tags]
    env = "kehu-admin"
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  [inputs.kernel.tags]
    env = "kehu-admin"

# Read metrics about memory usage
[[inputs.mem]]
  [inputs.mem.tags]
    env = "kehu-admin"

# Get the number of processes and group them by status
[[inputs.processes]]
  [inputs.processes.tags]
    env = "kehu-admin"

# Read metrics about swap memory usage
[[inputs.swap]]
  [inputs.swap.tags]
    env = "kehu-admin"

# Read metrics about system load & uptime
[[inputs.system]]
  [inputs.system.tags]
    env = "kehu-admin"

# Read metrics about network interface usage
[[inputs.nstat]]
  [inputs.nstat.tags]
    env = "kehu-admin"
  # collect data only about specific interfaces
  # interfaces = ["eth0"]

[[inputs.netstat]]
  [inputs.netstat.tags]
    env = "kehu-admin"

[[inputs.interrupts]]
  [inputs.interrupts.tags]
    env = "kehu-admin"

[[inputs.linux_sysctl_fs]]
  [inputs.linux_sysctl_fs.tags]
    env = "kehu-admin"
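Before wiring the file into compose, it can be dry-run: Telegraf's --test flag gathers every input once and prints the metrics to stdout instead of writing them to InfluxDB. A sketch, run from the telegraf directory; as with the compose file in the next step, the container user may need the host's docker group id to read the socket:

# Gather each input once and print the metrics instead of sending them
# (add --user telegraf:<docker group id> if the docker input cannot read the socket)
docker run --rm \
  -v "$PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  telegraf telegraf --config /etc/telegraf/telegraf.conf --test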
2. Prepare the compose file
version: "3.3" services: telegraf: image: telegraf container_name: telegraf restart: always environment: HOST_PROC: /rootfs/proc HOST_SYS: /rootfs/sys HOST_ETC: /rootfs/etc user: telegraf:994 #/etc/group 看下docker组的id ,需要修改 volumes: - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro #指定上面的配置文件 - /var/run/docker.sock:/var/run/docker.sock #收集docker 下面收集系统 - /sys:/rootfs/sys:ro - /proc:/rootfs/proc:ro - /etc:/rootfs/etc:ro deploy: resources: limits: cpus: '0.5' memory: 512M
IV. Deploy Grafana
1. Prepare the compose file
version: "3.3"
services:
  # Web UI
  grafana:
    image: grafana/grafana:9.5.18
    restart: "always"
    ports:
      - 10000:3000
    container_name: "grafana"
    volumes:
      - "./grafana/grafana.ini:/etc/grafana/grafana.ini"   # Copy the config file out yourself: start a Grafana with docker run, docker cp it out, then remove that container (see the commands below)
      - "./grafana/grafana-storage:/var/lib/grafana"
      - "/etc/localtime:/etc/localtime:ro"
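The grafana.ini mount expects the file to already exist on the host. One way to obtain it is the throwaway-container approach described in the comment above (a sketch, run from the directory holding the compose file; the grafana-tmp name is arbitrary, and 472 is the uid/gid the official Grafana image runs as):

mkdir -p grafana/grafana-storage
docker run -d --name grafana-tmp grafana/grafana:9.5.18            # temporary container, only needed for the config file
docker cp grafana-tmp:/etc/grafana/grafana.ini ./grafana/grafana.ini
docker rm -f grafana-tmp                                            # discard the temporary container
chown -R 472:472 grafana/grafana-storage                            # uid/gid Grafana runs as inside the image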
V. Start the services
docker-compose -f *****.yml up -d    # run once for each yml file
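For example, assuming each compose file was saved as docker-compose.yml inside the directory created in step I (the file names are up to you):

(cd influxdb && docker-compose up -d)   # start InfluxDB first so Telegraf has somewhere to write
(cd telegraf && docker-compose up -d)
(cd grafana  && docker-compose up -d)
docker ps                               # confirm all three containers are running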
VI. Configure Grafana
1. Open Grafana in a browser at 10.10.10.10:10000 (use whatever port you mapped above).
2. Switch the UI language to Chinese if you like.
3. Add the data source and point it at the InfluxDB database. The Grafana and InfluxDB servers must be able to reach each other over the network.
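As an alternative to clicking through the UI, the data source can also be provisioned from a file that Grafana reads at startup. A sketch mirroring the InfluxDB settings above; it assumes you add an extra volume in the Grafana compose file mounting this file into /etc/grafana/provisioning/datasources/:

# grafana/provisioning/datasources/influxdb.yml  (hypothetical path)
apiVersion: 1
datasources:
  - name: InfluxDB-telegraf
    type: influxdb
    access: proxy
    url: http://111.111.111.111:20000
    database: telegraf
    user: admin
    secureJsonData:
      password: aaaa1111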
4. Import a dashboard
# Just download a monitoring dashboard template from the official site; I use dashboard 10578.
Dashboard link:
After importing, the dashboard will not show everything right away. Set its variables to exactly the values configured in telegraf earlier (the env tag and so on) and the panels will fill in.
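For example, if the dashboard has an environment or host variable, an InfluxQL template-variable query over the env tag defined in telegraf.conf will list the values to choose from (variable names differ per dashboard; this is only the query sketch):

-- list every value of the env tag across all measurements
SHOW TAG VALUES WITH KEY = "env"
-- or scoped to a single measurement, e.g. cpu
SHOW TAG VALUES FROM "cpu" WITH KEY = "env"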
VII. Alerting
Alerting would have to go through Grafana here and needs further research on your own. For internal-network monitoring I recommend Node_Exporter and cAdvisor instead, with a whitelist on the exposed ports; configuring alerting through alert rules is more convenient that way.