docker-compsoe部署prometheus、Grafana监控、钉钉告警(二)
docker-compsoe部署prometheus、Grafana监控、钉钉告警(二)
二、Exporter
- 有哪些Exporter
参见 https://prometheus.io/docs/instrumenting/exporters/
- 以MSSQL监控为例
MSSQL的exporter有prometheus-mssql-exporter (官方推荐),另有sql-exporter
- 对比
对比点 | prometheus-mssql-exporter | sql-exporter | 说明 |
---|---|---|---|
中心化 | 否 | 是 | 是否一个exporter上可连多个数据库实例,不需要在各个实例上部署exporter |
监控不同类型数据库 | 否 | 是 | sql-exporter可监控sqlserver、mysql、oracle;prometheus-mssql-exporter只能监控mssql |
现成Dashboards | 是 | 否 | 通过grafana导入收录的dashboards |
监控metrics | 内置 | 内置,可增减 | 是否内置一些监控metrics |
- 部署
3.1 机器规划
服务器名称 | IP Address | 说明 |
---|---|---|
prom-mssql-exporter | 128.0.255.96 | prometheus官方推荐 |
sql-exporter | 128.0.255.96 | 支持多种数据库和实例 |
3.2 文件目录规划
- 编写
sql_exporter.yml
配置文件
建文件目录
mkdir -p sql_exporter/config && vim sql_exporter/config/sql_exportter.yml
# Global defaults.
global:
# Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
scrape_timeout_offset: 500ms
# Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
min_interval: 0s
# Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
# as will concurrent scrapes.
max_connections: 3
# Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
# always be the same as max_connections.
max_idle_connections: 3
# The target to monitor and the collectors to execute on it.
target:
# Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
# the schema gets dropped or replaced to match the driver expected DSN format.
#data_source_name: 'sqlserver://sql_exporter:qICJEasdqwDiOSrdT96@128.0.202.103:1433'
# Collectors (referenced by name) to execute on the target.
#collectors: [mssql_standard]
# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
- "*.collector.yml"
# Jobs are equivalent to jobs in the Prometheus configuration: they group similar targets with similar metrics together.
jobs:
# All metrics from all targets get a `job` label, set to this value.
- job_name: sql-exporter #database_metrics
# The set of collectors (defined below) to be applied to all targets in this job.
collectors: [mssql_standard]
# Similar to the Prometheus configuration, multiple sets of targets may be defined, each with an optional set of
# labels to be applied to all metrics.
static_configs:
- targets:
'sqlserver_103_test': 'sqlserver://sa:password@com@128.0.202.103:1433?database=master'
'sqlserver_103_dev': 'sqlserver://sa:password@com@128.0.202.103:3433?database=master'
'sqlserver_xulq': 'sqlserver://sa:password@com@128.0.23.17:1433?database=master'
#'postgres_dev': 'postgres://postgres:postgrespassword@postgres:5432/postgresdb?sslmode=disable'
#'mysql_dev': 'mysql://testuser:testuser@tcp(mysql:3306)/mysqldb'
#labels:
# type: sql_exporter
- 编写
mssql_standard.collector.yml
metrics配置文件
vim mssql_standard.collector.yml
# A collector defining standard metrics for Microsoft SQL Server.
#
# It is required that the SQL Server user has the following permissions:
#
# GRANT VIEW ANY DEFINITION TO
# GRANT VIEW SERVER STATE TO
#
collector_name: mssql_standard
# Similar to global.min_interval, but applies to the queries defined by this collector only.
#min_interval: 0s
metrics:
# prometheus-mssql-exporter metrics
- metric_name: mssql_instance_local_time
type: gauge
help: 'Local time in seconds since epoch (Unix time).'
values: [unix_time]
query: |
SELECT DATEDIFF(second, '19700101', GETUTCDATE()) AS unix_time
- metric_name: mssql_local_time_seconds
type: gauge
help: 'Local time in seconds since epoch (Unix time).'
values: [unix_time]
query: |
SELECT DATEDIFF(second, '19700101', GETUTCDATE()) AS unix_time
#
# Collected from sys.sysprocesses
#
# ======自定义metrics======
# 当前处理脚本数
- metric_name: mssql_current_exec_num
type: gauge
help: 'Number of scripts currently executing.'
values: [count]
query: |
SELECT
COUNT(1) count
FROM sys.[dm_exec_requests] AS der
INNER JOIN sys.sysprocesses AS sp ON der.session_id=sp.spid
CROSS APPLY sys.[dm_exec_sql_text](der.[sql_handle]) AS dest
WHERE [session_id]>50 AND session_id<>@@SPID
# 每个数据库当前处理脚本数
- metric_name: mssql_current_exec_num_per_database
type: gauge
help: 'Number of scripts currently executing per database.'
key_labels:
- database
values: [count]
query: |
SELECT
DB_NAME(der.[database_id]) [database],
COUNT(1) count
FROM sys.[dm_exec_requests] AS der
INNER JOIN sys.sysprocesses AS sp ON der.session_id=sp.spid
CROSS APPLY sys.[dm_exec_sql_text](der.[sql_handle]) AS dest
WHERE [session_id]>50 AND session_id<>@@SPID
GROUP BY der.[database_id]
# ======自定义metrics======
- metric_name: mssql_connections
type: gauge
help: 'Number of active connections.'
key_labels:
- database
values: [count]
query: |
SELECT ISNULL(DB_NAME(sp.dbid),'') AS [database], COUNT(sp.spid) AS count
FROM sys.sysprocesses sp
GROUP BY DB_NAME(sp.dbid)
#
# Collected from sys.dm_os_performance_counters
#
- metric_name: mssql_deadlocks
type: counter
help: 'Number of lock requests that resulted in a deadlock.'
values: [cntr_value]
query: |
SELECT cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Number of Deadlocks/sec' AND instance_name = '_Total'
- metric_name: mssql_user_errors
type: counter
help: 'Number of user errors.'
values: [cntr_value]
query: |
SELECT cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Errors/sec' AND instance_name = 'User Errors'
- metric_name: mssql_kill_connection_errors
type: counter
help: 'Number of severe errors that caused SQL Server to kill the connection.'
values: [cntr_value]
query: |
SELECT cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Errors/sec' AND instance_name = 'Kill Connection Errors'
- metric_name: mssql_page_life_expectancy_seconds
type: gauge
help: 'The minimum number of seconds a page will stay in the buffer pool on this node without references.'
values: [cntr_value]
query: |
SELECT top(1) cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Page life expectancy'
- metric_name: mssql_batch_requests
type: counter
help: 'Number of command batches received.'
values: [cntr_value]
query: |
SELECT cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Batch Requests/sec'
- metric_name: mssql_log_growths
type: counter
help: 'Number of times the transaction log has been expanded, per database.'
key_labels:
- database
values: [cntr_value]
query: |
SELECT rtrim(instance_name) AS [database], cntr_value
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = 'Log Growths' AND instance_name <> '_Total'
- metric_name: mssql_buffer_cache_hit_ratio
type: gauge
help: 'Ratio of requests that hit the buffer cache'
values: [cntr_value]
query: |
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE [counter_name] = 'Buffer cache hit ratio'
- metric_name: mssql_checkpoint_pages_sec
type: gauge
help: 'Checkpoint Pages Per Second'
values: [cntr_value]
query: |
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE [counter_name] = 'Checkpoint pages/sec'
#
# Collected from sys.dm_io_virtual_file_stats
#
# prometheus-mssql-exporter metrics
- metric_name: mssql_io_stall
type: gauge
help: 'mssql_io_stall Wait time (ms) of stall since last restart'
key_labels:
- database
value_label: operation
values:
- read
- write
query_ref: mssql_io_stall
# prometheus-mssql-exporter metrics
- metric_name: mssql_io_stall_total
type: gauge
help: 'mssql_io_stall_total Wait time (ms) of stall since last restart'
key_labels:
- database
values:
- io_stall
query_ref: mssql_io_stall
- metric_name: mssql_io_stall_seconds
type: counter
help: 'Stall time in seconds per database and I/O operation.'
key_labels:
- database
value_label: operation
values:
- read
- write
query_ref: mssql_io_stall
- metric_name: mssql_io_stall_total_seconds
type: counter
help: 'Total stall time in seconds per database.'
key_labels:
- database
values:
- io_stall
query_ref: mssql_io_stall
#
# Collected from sys.dm_os_process_memory
#
- metric_name: mssql_resident_memory_bytes
type: gauge
help: 'SQL Server resident memory size (AKA working set).'
values: [resident_memory_bytes]
query_ref: mssql_process_memory
- metric_name: mssql_virtual_memory_bytes
type: gauge
help: 'SQL Server committed virtual memory size.'
values: [virtual_memory_bytes]
query_ref: mssql_process_memory
- metric_name: mssql_memory_utilization_percentage
type: gauge
help: 'The percentage of committed memory that is in the working set.'
values: [memory_utilization_percentage]
query_ref: mssql_process_memory
- metric_name: mssql_page_fault_count
type: counter
help: 'The number of page faults that were incurred by the SQL Server process.'
values: [page_fault_count]
query_ref: mssql_process_memory
#
# Collected from sys.dm_os_sys_memory
#
- metric_name: mssql_os_memory
type: gauge
help: 'OS physical memory, used and available.'
value_label: 'state'
values: [used, available]
query: |
SELECT
(total_physical_memory_kb - available_physical_memory_kb) * 1024 AS used,
available_physical_memory_kb * 1024 AS available
FROM sys.dm_os_sys_memory
- metric_name: mssql_os_page_file
type: gauge
help: 'OS page file, used and available.'
value_label: 'state'
values: [used, available]
query: |
SELECT
(total_page_file_kb - available_page_file_kb) * 1024 AS used,
available_page_file_kb * 1024 AS available
FROM sys.dm_os_sys_memory
#
# Collected from sys.database
#
# prometheus-mssql-exporter metrics
- metric_name: mssql_database_state
type: gauge
help: 'mssql_database_state Databases states: 0=ONLINE 1=RESTORING 2=RECOVERING 3=RECOVERY_PENDING 4=SUSPECT 5=EMERGENCY 6=OFFLINE 7=COPYING 10=OFFLINE_SECONDARY.'
key_labels:
- database
values: [state]
query: |
SELECT [name] AS [database],[state] FROM sys.databases
#
# Collected from sys.dm_tran_locks
#
# ======自定义metrics======
# 当前死锁数
- metric_name: mssql_current_deadlocks
type: gauge
help: 'mssql_current_deadlocks database current dead lock.'
values: [count]
query: |
SELECT
COUNT(1) count
FROM sys.dm_tran_locks
WHERE resource_type='OBJECT'
# ======自定义metrics======
#
# Collected from sys.dm_exec_query_stats
#
# ======自定义metrics======
# 过去15s执行耗时超过1s的脚本数
- metric_name: mssql_long_elapsed_count
type: gauge
help: 'mssql_long_elapsed_count Number of scripts that took more than 1 second to execute in the past 15 seconds.'
values: [count]
query: |
SELECT
COUNT(1) [count]
FROM sys.dm_exec_query_stats
WHERE creation_time > DATEADD(S, -15, GETDATE())
AND (total_elapsed_time / execution_count)/(1000) > 1000
# ======自定义metrics======
queries:
# Populates `mssql_io_stall` and `mssql_io_stall_total`
- query_name: mssql_io_stall
query: |
SELECT
cast(DB_Name(a.database_id) as varchar) AS [database],
sum(io_stall_read_ms) AS [read],
sum(io_stall_write_ms) AS [write],
sum(io_stall) AS io_stall
FROM
sys.dm_io_virtual_file_stats(null, null) a
INNER JOIN sys.master_files b ON a.database_id = b.database_id AND a.file_id = b.file_id
GROUP BY a.database_id
# Populates `mssql_resident_memory_bytes`, `mssql_virtual_memory_bytes`, `mssql_memory_utilization_percentage` and
# `mssql_page_fault_count`.
- query_name: mssql_process_memory
query: |
SELECT
physical_memory_in_use_kb * 1024 AS resident_memory_bytes,
virtual_address_space_committed_kb * 1024 AS virtual_memory_bytes,
memory_utilization_percentage,
page_fault_count
FROM sys.dm_os_process_memory
- 编写
docker-compose-mssql-exporter.yml
部署 prometheus-mssql-exporter和sql-exporter
version: '3'
services:
TD105-prom-mssql-exporter:
image: awaragi/prometheus-mssql-exporter
container_name: TD105-prom-mssql-exporter
restart: always
ports:
- "14000:4000"
environment:
SERVER: '128.0.202.103'
USERNAME: 'sa'
PASSWORD: 'tdDev103@com'
#volumes:
# - /etc/hosts:/etc/hosts
networks:
- prometheus
xulq-prom-mssql-exporter:
image: awaragi/prometheus-mssql-exporter
container_name: xulq-prom-mssql-exporter
restart: always
ports:
- "14001:4000"
environment:
SERVER: '128.0.23.17'
USERNAME: 'sa'
PASSWORD: 'xulq123456'
#volumes:
# - /etc/hosts:/etc/hosts
networks:
- prometheus
#mssql-sql-exporter:
# image: centos
# container_name: mssql-sql-exporter
# restart: always
# ports:
# - "9399:9399"
# volumes:
# - /home/prometheus/docker/prometheus/sql_exporter:/root
# command: /root/sql_exporter -config.file /root/config/sql_exporter.yml
# networks:
# - prometheus
sql-exporter:
image: githubfree/sql_exporter
container_name: sql-exporter
restart: always
ports:
- "9399:9399"
volumes:
- /home/prometheus/docker/prometheus/sql_exporter:/root
command: --config.file /root/config/sql_exporter.yml
networks:
- prometheus
networks:
prometheus:
name: prometheus
从以上docker-compose文件可以看出,prom-mssql-exporter每监控一个实例就得开启一个容器,同时占用一个端口,sql-exporter可以合并多类型多实例数据库
- 启动容器
docker-compose -f docker-compose-mssql-exporter up -d
- 验证部署是否成功
prometheus-mssql-exporter的两个监控实例
sql-exporter的监控实例
- prometheus中添加metric
前文中prometheus.yml
中已添加prometheus-mssql-exporter、sql-exporter
查看prometheus中Targets状态
- 部署完后目录结构
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· go语言实现终端里的倒计时
· 如何编写易于单元测试的代码
· 10年+ .NET Coder 心语,封装的思维:从隐藏、稳定开始理解其本质意义
· .NET Core 中如何实现缓存的预热?
· 从 HTTP 原因短语缺失研究 HTTP/2 和 HTTP/3 的设计差异
· 分享一个免费、快速、无限量使用的满血 DeepSeek R1 模型,支持深度思考和联网搜索!
· 基于 Docker 搭建 FRP 内网穿透开源项目(很简单哒)
· ollama系列01:轻松3步本地部署deepseek,普通电脑可用
· 按钮权限的设计及实现
· 25岁的心里话