Redis monitoring with redis_exporter + prometheus + grafana + alertmanager
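The raw metrics that redis_exporter returns are messy and hard to read on their own, so it needs to be combined with prometheus and grafana.
The operating system is CentOS Linux 7.
Unless something unexpected happens, anything that asks for a username and password defaults to admin/admin.
Deploying redis_exporter
Download: https://github.com/oliver006/redis_exporter/releases/tag/v1.24.0
Further references: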
https://docs.gitlab.com/ee/administration/monitoring/prometheus/redis_exporter.html
https://github.com/oliver006/redis_exporter
Downloaded file: redis_exporter-v1.24.0.linux-amd64.tar.gz
Extract and install:
tar -zxvf redis_exporter-v1.24.0.linux-amd64.tar.gz -C /
mv /redis_exporter-v1.24.0.linux-amd64/ /redis_exporter
Start redis_exporter
[root@node1 soft]# cd /redis_exporter/
[root@node1 redis_exporter]# ./redis_exporter -redis.addr 192.168.1.214:6380 -web.listen-address 192.168.1.178:9121
INFO[0000] Redis Metrics Exporter v1.24.0 build date: 2021-06-09-01:40:46 sha1: b95cf3b5ce7543119b303766662d1f0400caea94 Go: go1.16.5 GOOS: linux GOARCH: amd64
INFO[0000] Providing metrics at 192.168.1.178:9121/metrics
ERRO[0015] Couldn't connect to redis instance
Passing several addresses at once, as some guides online suggest, is not a good idea, for example:
-redis.addr 192.168.1.214:6380,192.168.1.214:6379,192.168.1.214:6381
Every refresh then logs ERRO[0001], as shown here:
[root@node1 redis_exporter]# ./redis_exporter -redis.addr 192.168.1.214:6380,192.168.1.214:6379,192,168.1.214:6381 -web.listen-address 192.168.1.178:9121
INFO[0000] Redis Metrics Exporter v1.24.0 build date: 2021-06-09-01:40:46 sha1: b95cf3b5ce7543119b303766662d1f0400caea94 Go: go1.16.5 GOOS: linux GOARCH: amd64
INFO[0000] Providing metrics at 192.168.1.178:9121/metrics
ERRO[0001] Couldn't connect to redis instance
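So only the address of the master node, 192.168.1.214:6380, is given here. Material online says the exporter can pick up information about the other nodes of a cluster automatically; my setup is master-replica rather than a cluster, and so far the replica information also appears to be picked up automatically.
Open 192.168.1.178:9121/metrics to see the collected metrics.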
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.4411e-05
go_gc_duration_seconds{quantile="0.25"} 9.8068e-05
go_gc_duration_seconds{quantile="0.5"} 0.000130716
go_gc_duration_seconds{quantile="0.75"} 0.000174814
go_gc_duration_seconds{quantile="1"} 0.000622031
go_gc_duration_seconds_sum 0.047733795
go_gc_duration_seconds_count 326
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.17684e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.85939608e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.499842e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 4.416845e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 4.7848542653098556e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.065448e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.17684e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.1833216e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.620288e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 4394
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.1087744e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6453504e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.62787137367609e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 4.421239e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 78744
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 114688
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 6.200176e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 988766
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 655360
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 655360
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.4793992e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 7
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 18.19
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.1882496e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.62786724615e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.30558464e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP redis_active_defrag_running active_defrag_running metric
# TYPE redis_active_defrag_running gauge
redis_active_defrag_running 0
# HELP redis_aof_current_rewrite_duration_sec aof_current_rewrite_duration_sec metric
# TYPE redis_aof_current_rewrite_duration_sec gauge
redis_aof_current_rewrite_duration_sec -1
# HELP redis_aof_enabled aof_enabled metric
# TYPE redis_aof_enabled gauge
redis_aof_enabled 0
# HELP redis_aof_last_bgrewrite_status aof_last_bgrewrite_status metric
# TYPE redis_aof_last_bgrewrite_status gauge
redis_aof_last_bgrewrite_status 1
# HELP redis_aof_last_cow_size_bytes aof_last_cow_size_bytes metric
# TYPE redis_aof_last_cow_size_bytes gauge
redis_aof_last_cow_size_bytes 0
# HELP redis_aof_last_rewrite_duration_sec aof_last_rewrite_duration_sec metric
# TYPE redis_aof_last_rewrite_duration_sec gauge
redis_aof_last_rewrite_duration_sec -1
# HELP redis_aof_last_write_status aof_last_write_status metric
# TYPE redis_aof_last_write_status gauge
redis_aof_last_write_status 1
# HELP redis_aof_rewrite_in_progress aof_rewrite_in_progress metric
# TYPE redis_aof_rewrite_in_progress gauge
redis_aof_rewrite_in_progress 0
# HELP redis_aof_rewrite_scheduled aof_rewrite_scheduled metric
# TYPE redis_aof_rewrite_scheduled gauge
redis_aof_rewrite_scheduled 0
# HELP redis_blocked_clients blocked_clients metric
# TYPE redis_blocked_clients gauge
redis_blocked_clients 0
# HELP redis_client_biggest_input_buf client_biggest_input_buf metric
# TYPE redis_client_biggest_input_buf gauge
redis_client_biggest_input_buf 0
# HELP redis_client_longest_output_list client_longest_output_list metric
# TYPE redis_client_longest_output_list gauge
redis_client_longest_output_list 0
# HELP redis_cluster_enabled cluster_enabled metric
# TYPE redis_cluster_enabled gauge
redis_cluster_enabled 0
# HELP redis_commands_duration_seconds_total Total amount of time in seconds spent per command
# TYPE redis_commands_duration_seconds_total counter
redis_commands_duration_seconds_total{cmd="auth"} 1.6e-05
redis_commands_duration_seconds_total{cmd="client"} 0.119519
redis_commands_duration_seconds_total{cmd="command"} 0.000553
redis_commands_duration_seconds_total{cmd="config"} 0.560761
redis_commands_duration_seconds_total{cmd="del"} 40.078852
redis_commands_duration_seconds_total{cmd="eval"} 0.000648
redis_commands_duration_seconds_total{cmd="evalsha"} 52.593835
redis_commands_duration_seconds_total{cmd="exists"} 0.002163
redis_commands_duration_seconds_total{cmd="expire"} 4.639735
redis_commands_duration_seconds_total{cmd="get"} 39.35076
redis_commands_duration_seconds_total{cmd="hdel"} 0.032488
redis_commands_duration_seconds_total{cmd="hget"} 4.143723
redis_commands_duration_seconds_total{cmd="hgetall"} 52.309559
redis_commands_duration_seconds_total{cmd="hincrby"} 6.253747
redis_commands_duration_seconds_total{cmd="hlen"} 0.000279
redis_commands_duration_seconds_total{cmd="hmset"} 97.246473
redis_commands_duration_seconds_total{cmd="host"} 0.002547
redis_commands_duration_seconds_total{cmd="hscan"} 0.027941
redis_commands_duration_seconds_total{cmd="hset"} 0.111718
redis_commands_duration_seconds_total{cmd="incr"} 0.081717
redis_commands_duration_seconds_total{cmd="incrby"} 0.790273
redis_commands_duration_seconds_total{cmd="info"} 472.399096
redis_commands_duration_seconds_total{cmd="keys"} 0.011277
redis_commands_duration_seconds_total{cmd="latency"} 0.011697
redis_commands_duration_seconds_total{cmd="lindex"} 0.003309
redis_commands_duration_seconds_total{cmd="llen"} 0.000243
redis_commands_duration_seconds_total{cmd="lrange"} 0.714049
redis_commands_duration_seconds_total{cmd="lrem"} 0.002257
redis_commands_duration_seconds_total{cmd="ltrim"} 0.081033
redis_commands_duration_seconds_total{cmd="pexpire"} 0.053587
redis_commands_duration_seconds_total{cmd="ping"} 33.619505
redis_commands_duration_seconds_total{cmd="psync"} 0.010975
redis_commands_duration_seconds_total{cmd="publish"} 47.437203
redis_commands_duration_seconds_total{cmd="replconf"} 24.135835
redis_commands_duration_seconds_total{cmd="rpush"} 0.724147
redis_commands_duration_seconds_total{cmd="sadd"} 9.122367
redis_commands_duration_seconds_total{cmd="scan"} 183.549755
redis_commands_duration_seconds_total{cmd="scard"} 1.271612
redis_commands_duration_seconds_total{cmd="select"} 12.112273
redis_commands_duration_seconds_total{cmd="set"} 59.943641
redis_commands_duration_seconds_total{cmd="setex"} 0.390939
redis_commands_duration_seconds_total{cmd="setnx"} 5.509553
redis_commands_duration_seconds_total{cmd="slowlog"} 0.062131
redis_commands_duration_seconds_total{cmd="smembers"} 0.108663
redis_commands_duration_seconds_total{cmd="spop"} 0.6798
redis_commands_duration_seconds_total{cmd="srem"} 0.014079
redis_commands_duration_seconds_total{cmd="sscan"} 0.002472
redis_commands_duration_seconds_total{cmd="subscribe"} 1.2e-05
redis_commands_duration_seconds_total{cmd="ttl"} 0.002117
redis_commands_duration_seconds_total{cmd="type"} 0.003339
redis_commands_duration_seconds_total{cmd="unlink"} 0.020745
# HELP redis_commands_processed_total commands_processed_total metric
# TYPE redis_commands_processed_total counter
redis_commands_processed_total 1.27407536e+08
# HELP redis_commands_total Total number of calls per command
# TYPE redis_commands_total counter
redis_commands_total{cmd="auth"} 9
redis_commands_total{cmd="client"} 79475
redis_commands_total{cmd="command"} 1
redis_commands_total{cmd="config"} 4578
redis_commands_total{cmd="del"} 137331
redis_commands_total{cmd="eval"} 3
redis_commands_total{cmd="evalsha"} 1.528261e+06
redis_commands_total{cmd="exists"} 622
redis_commands_total{cmd="expire"} 2.031993e+06
redis_commands_total{cmd="get"} 1.195089e+07
redis_commands_total{cmd="hdel"} 3209
redis_commands_total{cmd="hget"} 998016
redis_commands_total{cmd="hgetall"} 5.695487e+06
redis_commands_total{cmd="hincrby"} 654030
redis_commands_total{cmd="hlen"} 76
redis_commands_total{cmd="hmset"} 6.570541e+06
redis_commands_total{cmd="host"} 52
redis_commands_total{cmd="hscan"} 76
redis_commands_total{cmd="hset"} 6202
redis_commands_total{cmd="incr"} 7435
redis_commands_total{cmd="incrby"} 121021
redis_commands_total{cmd="info"} 3.791154e+06
redis_commands_total{cmd="keys"} 78
redis_commands_total{cmd="latency"} 4444
redis_commands_total{cmd="lindex"} 46
redis_commands_total{cmd="llen"} 52
redis_commands_total{cmd="lrange"} 170093
redis_commands_total{cmd="lrem"} 46
redis_commands_total{cmd="ltrim"} 3808
redis_commands_total{cmd="pexpire"} 13934
redis_commands_total{cmd="ping"} 2.7573152e+07
redis_commands_total{cmd="psync"} 4
redis_commands_total{cmd="publish"} 7.048611e+06
redis_commands_total{cmd="replconf"} 1.4497687e+07
redis_commands_total{cmd="rpush"} 10005
redis_commands_total{cmd="sadd"} 559362
redis_commands_total{cmd="scan"} 1.2812383e+07
redis_commands_total{cmd="scard"} 258338
redis_commands_total{cmd="select"} 1.0435721e+07
redis_commands_total{cmd="set"} 1.8583699e+07
redis_commands_total{cmd="setex"} 42367
redis_commands_total{cmd="setnx"} 1.535913e+06
redis_commands_total{cmd="slowlog"} 8888
redis_commands_total{cmd="smembers"} 22600
redis_commands_total{cmd="spop"} 236576
redis_commands_total{cmd="srem"} 1752
redis_commands_total{cmd="sscan"} 33
redis_commands_total{cmd="subscribe"} 2
redis_commands_total{cmd="ttl"} 670
redis_commands_total{cmd="type"} 677
redis_commands_total{cmd="unlink"} 6133
# HELP redis_config_maxclients config_maxclients metric
# TYPE redis_config_maxclients gauge
redis_config_maxclients 10000
# HELP redis_config_maxmemory config_maxmemory metric
# TYPE redis_config_maxmemory gauge
redis_config_maxmemory 0
# HELP redis_connected_clients connected_clients metric
# TYPE redis_connected_clients gauge
redis_connected_clients 86
# HELP redis_connected_slave_lag_seconds Lag of connected slave
# TYPE redis_connected_slave_lag_seconds gauge
redis_connected_slave_lag_seconds{slave_ip="192.168.1.214",slave_port="6379",slave_state="online"} 1
redis_connected_slave_lag_seconds{slave_ip="192.168.1.214",slave_port="6381",slave_state="online"} 1
# HELP redis_connected_slave_offset_bytes Offset of connected slave
# TYPE redis_connected_slave_offset_bytes gauge
redis_connected_slave_offset_bytes{slave_ip="192.168.1.214",slave_port="6379",slave_state="online"} 2.1943761833e+10
redis_connected_slave_offset_bytes{slave_ip="192.168.1.214",slave_port="6381",slave_state="online"} 2.1943761833e+10
# HELP redis_connected_slaves connected_slaves metric
# TYPE redis_connected_slaves gauge
redis_connected_slaves 2
# HELP redis_connections_received_total connections_received_total metric
# TYPE redis_connections_received_total counter
redis_connections_received_total 4.7644e+06
# HELP redis_cpu_sys_children_seconds_total cpu_sys_children_seconds_total metric
# TYPE redis_cpu_sys_children_seconds_total counter
redis_cpu_sys_children_seconds_total 1195.64
# HELP redis_cpu_sys_seconds_total cpu_sys_seconds_total metric
# TYPE redis_cpu_sys_seconds_total counter
redis_cpu_sys_seconds_total 12650.77
# HELP redis_cpu_user_children_seconds_total cpu_user_children_seconds_total metric
# TYPE redis_cpu_user_children_seconds_total counter
redis_cpu_user_children_seconds_total 8929.86
# HELP redis_cpu_user_seconds_total cpu_user_seconds_total metric
# TYPE redis_cpu_user_seconds_total counter
redis_cpu_user_seconds_total 8919.24
# HELP redis_db_avg_ttl_seconds Avg TTL in seconds
# TYPE redis_db_avg_ttl_seconds gauge
redis_db_avg_ttl_seconds{db="db11"} 1825.3
redis_db_avg_ttl_seconds{db="db12"} 71020.336
redis_db_avg_ttl_seconds{db="db13"} 84212.367
redis_db_avg_ttl_seconds{db="db14"} 36.304
redis_db_avg_ttl_seconds{db="db15"} 0
redis_db_avg_ttl_seconds{db="db4"} 2306.138
redis_db_avg_ttl_seconds{db="db5"} 0
redis_db_avg_ttl_seconds{db="db6"} 0
redis_db_avg_ttl_seconds{db="db7"} 1.422106525e+06
redis_db_avg_ttl_seconds{db="db9"} 82129.002
# HELP redis_db_keys Total number of keys by DB
# TYPE redis_db_keys gauge
redis_db_keys{db="db0"} 0
redis_db_keys{db="db1"} 0
redis_db_keys{db="db10"} 0
redis_db_keys{db="db11"} 102
redis_db_keys{db="db12"} 83
redis_db_keys{db="db13"} 56
redis_db_keys{db="db14"} 232
redis_db_keys{db="db15"} 3
redis_db_keys{db="db16"} 0
redis_db_keys{db="db17"} 0
redis_db_keys{db="db18"} 0
redis_db_keys{db="db19"} 0
redis_db_keys{db="db2"} 0
redis_db_keys{db="db3"} 0
redis_db_keys{db="db4"} 8
redis_db_keys{db="db5"} 3
redis_db_keys{db="db6"} 6
redis_db_keys{db="db7"} 998
redis_db_keys{db="db8"} 0
redis_db_keys{db="db9"} 24
# HELP redis_db_keys_expiring Total number of expiring keys by DB
# TYPE redis_db_keys_expiring gauge
redis_db_keys_expiring{db="db0"} 0
redis_db_keys_expiring{db="db1"} 0
redis_db_keys_expiring{db="db10"} 0
redis_db_keys_expiring{db="db11"} 1
redis_db_keys_expiring{db="db12"} 15
redis_db_keys_expiring{db="db13"} 2
redis_db_keys_expiring{db="db14"} 2
redis_db_keys_expiring{db="db15"} 0
redis_db_keys_expiring{db="db16"} 0
redis_db_keys_expiring{db="db17"} 0
redis_db_keys_expiring{db="db18"} 0
redis_db_keys_expiring{db="db19"} 0
redis_db_keys_expiring{db="db2"} 0
redis_db_keys_expiring{db="db3"} 0
redis_db_keys_expiring{db="db4"} 8
redis_db_keys_expiring{db="db5"} 0
redis_db_keys_expiring{db="db6"} 0
redis_db_keys_expiring{db="db7"} 960
redis_db_keys_expiring{db="db8"} 0
redis_db_keys_expiring{db="db9"} 3
# HELP redis_defrag_hits defrag_hits metric
# TYPE redis_defrag_hits gauge
redis_defrag_hits 0
# HELP redis_defrag_key_hits defrag_key_hits metric
# TYPE redis_defrag_key_hits gauge
redis_defrag_key_hits 0
# HELP redis_defrag_key_misses defrag_key_misses metric
# TYPE redis_defrag_key_misses gauge
redis_defrag_key_misses 0
# HELP redis_defrag_misses defrag_misses metric
# TYPE redis_defrag_misses gauge
redis_defrag_misses 0
# HELP redis_evicted_keys_total evicted_keys_total metric
# TYPE redis_evicted_keys_total counter
redis_evicted_keys_total 0
# HELP redis_expired_keys_total expired_keys_total metric
# TYPE redis_expired_keys_total counter
redis_expired_keys_total 42862
# HELP redis_exporter_build_info redis exporter build_info
# TYPE redis_exporter_build_info gauge
redis_exporter_build_info{build_date="2021-06-09-01:40:46",commit_sha="b95cf3b5ce7543119b303766662d1f0400caea94",golang_version="go1.16.5",version="v1.24.0"} 1
# HELP redis_exporter_last_scrape_connect_time_seconds exporter_last_scrape_connect_time_seconds metric
# TYPE redis_exporter_last_scrape_connect_time_seconds gauge
redis_exporter_last_scrape_connect_time_seconds 0.000938134
# HELP redis_exporter_last_scrape_duration_seconds exporter_last_scrape_duration_seconds metric
# TYPE redis_exporter_last_scrape_duration_seconds gauge
redis_exporter_last_scrape_duration_seconds 0.00479455
# HELP redis_exporter_last_scrape_error The last scrape error status.
# TYPE redis_exporter_last_scrape_error gauge
redis_exporter_last_scrape_error{err=""} 0
# HELP redis_exporter_scrape_duration_seconds Durations of scrapes by the exporter
# TYPE redis_exporter_scrape_duration_seconds summary
redis_exporter_scrape_duration_seconds_sum 1.1995302149999998
redis_exporter_scrape_duration_seconds_count 237
# HELP redis_exporter_scrapes_total Current total redis scrapes.
# TYPE redis_exporter_scrapes_total counter
redis_exporter_scrapes_total 237
# HELP redis_instance_info Information about the Redis instance
# TYPE redis_instance_info gauge
redis_instance_info{maxmemory_policy="noeviction",os="Linux 3.10.0-957.el7.x86_64 x86_64",process_id="5428",redis_build_id="2d12e85652dc7ce9",redis_mode="standalone",redis_version="4.0.2",role="master",run_id="3f70dd786f2534fae677062ac371f87fd78fe914",tcp_port="6380"} 1
# HELP redis_keyspace_hits_total keyspace_hits_total metric
# TYPE redis_keyspace_hits_total counter
redis_keyspace_hits_total 9.793446e+06
# HELP redis_keyspace_misses_total keyspace_misses_total metric
# TYPE redis_keyspace_misses_total counter
redis_keyspace_misses_total 9.303561e+06
# HELP redis_last_key_groups_scrape_duration_milliseconds Duration of the last key group metrics scrape in milliseconds
# TYPE redis_last_key_groups_scrape_duration_milliseconds gauge
redis_last_key_groups_scrape_duration_milliseconds 0
# HELP redis_last_slow_execution_duration_seconds The amount of time needed for last slow execution, in seconds
# TYPE redis_last_slow_execution_duration_seconds gauge
redis_last_slow_execution_duration_seconds 0.059945
# HELP redis_latest_fork_seconds latest_fork_seconds metric
# TYPE redis_latest_fork_seconds gauge
redis_latest_fork_seconds 0.006136
# HELP redis_lazyfree_pending_objects lazyfree_pending_objects metric
# TYPE redis_lazyfree_pending_objects gauge
redis_lazyfree_pending_objects 0
# HELP redis_loading_dump_file loading_dump_file metric
# TYPE redis_loading_dump_file gauge
redis_loading_dump_file 0
# HELP redis_master_repl_offset master_repl_offset metric
# TYPE redis_master_repl_offset gauge
redis_master_repl_offset 2.1943761833e+10
# HELP redis_mem_fragmentation_ratio mem_fragmentation_ratio metric
# TYPE redis_mem_fragmentation_ratio gauge
redis_mem_fragmentation_ratio 1.12
# HELP redis_memory_max_bytes memory_max_bytes metric
# TYPE redis_memory_max_bytes gauge
redis_memory_max_bytes 0
# HELP redis_memory_used_bytes memory_used_bytes metric
# TYPE redis_memory_used_bytes gauge
redis_memory_used_bytes 1.38829216e+08
# HELP redis_memory_used_dataset_bytes memory_used_dataset_bytes metric
# TYPE redis_memory_used_dataset_bytes gauge
redis_memory_used_dataset_bytes 1.35237404e+08
# HELP redis_memory_used_lua_bytes memory_used_lua_bytes metric
# TYPE redis_memory_used_lua_bytes gauge
redis_memory_used_lua_bytes 37888
# HELP redis_memory_used_overhead_bytes memory_used_overhead_bytes metric
# TYPE redis_memory_used_overhead_bytes gauge
redis_memory_used_overhead_bytes 3.591812e+06
# HELP redis_memory_used_peak_bytes memory_used_peak_bytes metric
# TYPE redis_memory_used_peak_bytes gauge
redis_memory_used_peak_bytes 1.3938588e+08
# HELP redis_memory_used_rss_bytes memory_used_rss_bytes metric
# TYPE redis_memory_used_rss_bytes gauge
redis_memory_used_rss_bytes 1.55652096e+08
# HELP redis_memory_used_startup_bytes memory_used_startup_bytes metric
# TYPE redis_memory_used_startup_bytes gauge
redis_memory_used_startup_bytes 767968
# HELP redis_migrate_cached_sockets_total migrate_cached_sockets_total metric
# TYPE redis_migrate_cached_sockets_total gauge
redis_migrate_cached_sockets_total 0
# HELP redis_net_input_bytes_total net_input_bytes_total metric
# TYPE redis_net_input_bytes_total counter
redis_net_input_bytes_total 2.9809647461e+10
# HELP redis_net_output_bytes_total net_output_bytes_total metric
# TYPE redis_net_output_bytes_total counter
redis_net_output_bytes_total 7.3329383597e+10
# HELP redis_process_id process_id metric
# TYPE redis_process_id gauge
redis_process_id 5428
# HELP redis_pubsub_channels pubsub_channels metric
# TYPE redis_pubsub_channels gauge
redis_pubsub_channels 1
# HELP redis_pubsub_patterns pubsub_patterns metric
# TYPE redis_pubsub_patterns gauge
redis_pubsub_patterns 0
# HELP redis_rdb_bgsave_in_progress rdb_bgsave_in_progress metric
# TYPE redis_rdb_bgsave_in_progress gauge
redis_rdb_bgsave_in_progress 0
# HELP redis_rdb_changes_since_last_save rdb_changes_since_last_save metric
# TYPE redis_rdb_changes_since_last_save gauge
redis_rdb_changes_since_last_save 1670
# HELP redis_rdb_current_bgsave_duration_sec rdb_current_bgsave_duration_sec metric
# TYPE redis_rdb_current_bgsave_duration_sec gauge
redis_rdb_current_bgsave_duration_sec -1
# HELP redis_rdb_last_bgsave_duration_sec rdb_last_bgsave_duration_sec metric
# TYPE redis_rdb_last_bgsave_duration_sec gauge
redis_rdb_last_bgsave_duration_sec 0
# HELP redis_rdb_last_bgsave_status rdb_last_bgsave_status metric
# TYPE redis_rdb_last_bgsave_status gauge
redis_rdb_last_bgsave_status 1
# HELP redis_rdb_last_cow_size_bytes rdb_last_cow_size_bytes metric
# TYPE redis_rdb_last_cow_size_bytes gauge
redis_rdb_last_cow_size_bytes 3.2497664e+07
# HELP redis_rdb_last_save_timestamp_seconds rdb_last_save_timestamp_seconds metric
# TYPE redis_rdb_last_save_timestamp_seconds gauge
redis_rdb_last_save_timestamp_seconds 1.627871113e+09
# HELP redis_rejected_connections_total rejected_connections_total metric
# TYPE redis_rejected_connections_total counter
redis_rejected_connections_total 0
# HELP redis_repl_backlog_first_byte_offset repl_backlog_first_byte_offset metric
# TYPE redis_repl_backlog_first_byte_offset gauge
redis_repl_backlog_first_byte_offset 2.1942713258e+10
# HELP redis_repl_backlog_history_bytes repl_backlog_history_bytes metric
# TYPE redis_repl_backlog_history_bytes gauge
redis_repl_backlog_history_bytes 1.048576e+06
# HELP redis_repl_backlog_is_active repl_backlog_is_active metric
# TYPE redis_repl_backlog_is_active gauge
redis_repl_backlog_is_active 1
# HELP redis_replica_partial_resync_accepted replica_partial_resync_accepted metric
# TYPE redis_replica_partial_resync_accepted gauge
redis_replica_partial_resync_accepted 2
# HELP redis_replica_partial_resync_denied replica_partial_resync_denied metric
# TYPE redis_replica_partial_resync_denied gauge
redis_replica_partial_resync_denied 1
# HELP redis_replica_resyncs_full replica_resyncs_full metric
# TYPE redis_replica_resyncs_full gauge
redis_replica_resyncs_full 2
# HELP redis_replication_backlog_bytes replication_backlog_bytes metric
# TYPE redis_replication_backlog_bytes gauge
redis_replication_backlog_bytes 1.048576e+06
# HELP redis_second_repl_offset second_repl_offset metric
# TYPE redis_second_repl_offset gauge
redis_second_repl_offset -1
# HELP redis_slave_expires_tracked_keys slave_expires_tracked_keys metric
# TYPE redis_slave_expires_tracked_keys gauge
redis_slave_expires_tracked_keys 0
# HELP redis_slowlog_last_id Last id of slowlog
# TYPE redis_slowlog_last_id gauge
redis_slowlog_last_id 12
# HELP redis_slowlog_length Total slowlog
# TYPE redis_slowlog_length gauge
redis_slowlog_length 13
# HELP redis_start_time_seconds Start time of the Redis instance since unix epoch in seconds.
# TYPE redis_start_time_seconds gauge
redis_start_time_seconds 1.620606909e+09
# HELP redis_target_scrape_request_errors_total Errors in requests to the exporter
# TYPE redis_target_scrape_request_errors_total counter
redis_target_scrape_request_errors_total 0
# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
# HELP redis_uptime_in_seconds uptime_in_seconds metric
# TYPE redis_uptime_in_seconds gauge
redis_uptime_in_seconds 7.264465e+06
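Rather than reading the whole page, a single value can be pulled out of the output above from the command line. The following is a minimal sketch, not part of redis_exporter: the function names metric_value and hit_ratio are made up for illustration, and the matching is deliberately naive (it assumes the sample name, including any {labels}, contains no spaces, which is not true for redis_instance_info).

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose name (including any {labels}) is NAME.
# Comment lines start with "#", so they never match a metric name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# hit_ratio HITS MISSES: print hits / (hits + misses) with four decimals,
# or "n/a" when both counters are zero.
hit_ratio() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) print "n/a"
    else printf "%.4f\n", h / (h + m)
  }'
}
```

For example, curl -s http://192.168.1.178:9121/metrics | metric_value redis_up should print 1 while the exporter can reach Redis. Feeding the redis_keyspace_hits_total and redis_keyspace_misses_total values from the output above into hit_ratio gives roughly 0.51, i.e. about half of the key lookups on this instance were misses.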
That completes the redis_exporter deployment.
Enable redis_exporter at boot and start it.
cat <<\EOF >/etc/systemd/system/redis_exporter.service
[Unit]
Description=Prometheus exporter for Redis metrics.
[Service]
ExecStart=/redis_exporter/redis_exporter -redis.addr 192.168.1.214:6380 -web.listen-address 192.168.1.178:9121
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Reload the configuration (remember to stop the session that was started by hand earlier)
systemctl daemon-reload
systemctl enable redis_exporter.service
systemctl restart redis_exporter.service
systemctl status redis_exporter.service
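systemctl status only shows that the process is running, and right after a restart it may take a moment before the exporter answers HTTP requests. A small retry helper can confirm that the endpoint is really reachable. This is only a sketch; wait_until is a hypothetical helper name, not a systemd or redis_exporter command.

```shell
# wait_until ATTEMPTS CMD [ARGS...]: run CMD until it succeeds, pausing one
# second between tries. Returns 0 on the first success and 1 once ATTEMPTS
# tries have all failed. Output of CMD is discarded.
wait_until() {
  attempts="$1"
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Only pause when another try is still coming.
    if [ "$attempts" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}
```

A possible use after the restart: wait_until 10 curl -fsS http://192.168.1.178:9121/metrics && echo "exporter is answering".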
Deploying prometheus
Download: https://github.com/prometheus/prometheus/releases/
Downloaded file: prometheus-2.28.1.linux-amd64.tar.gz
Extracting the archive is the whole installation:
[root@node1 soft]# tar -zxvf prometheus-2.28.1.linux-amd64.tar.gz
[root@node1 soft]# mv prometheus-2.28.1.linux-amd64 /prometheus
[root@node1 soft]# cd /prometheus/
Add the configuration
[root@node1 prometheus]# vi /prometheus/prometheus.yml
Add the following under scrape_configs:
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://192.168.1.214:6380
        - redis://192.168.1.214:6379
        - redis://192.168.1.214:6381
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.178:9121
  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - 192.168.1.178:9121
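The relabeling rules copy each Redis address into the target URL parameter, keep it as the instance label, and then point the actual request at the exporter. In effect Prometheus calls the exporter's /scrape endpoint once per Redis instance. The sketch below prints the equivalent URLs so they can be checked by hand with curl; scrape_urls is a made-up helper name, and the URLs are printed unencoded, whereas Prometheus itself URL-encodes the parameter.

```shell
# scrape_urls EXPORTER TARGET...: print, one per line, the /scrape URL on
# EXPORTER (host:port) that corresponds to each Redis TARGET address.
scrape_urls() {
  exporter="$1"
  shift
  for target in "$@"; do
    printf 'http://%s/scrape?target=%s\n' "$exporter" "$target"
  done
}
```

For instance, scrape_urls 192.168.1.178:9121 redis://192.168.1.214:6379 prints http://192.168.1.178:9121/scrape?target=redis://192.168.1.214:6379, and fetching that URL with curl should return the metrics of the replica on port 6379 if the exporter can reach it.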
Start prometheus
[root@node1 prometheus]# ./prometheus
level=info ts=2021-08-02T02:40:06.001Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:449 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 node1 (none))"
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:450 fd_limits="(soft=1024, hard=4096)"
level=info ts=2021-08-02T02:40:06.003Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-08-02T02:40:06.012Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-08-02T02:40:06.013Z caller=main.go:824 msg="Starting TSDB ..."
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627581602588 maxt=1627588800000 ulid=01FBT12MPWQ0F1HNJTMBJRKVZ4
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627588802588 maxt=1627596000000 ulid=01FBT7YBYX56PYJTSCNPDGNF8S
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627546183899 maxt=1627581600000 ulid=01FBT7YCAJW6HK1ZWQT3GAXSHM
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627596000000 maxt=1627603200000 ulid=01FC279X74Q6P1KKRSK6FSE4ZE
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627603202588 maxt=1627610400000 ulid=01FC279X9GBYG0VD4FS6MP8V0E
level=info ts=2021-08-02T02:40:06.017Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2021-08-02T02:40:06.032Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2021-08-02T02:40:06.035Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=2.833274ms
level=info ts=2021-08-02T02:40:06.035Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
level=warn ts=2021-08-02T02:40:06.098Z caller=head.go:767 component=tsdb msg="Unknown series references" samples=15293 exemplars=0
level=info ts=2021-08-02T02:40:06.098Z caller=head.go:826 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2021-08-02T02:40:06.116Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=31 maxSegment=34
level=info ts=2021-08-02T02:40:06.117Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=32 maxSegment=34
level=info ts=2021-08-02T02:40:06.131Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=33 maxSegment=34
level=info ts=2021-08-02T02:40:06.131Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=34 maxSegment=34
level=info ts=2021-08-02T02:40:06.131Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=62.785484ms wal_replay_duration=33.30167ms total_replay_duration=98.993811ms
level=info ts=2021-08-02T02:40:06.140Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
level=info ts=2021-08-02T02:40:06.140Z caller=main.go:854 msg="TSDB started"
level=info ts=2021-08-02T02:40:06.140Z caller=main.go:981 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2021-08-02T02:40:06.150Z caller=main.go:1012 msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=9.905156ms remote_storage=12.884µs web_handler=860ns query_engine=7.018µs scrape=1.041656ms scrape_sd=149.496µs notify=76.325µs notify_sd=43.197µs rules=7.250541ms
level=info ts=2021-08-02T02:40:06.150Z caller=main.go:796 msg="Server is ready to receive web requests."
level=info ts=2021-08-02T02:40:13.942Z caller=compact.go:509 component=tsdb msg="write block resulted in empty block" mint=1627610400000 maxt=1627617600000 duration=23.036437ms
level=info ts=2021-08-02T02:40:13.946Z caller=head.go:967 component=tsdb msg="Head GC completed" duration=3.883036ms
level=info ts=2021-08-02T02:40:13.950Z caller=checkpoint.go:97 component=tsdb msg="Creating checkpoint" from_segment=31 to_segment=32 mint=1627617600000
level=info ts=2021-08-02T02:40:14.059Z caller=head.go:1064 component=tsdb msg="WAL checkpoint complete" first=31 last=32 duration=109.568447ms
Open 192.168.1.178:9090 in a browser to see the Prometheus web UI and the data it has collected.
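The same data can also be read without a browser through the Prometheus HTTP API. The following is a minimal Python sketch, not part of the original setup: it builds an instant-query URL for /api/v1/query and parses the JSON response. The helper names are made up for illustration, and it assumes each series carries an instance label, as Prometheus adds by default.

```python
import json
import urllib.parse
import urllib.request

# Address used in this article; adjust it for your own host.
PROMETHEUS_URL = "http://192.168.1.178:9090"


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_query(body):
    """Map each series' instance label to its value as a float."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def query(promql, base_url=PROMETHEUS_URL):
    """Run the query against a live Prometheus (needs network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_instant_query(resp.read().decode("utf-8"))
```

Calling query("redis_up") should return one entry per scraped exporter target, with 1.0 meaning the exporter could reach Redis.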
Add a service that starts on boot
vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System

[Service]
ExecStart=/prometheus/prometheus \
  --config.file=/prometheus/prometheus.yml \
  --web.listen-address=:9090
Restart=on-failure

[Install]
WantedBy=multi-user.target
Stop the ./prometheus process that was started in the foreground earlier.
Start the service, enable it at boot, and check its status.
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
[root@node1 prometheus]# cat /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System

[Service]
ExecStart=/prometheus/prometheus \
  --config.file=/prometheus/prometheus.yml \
  --web.listen-address=:9090
Restart=on-failure

[Install]
[root@node1 prometheus]# systemctl status prometheus
● prometheus.service - Prometheus Monitoring System
   Loaded: loaded (/etc/systemd/system/prometheus.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2021-08-02 11:23:31 CST; 3min 26s ago
 Main PID: 30494 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─30494 /prometheus/prometheus --config.file=/prometheus/prometheus.yml --web.listen-address=:9090

Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=18.782µs
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.665Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.665Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=67.616µs wal_replay_…ration=815.018µs
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:854 msg="TSDB started"
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:981 msg="Loading configuration file" filename=/prometheus/prometheus.yml
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.677Z caller=main.go:1012 msg="Completed loading of configuration file" filename=/prometheus/prometheus.yml totalDuration=8.9289…ms
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.677Z caller=main.go:796 msg="Server is ready to receive web requests."
Hint: Some lines were ellipsized, use -l to show in full.
Note: in this capture the [Install] section printed by cat has no WantedBy line, and systemd reports the unit as static, so it would not be started at boot. With WantedBy=multi-user.target present, as in the unit file above, systemctl enable creates the symlink and the state shows enabled.
Prometheus is now deployed.
Alerting additionally requires alertmanager, which is deployed next.
alertmanager deployment
Downloaded file: alertmanager-0.22.2.linux-amd64.tar.gz
Extract and install:
[root@node1 soft]# tar -zxvf alertmanager-0.22.2.linux-amd64.tar.gz -C /
alertmanager-0.22.2.linux-amd64/
alertmanager-0.22.2.linux-amd64/alertmanager.yml
alertmanager-0.22.2.linux-amd64/LICENSE
alertmanager-0.22.2.linux-amd64/NOTICE
alertmanager-0.22.2.linux-amd64/alertmanager
alertmanager-0.22.2.linux-amd64/amtool
[root@node1 soft]# mv /alertmanager-0.22.2.linux-amd64/ /alertmanager
[root@node1 soft]# cd /alertmanager/
[root@node1 alertmanager]# ll
total 47788
-rwxr-xr-x 1 3434 3434 27074026 Jun  2 15:51 alertmanager
-rw-r--r-- 1 3434 3434      348 Jun  2 15:56 alertmanager.yml
-rwxr-xr-x 1 3434 3434 21839682 Jun  2 15:52 amtool
-rw-r--r-- 1 3434 3434    11357 Jun  2 15:56 LICENSE
-rw-r--r-- 1 3434 3434      457 Jun  2 15:56 NOTICE
Configure the notification channel. Other receivers such as DingTalk are possible; email is used here as the example.
Note: the smtp_smarthost value differs from one mail provider to another.
vi /alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.exmail.qq.com:465'
  smtp_from: 'zhaokm@xxxxxxx.xx'
  smtp_auth_username: 'zhaokm@xxxxxxx.xx'
  smtp_auth_password: 'mailbox password'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'zhaokm@xxxxxxx.xx'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
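Before relying on alertmanager to deliver mail, it can help to confirm that the SMTP host, port, and credentials work at all. The following is a small Python sketch for that purpose, not something alertmanager itself provides. It assumes an implicit-TLS port such as 465, as in the smtp_smarthost above; the function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a short plain-text message for checking SMTP delivery."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(host, port, username, password, recipient):
    """Log in over implicit TLS (e.g. port 465) and send the test message.

    Needs network access and real credentials; nothing is sent on import.
    """
    msg = build_test_message(username, recipient)
    with smtplib.SMTP_SSL(host, port, timeout=10) as server:
        server.login(username, password)
        server.send_message(msg)
```

A call such as send_test_message("smtp.exmail.qq.com", 465, user, password, user) raises an exception if the login or delivery fails, which is usually easier to read than a failed notification in the alertmanager log.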
Configure start on boot
cat > /etc/systemd/system/alertmanager.service << "EOF"
[Unit]
Description=alertmanager
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target

[Service]
ExecStart=/alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
Apply the configuration
[root@node1 alertmanager]# systemctl daemon-reload
[root@node1 alertmanager]# systemctl enable alertmanager
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /etc/systemd/system/alertmanager.service.
[root@node1 alertmanager]# systemctl start alertmanager
[root@node1 alertmanager]# systemctl status alertmanager
● alertmanager.service - alertmanager
   Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-08-02 15:14:58 CST; 3s ago
 Main PID: 9825 (alertmanager)
   CGroup: /system.slice/alertmanager.service
           └─9825 /alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml

Aug 02 15:14:58 node1 systemd[1]: Started alertmanager.
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.005Z caller=main.go:221 msg="Starting Alertmanager" version="(version=0.22.2, branch=HEAD, revision=44f8adc06af5...8273f2922051)"
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.005Z caller=main.go:222 build_context="(go=go1.16.4, user=root@b595c7f32520, date=20210602-07:50:37)"
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.006Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=192.168.1.178 port=9094
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.009Z caller=cluster.go:671 component=cluster msg="Waiting for gossip to settle..." interval=2s
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.110Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/alertmanager/alertmanager.yml
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.111Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/alert...ertmanager.yml
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.122Z caller=main.go:514 msg=Listening address=:9093
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.122Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
Aug 02 15:15:01 node1 alertmanager[9825]: level=info ts=2021-08-02T07:15:01.009Z caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000791462s
Hint: Some lines were ellipsized, use -l to show in full.
Open 192.168.1.178:9093 in a browser to see the alertmanager web UI.
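To check the notification path end to end without waiting for a real Redis problem, a test alert can be pushed straight to alertmanager through its /api/v2/alerts endpoint. This is a hedged Python sketch: the alert name, labels, and helper names are invented for illustration, and the base URL is the address used in this article.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname="ManualTest", severity="warning", minutes=5, now=None):
    """Build the JSON body for one alert that expires after `minutes`."""
    now = now or datetime.now(timezone.utc)
    return [{
        # Labels are illustrative; the route above groups by alertname.
        "labels": {"alertname": alertname, "severity": severity, "instance": "manual"},
        "annotations": {"summary": "Manual test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alert(alerts, base_url="http://192.168.1.178:9093"):
    """POST the alerts to alertmanager and return the HTTP status code.

    Needs network access to a running alertmanager.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

After post_alert(build_test_alert()) the alert should appear in the web UI, and with the email receiver configured above a notification should follow once group_wait has elapsed.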
Edit the prometheus configuration so that prometheus scrapes alertmanager.
vi /prometheus/prometheus.yml
# append at the end of the file
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['192.168.1.178:9093']
Edit the prometheus configuration so that prometheus sends alerts to alertmanager.
vi /prometheus/prometheus.yml
# modify this section
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.1.178:9093
Enable the alert rules. This is configured on the prometheus side.
vi /prometheus/prometheus.yml
# modify this section
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "redis.yml"
Alert rules in redis.yml. Choose the thresholds to suit your own environment:
vi /prometheus/redis.yml
groups:
- name: Redis
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Redis down (instance {{ $labels.instance }})"
      description: "Redis is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: MissingBackup
    expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Missing backup (instance {{ $labels.instance }})"
      description: "Redis has not been backed up for 24 hours\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: OutOfMemory
    expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Out of memory (instance {{ $labels.instance }})"
      description: "Redis is running out of memory (> 90%)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: ReplicationBroken
    # delta() over a 1m window stays negative for only about a minute,
    # so "for: 5m" is likely to keep this alert from firing; consider a shorter value.
    expr: delta(redis_connected_slaves[1m]) < 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Replication broken (instance {{ $labels.instance }})"
      description: "Redis instance lost a slave\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: TooManyConnections
    expr: redis_connected_clients > 10
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Too many connections (instance {{ $labels.instance }})"
      description: "Redis instance has too many connections\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: NotEnoughConnections
    expr: redis_connected_clients < 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Not enough connections (instance {{ $labels.instance }})"
      description: "Redis instance should have more connections (> 5)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: RejectedConnections
    expr: increase(redis_rejected_connections_total[1m]) > 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: "Rejected connections (instance {{ $labels.instance }})"
      description: "Some connections to Redis have been rejected\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
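To make the threshold arithmetic in these rules concrete, here is a rough Python sketch that mirrors three of the expressions on plain numbers. It is only an illustration of the comparisons: it does not model the "for" clause, and the function names and default thresholds are copied or invented for this example rather than taken from Prometheus.

```python
import time


def missing_backup(last_save_ts, now=None, max_age=60 * 60 * 24):
    """Mirror: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24."""
    now = time.time() if now is None else now
    return now - last_save_ts > max_age


def out_of_memory(used_bytes, total_system_bytes, threshold_pct=90):
    """Mirror: used / total system memory * 100 > 90."""
    return used_bytes / total_system_bytes * 100 > threshold_pct


def connection_alerts(clients, high=10, low=5):
    """Return the names of the connection rules whose condition holds."""
    names = []
    if clients > high:
        names.append("TooManyConnections")
    if clients < low:
        names.append("NotEnoughConnections")
    return names
```

With the thresholds above, any client count from 5 to 10 triggers neither connection rule, which is worth keeping in mind when tuning the values for a busier instance.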
The resulting alert looks like this:
grafana deployment
Download page: https://grafana.com/grafana/download?edition=oss
Official installation guide:
https://grafana.com/docs/grafana/latest/installation/rpm/#2-start-the-server
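Because grafana ships as an rpm package, installation is straightforward.
Install whichever dependencies are reported as missing.
yum install -y fontconfig
yum install -y urw-fonts
rpm -ivh grafana-8.0.6-1.x86_64.rpm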
Enable grafana at boot and start it
/bin/systemctl daemon-reload
/bin/systemctl enable grafana-server.service
/bin/systemctl start grafana-server.service
[root@node1 soft]# which grafana-server
/usr/sbin/grafana-server
[root@node1 soft]# which grafana-cli
/usr/sbin/grafana-cli
Check the status
[root@node1 soft]# systemctl status grafana-server
● grafana-server.service - Grafana instance
   Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-07-29 15:58:38 CST; 4min 37s ago
     Docs: http://docs.grafana.org
 Main PID: 6884 (grafana-server)
   CGroup: /system.slice/grafana-server.service
           └─6884 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default...

Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="migrations completed" logger=migrator performed=330 skipped=0 duration=1.710091718s
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Created default admin" logger=sqlstore user=admin
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Created default organization" logger=sqlstore
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Starting plugin search" logger=plugins
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Registering plugin" logger=plugins id=grafana-plugin-admin-app
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Registering plugin" logger=plugins id=input
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="External plugins directory created" logger=plugins directory=/var/lib/grafana/plugins
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Live Push Gateway initialization" logger=live.push_http
Jul 29 15:58:38 node1 systemd[1]: Started Grafana instance.
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=
Open 192.168.1.178:3000 in a browser to reach the grafana web UI.
Configure the data source.
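The data source can also be created through the grafana HTTP API instead of the web form. The sketch below is an assumption-laden illustration: it posts to /api/datasources with basic auth, uses the default admin/admin login mentioned at the start of this article, and points at the Prometheus address used here. The data source name and helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_datasource(name="Prometheus", url="http://192.168.1.178:9090"):
    """Return the JSON body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",   # grafana's backend queries Prometheus
        "isDefault": True,
    }


def basic_auth_header(user, password):
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def create_datasource(grafana_url="http://192.168.1.178:3000", user="admin", password="admin"):
    """POST the data source to grafana (needs network access)."""
    req = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(build_datasource()).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

If the admin password was changed at first login, pass the new one to create_datasource; a data source with the same name that already exists will cause the request to be rejected.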
Download a dashboard:
https://grafana.com/grafana/dashboards/763 (the one used here)
https://grafana.com/grafana/dashboards/12980
https://grafana.com/grafana/dashboards/12776
Import the dashboard:
To import a dashboard, click the + icon in the side menu, click Import, select the data source, and confirm.
The final result:
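Note: the Memory Usage panel always shows ∞%. The cause is that redis_memory_max_bytes is 0 (Redis reports 0 when no maxmemory limit is set), so redis_memory_used_bytes / redis_memory_max_bytes is a division by zero.
Fix: replace redis_memory_max_bytes with the server's actual memory size.
Change the panel expression as follows, where 8370298880 is the physical memory size reported by free -b: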
redis_memory_used_bytes{instance=~"$instance"} / 8370298880
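The arithmetic behind the ∞% reading can be reproduced in a few lines. This Python sketch mimics how PromQL treats division by zero (a positive value divided by 0 gives +Inf, and 0 divided by 0 gives NaN); the function name is made up for illustration.

```python
import math


def memory_usage_ratio(used_bytes, max_bytes):
    """Divide used memory by the limit, following PromQL's float semantics."""
    if max_bytes == 0:
        # x / 0 is +Inf for positive x and NaN for 0 / 0.
        return math.inf if used_bytes > 0 else math.nan
    return used_bytes / max_bytes
```

With max_bytes set to 0 the panel has nothing sensible to display, while dividing by the 8370298880 bytes of physical memory gives an ordinary fraction that grafana can render as a percentage.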
Reference links:
The right way to monitor Redis with Prometheus (Redis cluster)
Configuring alerts with Alertmanager on a Prometheus monitoring platform
YAML format checker: http://www.bejson.com/validators/yaml_editor/