go - Monitoring
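Ways to ensure high availability
1. Logging
2. Distributed tracing
3. Monitoring
   1. Business monitoring (for management)
      OPS / DAU / response status (HTTP codes) / business endpoints (login, registration, chat, upload, comments, search)
   2. System monitoring (for operations)
      Operating system: CPU / memory / disk usage / disk space / TCP (tens of thousands of connections) / traffic
      Components: MySQL, Redis, Kafka
   3. Log monitoring (for operations)
      3.1 Business logs (big-data logs, ordinary logs)
      3.2 System logs (operating system logs, MySQL logs, Kafka logs)
      Log management systems: ELK, Loki
   4. Network monitoring
   5. Application monitoring
      Developers expose monitoring endpoints.
      Example: track how many HTTP 500 ErrUserNotFound errors occur per day.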
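The error-counting idea above can be sketched as a counter keyed by labels. This is a standard-library-only illustration, not the Prometheus client: in a real service a labeled counter from client_golang (`prometheus.NewCounterVec`) would likely be used instead. The type `errorCounter`, the metric name `app_errors_total`, and the label names `code` and `error` are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// errorCounter is a hypothetical, stdlib-only stand-in for a labeled counter.
type errorCounter struct {
	mu     sync.Mutex
	counts map[[2]string]int // key: {HTTP status code, error name}
}

func newErrorCounter() *errorCounter {
	return &errorCounter{counts: make(map[[2]string]int)}
}

// Inc records one occurrence of an error for the given HTTP status code.
func (c *errorCounter) Inc(code, errName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{code, errName}]++
}

// Get returns how many times the (code, error) pair has been recorded.
func (c *errorCounter) Get(code, errName string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[2]string{code, errName}]
}

// Render formats the counts as Prometheus-style sample lines, sorted for stable output.
func (c *errorCounter) Render(metric string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for k, v := range c.counts {
		lines = append(lines, fmt.Sprintf("%s{code=%q,error=%q} %d", metric, k[0], k[1], v))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	c := newErrorCounter()
	c.Inc("500", "ErrUserNotFound")
	c.Inc("500", "ErrUserNotFound")
	c.Inc("404", "ErrPageNotFound")
	fmt.Println(c.Render("app_errors_total"))
}
```

With a counter like this scraped by Prometheus, the per-day figure would come from a query over a one-day range, for example with the PromQL function `increase` over `[1d]`, rather than from resetting the counter each day.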
Prometheus: a monitoring and alerting toolkit
PromQL:
metric_name{label matchers}[time range]
prometheus_http_requests_total{code="302"}[5h] // samples with code 302 over the past 5h
prometheus_http_requests_total{code="302"}[5h] offset 1h // the same 5h window shifted back by 1h (from 6h ago to 1h ago)
sum(prometheus_http_requests_total{}) // sum across all series of the metric
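Grafana:
Configure the data source: set the Prometheus URL to an address on the local network.
(Prometheus pulls metrics from exporters. If Prometheus is deployed in the cloud, your own service must also be published where the cloud server can reach it; otherwise the cloud cannot access a service that only runs locally.)
Dashboard - edit panel - select metrics, Shift+Enter - save dashboard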
- visualize k8s deployment
- visualize Jira data
- visualize MongoDB
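Prometheus data format: metrics
"Metrics" is the general term for sampled data.
gauges
The simplest metric type: just a single returned value;
an instantaneous state
counter
A count that only increases
gauges + counters: about 70% of usage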
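The difference between the two types can be sketched with plain structs. This is a standard-library-only illustration with hypothetical types `gauge` and `counter`, not the client_golang implementation: a gauge is overwritten with the latest value, while a counter only accumulates. The sketch silently ignores a negative increment, which is a simplification chosen for this example.

```go
package main

import "fmt"

// gauge holds an instantaneous value that can go up or down.
type gauge struct{ value float64 }

// Set overwrites the gauge with the latest observed value.
func (g *gauge) Set(v float64) { g.value = v }

// counter holds a cumulative value that never decreases.
type counter struct{ value float64 }

// Add increases the counter; negative increments are ignored in this sketch.
func (c *counter) Add(v float64) {
	if v < 0 {
		return // a counter must never decrease
	}
	c.value += v
}

func main() {
	var conns gauge      // e.g. current number of open TCP connections
	var requests counter // e.g. total requests handled since start

	conns.Set(120)
	conns.Set(80) // the gauge now reflects only the latest state

	requests.Add(1)
	requests.Add(1)
	requests.Add(-5) // ignored

	fmt.Println(conns.value, requests.value)
}
```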
histograms: address the uneven distribution that an average hides
(e.g. request counts at peak versus off-peak times)
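To illustrate why an average can mislead, the sketch below records observations into cumulative buckets, which is the general idea behind a histogram. It is a standard-library-only illustration. The type `histogram`, the bucket bounds, and the sample values (latencies in seconds) are assumptions made for the example.

```go
package main

import "fmt"

// histogram is a hypothetical sketch with cumulative buckets:
// counts[i] is the number of observations <= bounds[i].
type histogram struct {
	bounds []float64 // upper bounds in ascending order
	counts []int
	sum    float64
	total  int
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]int, len(bounds))}
}

// Observe adds one sample to the sum, the total, and every bucket it fits in.
func (h *histogram) Observe(v float64) {
	h.sum += v
	h.total++
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
}

// Mean returns the plain average of all observations.
func (h *histogram) Mean() float64 {
	if h.total == 0 {
		return 0
	}
	return h.sum / float64(h.total)
}

func main() {
	h := newHistogram([]float64{0.1, 0.5, 2})
	for i := 0; i < 9; i++ {
		h.Observe(0.05) // nine fast requests
	}
	h.Observe(5) // one very slow request

	fmt.Printf("mean=%.3f\n", h.Mean())
	for i, b := range h.bounds {
		fmt.Printf("le=%v: %d of %d\n", b, h.counts[i], h.total)
	}
}
```

Here the mean comes out at roughly 0.545 seconds, a value that none of the ten requests actually experienced, while the buckets show that nine of ten finished within 0.1 seconds and one fell outside every bound.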
Demo: integration with Gin:
package main

import (
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ops is created through promauto, which registers it with the default Prometheus registry.
var ops = promauto.NewCounter(prometheus.CounterOpts{
	Name: "mxshop_test", // underscore, not hyphen: this is the name shown in the graph UI
	Help: "just for test",
})

// recordMetrics increments the counter every 2 seconds.
func recordMetrics() {
	for {
		ops.Inc()
		time.Sleep(2 * time.Second)
	}
}

func main() {
	go recordMetrics()
	r := gin.Default()
	// Serve the promhttp handler at IP:8050/metrics.
	r.GET("/metrics", gin.WrapH(promhttp.Handler()))
	r.Run(":8050")
}
Update the Prometheus configuration: add IP:8050 to the targets of a scrape job.
Open the graph UI on port 9090: a counter named mxshop_test is visible.