Go - Monitoring

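Ways to keep a service highly available
1. Logging
2. Distributed tracing
3. Monitoring
  1. Business monitoring (for management)
     OPS / DAU / access status (HTTP status codes) / business endpoints (login, registration, chat, upload, comments, search)
  2. System monitoring (ops)
     Operating system: CPU / memory / disk usage / disk space / TCP (tens of thousands of connections) / traffic
     Components: MySQL, Redis, Kafka
  3. Log monitoring (ops)
    3.1 Business logs (big-data logs, ordinary logs)
    3.2 System logs (operating system logs, MySQL logs, Kafka logs)
    Log management systems: ELK, Loki
  4. Network monitoring
  5. Application monitoring
     Developers expose monitoring endpoints,
     for example to track how many ErrUserNotFound errors (HTTP 500) occur in one day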

Prometheus: monitoring and alerting toolkit

 

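 PromQL:

metric_name{label matchers}[time range]

prometheus_http_requests_total{code="302"}[5h]   # samples with code="302" from the past 5h

prometheus_http_requests_total{code="302"}[5h] offset 1h   # the same 5h window shifted back by 1h (from 6h ago to 1h ago)

sum(prometheus_http_requests_total)   # sum over all series of the metric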

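Grafana:

Configure the data source: Prometheus URL, reachable on the local network.

(Prometheus pulls metrics from exporters. If Prometheus is deployed in the cloud, your own service must also be published somewhere the cloud server can reach; otherwise Prometheus cannot scrape a service running on your local machine.)

dashboard - edit panel - choose the metric, Shift+Enter - save dashboard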

  • visualize k8s deployment
  • visualize Jira data
  • visualize MongoDB

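Prometheus data format: metrics
"Metrics" is the general term for sampled data.

gauges
The simplest metric type: just a single returned value;
it represents an instantaneous state and can go up or down.
counters
A counter: a cumulative value that only increases (and starts again from zero when the process restarts).

gauges + counters: about 70% of usage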

histograms: address the uneven distribution that an average hides
(for example, the request volume at peak time versus off-peak)

Demo: integrating Prometheus with Gin:

package main

import (
    "log"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// ops is created by promauto, which also registers it with the default registry.
var ops = promauto.NewCounter(prometheus.CounterOpts{
    // Use underscores: hyphens are not valid in classic Prometheus metric names.
    Name: "mxshop_test",
    Help: "just for test",
})

// recordMetrics increments the counter every 2 seconds to simulate activity.
func recordMetrics() {
    for {
        ops.Inc()
        time.Sleep(2 * time.Second)
    }
}

func main() {
    go recordMetrics()
    r := gin.Default()
    // Serve the Prometheus handler at IP:8050/metrics.
    r.GET("/metrics", gin.WrapH(promhttp.Handler()))
    if err := r.Run(":8050"); err != nil {
        log.Fatal(err)
    }
}

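Edit the Prometheus configuration and add IP:8050 to the targets of a scrape job (static_configs under scrape_configs).

Open the graph GUI on port 9090: a counter named mxshop_test should be visible.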

 

posted @ 2024-06-22 22:19  PEAR2020