distributed tracing/circuit breaker/rate limiting/gateway
distributed tracing
1. For debugging and analyzing API performance when dependency relationships are complex
2. Technology comparison
3. jaeger installation
docker run \
  --rm \
  --name jaeger \
  -p6831:6831/udp \
  -p16686:16686 \
  -p14268:14268 \
  jaegertracing/all-in-one:latest
http://192.168.2.112:16686/
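4. Jaeger components
Jaeger Client - SDKs that implement the OpenTracing standard for different languages. The application writes data through the API, and the client library passes trace information to jaeger-agent according to the sampling strategy specified by the application.
Agent - a network daemon that listens on a UDP port for span data and sends it to the collector in batches. It is designed as an infrastructure component deployed on every host. The agent decouples the client library from the collector, shielding the client library from the details of routing to and discovering collectors.
Collector - receives data sent by jaeger-agent and writes it to backend storage. The collector is designed to be stateless, so you can run any number of jaeger-collector instances at the same time.
Data Store - backend storage is designed as a pluggable component and supports writing data to Cassandra and Elasticsearch.
Query - receives query requests, retrieves traces from the backend storage, and displays them through the UI. Query is stateless; you can start multiple instances and deploy them behind a load balancer such as nginx.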
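Distributed tracing systems are evolving quickly and come in many varieties, but they generally share three core steps: code instrumentation, data storage, and query/display.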
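The three steps can be pictured with a toy in-memory model. This is only a sketch for intuition, not how Jaeger is implemented; the names Span, Store, Record and Query are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, simplified record of one timed operation.
type Span struct {
	TraceID  string
	Name     string
	Parent   string // name of the parent span; empty for the root
	Duration time.Duration
}

// Store is a toy in-memory backend keyed by trace ID (step 2: storage).
type Store struct {
	byTrace map[string][]Span
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{byTrace: make(map[string][]Span)}
}

// Record is what instrumented code would call when an operation ends (step 1).
func (s *Store) Record(sp Span) {
	s.byTrace[sp.TraceID] = append(s.byTrace[sp.TraceID], sp)
}

// Query returns all spans of one trace in insertion order (step 3).
func (s *Store) Query(traceID string) []Span {
	return s.byTrace[traceID]
}

func main() {
	store := NewStore()
	store.Record(Span{TraceID: "t1", Name: "funcA", Parent: "main", Duration: 1000 * time.Millisecond})
	store.Record(Span{TraceID: "t1", Name: "funcB", Parent: "main", Duration: 500 * time.Millisecond})
	store.Record(Span{TraceID: "t1", Name: "main", Duration: 1500 * time.Millisecond})

	for _, sp := range store.Query("t1") {
		fmt.Printf("%s (parent=%q) took %v\n", sp.Name, sp.Parent, sp.Duration)
	}
}
```

A real system replaces the map with Cassandra or Elasticsearch and the loop with the query UI.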
5. usage
package main

import (
	"time"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
	jaegerlog "github.com/uber/jaeger-client-go/log"
)

func ExampleConfiguration_InitGlobalTracer_testing() {
	// Sample configuration for testing. Use constant sampling to sample every trace
	// and enable LogSpans to log every span via the configured Logger.
	cfg := jaegercfg.Configuration{
		Sampler: &jaegercfg.SamplerConfig{
			Type:  jaeger.SamplerTypeConst,
			Param: 1,
		},
		Reporter: &jaegercfg.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "192.168.2.112:6831",
		},
		ServiceName: "mxshop", // tracer name
	}

	jLogger := jaegerlog.StdLogger
	tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jLogger))
	if err != nil {
		panic(err)
	}
	opentracing.SetGlobalTracer(tracer)
	defer closer.Close()

	// A span represents one invocation of an interface.
	parentSpan := opentracing.StartSpan("main")

	span := opentracing.StartSpan("funcA", opentracing.ChildOf(parentSpan.Context()))
	time.Sleep(1000 * time.Millisecond)
	span.Finish()

	span2 := opentracing.StartSpan("funcB", opentracing.ChildOf(parentSpan.Context()))
	time.Sleep(500 * time.Millisecond)
	span2.Finish()

	parentSpan.Finish()
}

func main() {
	ExampleConfiguration_InitGlobalTracer_testing()
}
grpc + jaeger + opentracing
Use the interceptor provided by grpc-opentracing.
client:
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
	jaegerlog "github.com/uber/jaeger-client-go/log"
	"google.golang.org/grpc"

	otgrpc "GoProjects/jaeger/otgrpc"
	proto2 "GoProjects/jaeger/proto"
)

func main() {
	// A timing interceptor that logs how long each unary call takes.
	interceptor := func(ctx context.Context, method string, req, reply any, cc *grpc.ClientConn,
		invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		fmt.Println("===client has an interceptor...")
		now := time.Now()
		err := invoker(ctx, method, req, reply, cc, opts...)
		fmt.Println("===cost ", time.Since(now))
		return err
	}

	cfg := jaegercfg.Configuration{
		Sampler: &jaegercfg.SamplerConfig{
			Type:  jaeger.SamplerTypeConst,
			Param: 1,
		},
		Reporter: &jaegercfg.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "192.168.2.112:6831",
		},
		ServiceName: "test", // tracer name
	}

	jLogger := jaegerlog.StdLogger
	tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jLogger))
	if err != nil {
		panic(err)
	}
	// If not set, opentracing.GlobalTracer() returns a NoopTracer, which records nothing.
	opentracing.SetGlobalTracer(tracer)
	defer closer.Close()

	// DialOption values are collected in a slice.
	var opts []grpc.DialOption
	opts = append(opts, grpc.WithInsecure())
	// A second WithUnaryInterceptor would override the first, so chain both interceptors.
	opts = append(opts, grpc.WithChainUnaryInterceptor(
		interceptor,
		otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer()),
	))

	conn, err := grpc.Dial(":1234", opts...)
	if err != nil {
		panic("conn error")
	}
	defer conn.Close()

	c := proto2.NewGreeterClient(conn)
	//span := opentracing.StartSpan("testSayHello")
	reply, err := c.SayHello(context.Background(), &proto2.HelloRequest{
		Name: "boby",
	})
	//span.Finish()
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
Propagation from the API layer to gRPC
6. gin-http-jaeger
At the API layer, use an interceptor (middleware) to initialize a tracer. One tracer per API.
Add to the middleware:
import (
	"github.com/gin-gonic/gin"
	"github.com/opentracing/opentracing-go"
	jaegerClient "github.com/uber/jaeger-client-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
	jaegerlog "github.com/uber/jaeger-client-go/log"
	"go.uber.org/zap"
)

func Tracing() gin.HandlerFunc {
	return func(c *gin.Context) {
		zap.S().Infof("jaeger tracing....1")
		cfg := jaegercfg.Configuration{
			Sampler: &jaegercfg.SamplerConfig{
				Type:  jaegerClient.SamplerTypeConst,
				Param: 1,
			},
			Reporter: &jaegercfg.ReporterConfig{
				LogSpans:           true,
				LocalAgentHostPort: "192.168.2.112:6831",
			},
			ServiceName: "mxshop-api", // tracer name
		}

		jLogger := jaegerlog.StdLogger
		tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jLogger))
		if err != nil {
			panic(err)
		}
		opentracing.SetGlobalTracer(tracer)
		defer closer.Close()

		// A span represents one invocation of an interface; name it after the request path.
		span := opentracing.StartSpan(c.Request.URL.Path)
		defer span.Finish()

		// Expose the tracer and the parent span to the handlers that run after this middleware.
		c.Set("tracer", tracer)
		c.Set("parentSpan", span)
		c.Next()
	}
}
In the router, add the interceptor at the very start of each API:
func UserRouter(group *gin.RouterGroup) {
	zap.S().Infof("init the UserRouter...")
	rg := group.Group("/user").Use(middlewares.Tracing())
	{
		rg.GET("/list", middlewares.JWTAuth(), middlewares.IsAdmin(), api.GetUserList)
		rg.POST("/login", api.LoginValidate)
		rg.POST("/register", api.Register)
	}
}
7. grpc-jaeger
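In the project, main.go initializes the router at startup but does not execute the router interceptor, so the tracer is not created during initialization.
However, main.go does initialize the client, whose dial options include the opentracing interceptor, initialized with a NoopTracer, as follows: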
var testtracer = opentracing.GlobalTracer()
global.Conn, err = grpc.Dial(
	fmt.Sprintf("consul://%s:8500/%s?wait=14s&tag=%s",
		global.SrvConfig.ConsulConfig.Host, global.SrvConfig.UserServiceConfig.Name, "mxshop"),
	grpc.WithInsecure(),
	grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy": "round_robin"}`),
	grpc.WithUnaryInterceptor(otgrpc.OpenTracingClientInterceptor(testtracer)),
)
What we expect:
1. The tracer in the client and the tracer in the API are the same. The tracer has to be passed from the API to the client, but the API can only store it in *gin.Context, while the client interceptor can only read from context.Context. Solution: use context.WithValue() to wrap the gin context as a key/value pair: newctx := context.WithValue(context.Background(), "ginContext", c)
2. api span is parent, client span is child (set parentSpan in api interceptor, pass to client interceptor and set the parent)
set into *gin.Context:
c.Set("tracer", tracer)
c.Set("parentSpan", span)
get tracer and parentSpan in the client interceptor from context.Context:
if gc, ok := ctx.Value("ginContext").(*gin.Context); ok {
	if itracer, ok := gc.Get("tracer"); ok {
		tracer = itracer.(opentracing.Tracer)
	}
	if parentSpan, ok := gc.Get("parentSpan"); ok {
		parentCtx = parentSpan.(*jaegerClient.Span).Context()
	}
}
Propagation between gRPC services
client1: add the client-side interceptor: otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer())
server1: add the server-side interceptor in main.go:
cfg := jaegercfg.Configuration{
	Sampler: &jaegercfg.SamplerConfig{
		Type:  jaegerClient.SamplerTypeConst,
		Param: 1,
	},
	Reporter: &jaegercfg.ReporterConfig{
		LogSpans:           true,
		LocalAgentHostPort: "192.168.2.112:6831",
	},
	ServiceName: "test", // tracer name
}

jLogger := jaegerlog.StdLogger
tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jLogger))
if err != nil {
	panic(err)
}
opentracing.SetGlobalTracer(tracer)

so := grpc.UnaryInterceptor(otgrpc.OpenTracingServerInterceptor(tracer))
g := grpc.NewServer(so)
proto2.RegisterGreeterServer(g, &Server1{}) // grpc-server, implemented server

lis, err := net.Listen("tcp", ":1234")
if err != nil {
	panic("listen error")
}

// Serve blocks, so run it in a goroutine; otherwise the signal handling below is never reached.
go func() {
	if err := g.Serve(lis); err != nil {
		panic("fail to start grpc")
	}
}()

// signal.Notify never blocks on send, so the channel must be buffered.
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
_ = closer.Close()
in function:
// The otgrpc server interceptor stores the incoming span in ctx.
parentSpan := opentracing.SpanFromContext(ctx)
span := opentracing.StartSpan("sayHello", opentracing.ChildOf(parentSpan.Context()))
reply, _ := Server2Client.SayYes(ctx, &proto2.HelloRequest2{
	Name: request.Name,
})
span.Finish()
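What the client and server interceptors do between two services is inject the span context into request metadata and extract it on the other side. A rough sketch with a plain map as the carrier; the uber-trace-id key and the colon-separated layout follow Jaeger's propagation format as I understand it, while SpanContext, Inject, Extract and ChildOf are hypothetical simplifications, not the otgrpc API:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// headerKey is the metadata key assumed here for the trace context.
const headerKey = "uber-trace-id"

// SpanContext holds the identifiers that must cross the process boundary.
type SpanContext struct {
	TraceID  string
	SpanID   string
	ParentID string
	Flags    string // "1" means sampled
}

// Inject writes the span context into the carrier (client side).
func Inject(sc SpanContext, carrier map[string]string) {
	carrier[headerKey] = strings.Join([]string{sc.TraceID, sc.SpanID, sc.ParentID, sc.Flags}, ":")
}

// Extract reads the span context back from the carrier (server side).
func Extract(carrier map[string]string) (SpanContext, error) {
	raw, ok := carrier[headerKey]
	if !ok {
		return SpanContext{}, errors.New("no trace context in carrier")
	}
	parts := strings.Split(raw, ":")
	if len(parts) != 4 {
		return SpanContext{}, errors.New("malformed trace context")
	}
	return SpanContext{TraceID: parts[0], SpanID: parts[1], ParentID: parts[2], Flags: parts[3]}, nil
}

// ChildOf derives the context of a new span whose parent is the given span.
func ChildOf(parent SpanContext, newSpanID string) SpanContext {
	return SpanContext{
		TraceID:  parent.TraceID,
		SpanID:   newSpanID,
		ParentID: parent.SpanID,
		Flags:    parent.Flags,
	}
}

func main() {
	carrier := map[string]string{}
	Inject(SpanContext{TraceID: "abc", SpanID: "1", ParentID: "0", Flags: "1"}, carrier)

	parent, err := Extract(carrier)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ChildOf(parent, "2"))
}
```

The child keeps the trace ID and records the caller's span ID as its parent, which is what makes server1's span appear under client1's span in the UI.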
rate limiting
1. using sentinel (alibaba): https://sentinelguard.io/zh-cn/docs/golang/flow-control.html
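TokenCalculateStrategy: the token calculation strategy of the flow controller. Direct uses the Threshold field directly as the threshold; WarmUp computes the token threshold using a warm-up scheme.
Threshold: the flow-control threshold. If StatIntervalInMs is 1000 (i.e. 1 second), Threshold represents QPS, and the flow controller limits the resource based on its QPS.
WarmUpPeriodSec: the length of the warm-up period. This field only takes effect when TokenCalculateStrategy is WarmUp.
StatIntervalInMs: the statistics interval of the independent statistics structure of the rule's flow controller. If StatIntervalInMs is 1000, the statistic is QPS.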
1. QPS:
1.1 reject
// Copyright 1999-2020 Alibaba Group Holding Ltd.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
	"fmt"
	"log"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func doTest() {
	// We should initialize Sentinel first.
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("Unexpected error: %+v", err)
	}

	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			Threshold:              10,
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Reject,
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("Unexpected error: %+v", err)
	}

	for i := 0; i < 12; i++ {
		e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
		if b != nil {
			// Blocked. We could get the block reason from the BlockError.
			fmt.Println("refuse")
		} else {
			// Passed, wrap the logic here.
			fmt.Println("ok")
			// Be sure the entry is exited finally.
			e.Exit()
		}
	}
	time.Sleep(time.Second * 5)
}

func main() {
	doTest()
}
1.2. throttling: ControlBehavior: flow.Throttling,
With a threshold of 2 per 1000 ms, requests are spaced out so that one passes every 500 ms.
2. warmup: over 30 s the number of requests allowed per second keeps growing until it reaches a QPS of 1000
package main

import (
	"fmt"
	"log"
	"math/rand"
	"sync/atomic"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

type Counter struct {
	pass  *int64
	block *int64
	total *int64
}

func doTest() {
	// We should initialize Sentinel first.
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("Unexpected error: %+v", err)
	}

	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			Threshold:              1000,
			TokenCalculateStrategy: flow.WarmUp,
			ControlBehavior:        flow.Reject,
			StatIntervalInMs:       1000,
			WarmUpPeriodSec:        30,
		},
	})
	if err != nil {
		log.Fatalf("Unexpected error: %+v", err)
	}

	ch := make(chan int)
	counter := Counter{
		pass:  new(int64),
		block: new(int64),
		total: new(int64),
	}

	// 100 goroutines generate load continuously.
	for i := 0; i < 100; i++ {
		go func() {
			for {
				atomic.AddInt64(counter.total, 1)
				e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
				if b != nil {
					atomic.AddInt64(counter.block, 1)
				} else {
					e.Exit()
					atomic.AddInt64(counter.pass, 1)
				}
				time.Sleep(time.Duration(rand.Uint64()%10) * time.Millisecond)
			}
		}()
	}

	// Print the per-second total/pass/block deltas.
	go func() {
		var (
			oldTotal int64
			oldPass  int64
			oldBlock int64
		)
		for {
			time.Sleep(time.Second)

			globalTotal := atomic.LoadInt64(counter.total)
			oneSecondTotal := globalTotal - oldTotal
			oldTotal = globalTotal

			globalPass := atomic.LoadInt64(counter.pass)
			oneSecondPass := globalPass - oldPass
			oldPass = globalPass

			globalBlock := atomic.LoadInt64(counter.block)
			oneSecondBlock := globalBlock - oldBlock
			oldBlock = globalBlock

			fmt.Printf("total:%d, pass:%d, block:%d\n", oneSecondTotal, oneSecondPass, oneSecondBlock)
		}
	}()

	<-ch
}

func main() {
	doTest()
}
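The shape of the "pass" column can be approximated with a linear ramp. This is deliberately not Sentinel's algorithm (which is token-bucket based and not linear); it only illustrates "start low, reach Threshold after WarmUpPeriodSec". The cold factor of 3 used in main is an assumption for illustration, and warmUpQPS is a hypothetical helper.

```go
package main

import "fmt"

// warmUpQPS returns an approximate allowed QPS after elapsedSec seconds of warm-up,
// rising linearly from threshold/coldFactor to threshold over periodSec seconds.
func warmUpQPS(threshold, coldFactor, periodSec, elapsedSec float64) float64 {
	if periodSec <= 0 || elapsedSec >= periodSec {
		return threshold
	}
	if elapsedSec < 0 {
		elapsedSec = 0
	}
	cold := threshold / coldFactor
	return cold + (threshold-cold)*elapsedSec/periodSec
}

func main() {
	// Threshold 1000 and a 30 s warm-up period, as in the rule above.
	for _, t := range []float64{0, 10, 20, 30} {
		fmt.Printf("t=%2.0fs allowed≈%.0f qps\n", t, warmUpQPS(1000, 3, 30, t))
	}
}
```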
circuit breaking
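All three circuit-breaking strategies of the Sentinel circuit breaker support a silent period (expressed by the MinRequestAmount field in the rule). The silent period is a minimum request count: if the number of requests to a resource within one statistics interval is below this value, the breaker does not change its state based on the statistics. The rationale is simple. Suppose the very first request of a new statistics interval happens to be slow; the slow-call ratio would then be 100%, which is clearly unreasonable because it rests on coincidence. The silent period therefore improves the breaker's accuracy and reduces false positives.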
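The first-request example can be checked with a few lines. A minimal sketch of the gating idea only; slowRatioExceeded is a hypothetical helper, not part of Sentinel:

```go
package main

import "fmt"

// slowRatioExceeded reports whether the slow-call ratio should trip the breaker.
// While fewer than minRequestAmount requests have been seen in the interval
// (the silent period), the statistics are ignored.
func slowRatioExceeded(total, slow, minRequestAmount uint64, maxSlowRatio float64) bool {
	if total < minRequestAmount {
		return false // silent period: too few samples to trust the ratio
	}
	if total == 0 {
		return false
	}
	return float64(slow)/float64(total) > maxSlowRatio
}

func main() {
	// One request so far and it was slow: ratio is 100%, but still silent.
	fmt.Println(slowRatioExceeded(1, 1, 10, 0.5))
	// Twenty requests, fifteen slow: ratio 75% with enough samples.
	fmt.Println(slowRatioExceeded(20, 15, 10, 0.5))
}
```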
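Sentinel supports the following circuit-breaking strategies:
- Slow request ratio (SlowRequestRatio): if the breaker is not in its silent period and the slow-call ratio exceeds the configured threshold, access to the resource is automatically broken for the following break period. This strategy requires a maximum allowed RT (response time); a request whose response time exceeds it is counted as a slow call.
- Error ratio (ErrorRatio): if the breaker is not in its silent period and the ratio of failed requests within the statistics interval exceeds the configured threshold, access to the resource is automatically broken for the following break period.
- Error count (ErrorCount): if the breaker is not in its silent period and the number of failed requests within the statistics interval exceeds the configured threshold, access to the resource is automatically broken for the following break period.
Note: error-ratio and error-count breaking refer to the ratio or count of errors returned by business logic. In other words, if the rule uses the error-ratio or error-count strategy, you must call the API api.TraceError(entry, err) to record each request's business error so that the statistics can be collected.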
github example: https://github.com/alibaba/sentinel-golang/blob/master/example/circuitbreaker/error_count/circuit_breaker_error_count_example.go
package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"sync/atomic"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct {
}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.strategy: %+v, From %s to Open, snapshot: %d, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

type Count struct {
	total   *int64
	pass    *int64
	blocked *int64
	err     *int64
}

func main() {
	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})

	// Register a state change listener so that we can observe the state changes of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=5s, recoveryTimeout=3s, maxErrorCount=50
		{
			Resource:                     "abc",
			Strategy:                     circuitbreaker.ErrorCount,
			RetryTimeoutMs:               3000, // 3s retry
			MinRequestAmount:             10,
			StatIntervalMs:               5000, // 5s
			StatSlidingWindowBucketCount: 10,
			Threshold:                    50, // 50 errors
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	count := Count{
		total:   new(int64),
		pass:    new(int64),
		blocked: new(int64),
		err:     new(int64),
	}

	logging.Info("[CircuitBreaker ErrorCount] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")

	// g1: passes requests and reports roughly 60% of them as business errors.
	go func() {
		for {
			atomic.AddInt64(count.total, 1)
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked
				atomic.AddInt64(count.blocked, 1)
				fmt.Println("circuit break!!")
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				atomic.AddInt64(count.pass, 1)
				if rand.Uint64()%20 < 12 {
					// Record current invocation as error.
					atomic.AddInt64(count.err, 1)
					sentinel.TraceError(e, errors.New("biz error"))
				}
				// g1 passed
				time.Sleep(time.Duration(rand.Uint64()%80+10) * time.Millisecond)
				e.Exit()
			}
		}
	}()

	// g2: passes requests without reporting errors.
	go func() {
		for {
			atomic.AddInt64(count.total, 1)
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked
				atomic.AddInt64(count.blocked, 1)
				fmt.Println("circuit break!!")
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed
				atomic.AddInt64(count.pass, 1)
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
				e.Exit()
			}
		}
	}()

	// Print the cumulative error count once per second.
	go func() {
		for {
			time.Sleep(time.Second)
			fmt.Println(atomic.LoadInt64(count.err))
		}
	}()

	<-ch
}
Gin integration
Configuration is omitted. In the handler for the API call, wrap the service call with Sentinel entry and exit.
// for tracing
newctx := context.WithValue(context.Background(), "ginContext", c)

// for rate limiting
e, b := sentinel.Entry("api-test", sentinel.WithTrafficType(base.Inbound))
if b != nil {
	fmt.Println("refuse")
	c.JSON(http.StatusTooManyRequests, gin.H{
		"msg": "too frequent requests!!!", // gin.H is map[string]interface{}
	})
	return
}
// Exit the entry on every return path, including the error path below.
defer e.Exit()

lst, err := global.UserClient.GetUserList(newctx, &proto.PageInfo{Pn: uint32(pni), PSize: uint32(pns)})
if err != nil {
	zap.S().Errorw("invoking [GetUserList] error")
	GrpcCodeToHttp(err, c)
	return
}
Gateway
(I) KONG INSTALL
1. pre-installation: postgres and migration
# docker install postgres
docker run -d --name kong-database \
  -p 5432:5432 \
  -e "POSTGRES_USER=kong" \
  -e "POSTGRES_DB=kong" \
  -e "POSTGRES_PASSWORD=kong" \
  postgres:12

# docker initialization (run the migrations)
docker run --rm \
  -e "KONG_DATABASE=postgres" \
  -e "KONG_PG_HOST=192.168.0.104" \
  -e "KONG_PG_USER=kong" \
  -e "KONG_PG_PASSWORD=kong" \
  -e "KONG_CASSANDRA_CONTACT_POINTS=kong-database" \
  kong kong migrations bootstrap
2. install kong (installed directly on the host rather than in Docker, to avoid mistakes caused by unfamiliarity with Docker)
curl -Lo kong-2.1.0.amd64.rpm $( rpm --eval "https://download.konghq.com/gateway-2.x-centos-%{centos_ver}/Packages/k/kong-2.1.0.el%{centos_ver}.amd64.rpm")
sudo yum install kong-2.1.0.amd64.rpm
3. Stop the firewall
systemctl stop firewalld.service
systemctl restart docker
4. conf
cp /etc/kong/kong.conf.default /etc/kong/kong.conf
vim /etc/kong/kong.conf

# Change the following settings:
database = postgres
pg_host = 192.168.1.102   # must be the externally reachable IP, not 127.0.0.1
pg_port = 5432            # Port of the Postgres server.
pg_timeout = 5000         # Defines the timeout (in ms), for connecting,
                          # reading and writing.
pg_user = kong            # Postgres user.
pg_password = kong        # Postgres user's password.
pg_database = kong        # The database name to connect to.

dns_resolver = 127.0.0.1:8600   # important: this is Consul's DNS port (8600 by default; the port can be changed)
admin_listen = 0.0.0.0:8001 reuseport backlog=16384, 127.0.0.1:8444 http2 ssl reuseport backlog=16384
proxy_listen = 0.0.0.0:8000 reuseport backlog=16384, 0.0.0.0:8443 http2 ssl reuseport backlog=16384
5. start
kong start -c /etc/kong/kong.conf

# Add firewall rules (exact effect not yet understood; not enabled for now)
firewall-cmd --zone=public --add-port=8001/tcp --permanent
firewall-cmd --zone=public --add-port=8000/tcp --permanent
sudo firewall-cmd --reload
6. Open IP:8001 in a browser to check that Kong started successfully
7. GUI: install konga
docker run -d -p 1337:1337 --name konga pantsel/konga
(II) KONG
(1) A Service can also be configured to point at Consul (for LB and health checks)
Configuration: Service:
- host: [consulServiceName].service.consul
- port: 80
(2) KONG JWT
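A consumer maps one-to-one to a user; it can be a single microservice or the entire system.
Configure JWT for the consumer: keep iss consistent with the key (Issuer) -> generate a secret -> use iss and the secret to generate a token.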
details:
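Configuration steps:
1. Create a new consumer.
2. Add a JWT credential to this consumer. Remember the key!!
3. Configure the global plugins.
   a. Set the header to x-token.
4. Generate a token at jwt.io.
   a. Add "iss": "imooc" to the payload.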
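Differences from a hand-written custom JWT:
A custom implementation can parse specific fields in the token to perform authorization.
The plugin can intercept and validate requests directly, using a token built from iss + secret.
(? still not entirely clear)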
(3) Anti-scraping & IP-restriction
The User-Agent header carries information about the browser the request comes from. Matching uses a regular expression: just write Firefox, no double quotes needed.