参考:
http://ginobefunny.com/post/learning_distributed_systems_tracing/
http://www.cnblogs.com/zhengyun_ustc/p/55solution2.html
Dapper,大规模分布式系统的跟踪系统: http://bigbully.github.io/Dapper-translation/
http://blog.csdn.net/liaokailin/article/details/52077620
Google叫Dapper,淘宝叫鹰眼,Twitter叫ZipKin,京东商城叫Hydra,eBay叫Centralized Activity Logging (CAL),大众点评网叫CAT
请求到达服务器,应用容器在执行业务处理之前,先执行埋点逻辑,分配一个全局唯一调用链ID(TraceId),埋点逻辑将TraceId放在一个调用上下文对象里,该对象存储在ThreadLocal中。还有一个RpcId用于区分同一个调用链多个网络调用的发生顺序和嵌套层次关系。发起RPC调用后,首先从当前线程ThreadLocal获取上下文,底层RpcId序列号,可以采用多级序列号形式。返回响应对象之前,会把这次调用情况以及 TraceId、RpcId 都打印到它的访问日志之中,同时,会从ThreadLocal 清理掉调用上下文
为每次调用分配 TraceId、RpcId,放在 ThreadLocal 的调用上下文上面,调用结束的时候,把 TraceId、RpcId 打印到访问日志。
zipkin作用:
服务调用追踪,统计,问题排查
zipkin工作原理:
创造一些追踪标识符(tracingId,spanId,parentId),最终将一个request的流程树构建出来,各业务系统在彼此调用时,将特定的跟踪消息传递至zipkin,zipkin在收集到跟踪信息后将其聚合处理、存储、展示等,用户可通过web UI方便获得网络延迟、调用链路、系统依赖等等。
transport作用:收集被trace的services的spans,并将它们转化为zipkin common Span,之后把这些Spans传递的存储层
collector会对一个到来的被trace的数据(span)进行验证、存储并设置索引(Cassandra/ES-search/Memory)
zipkin核心数据结构:
Annotation(用途:用于定位一个request的开始和结束,cs/sr/ss/cr含有额外的信息,比如说时间点):
cs:Client Start,表示客户端发起请求一个span的开始
sr:Server Receive,表示服务端收到请求
ss:Server Send,表示服务端完成处理,并将结果发送给客户端
cr:Client Received,表示客户端获取到服务端返回信息一个span的结束,当这个annotation被记录了,这个RPC也被认为完成了
客户端调用时间=cr-cs
服务端处理时间=sr-ss
Span:一个请求(包含一组Annotation和BinaryAnnotation);它是基本工作单元,一次链路调用(可以是RPC,DB等没有特定的限制)创建一个span,通 过一个64位ID标识它。
span通过还有其他的数据,例如描述信息,时间戳,key-value对的(Annotation)tag信息,parent-id等,其中parent-id可以表示span调用链路来 源,通俗的理解span就是一次请求信息
Trace:类似于树结构的Span集合,表示一条调用链路,存在唯一标识
通过traceId(全局的跟踪ID,是跟踪的入口点,根据需求来决定在哪生成traceId)、spanId(请求跟踪ID,比如一次rpc等)和parentId(上一次 请求跟踪ID,用来将前后的请求串联起来),被收集到的span会汇聚成一个tree,从而提供出一个request的整体流程。
Zipkin-springboot试验:
安装,默认端口9411:
wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'
nohup java -jar zipkin.jar &
创建spring-boot工程:
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-aop</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-actuator</artifactId> </dependency> <dependency> <groupId>io.zipkin.brave</groupId> <artifactId>brave-core</artifactId> <version>3.9.0</version> </dependency> <!-- https://mvnrepository.com/artifact/io.zipkin.brave/brave-http --> <dependency> <groupId>io.zipkin.brave</groupId> <artifactId>brave-http</artifactId> <version>3.9.0</version> </dependency> <dependency> <groupId>io.zipkin.brave</groupId> <artifactId>brave-spancollector-http</artifactId> <version>3.9.0</version> </dependency> <dependency> <groupId>io.zipkin.brave</groupId> <artifactId>brave-web-servlet-filter</artifactId> <version>3.9.0</version> </dependency> <dependency> <groupId>io.zipkin.brave</groupId> <artifactId>brave-okhttp</artifactId> <version>3.9.0</version> </dependency>
创建spring-boot启动类:
@SpringBootApplication public class Application { public static void main(String[] args) { SpringApplication app = new SpringApplication(Application.class); app.run(args); } }
创建对应的controller,通过服务调用9090/foo:
@RestController public class HomeController { @Autowired private OkHttpClient client; private Random random = new Random(); @RequestMapping(value = "/start") public String start() throws InterruptedException, IOException { int sleep= random.nextInt(100); TimeUnit.MILLISECONDS.sleep(sleep); Request request = new Request.Builder().url("http://localhost:9090/foo").get().build(); Response response = client.newCall(request).execute(); return " [service1 sleep " + sleep+" ms]" + response.body().toString(); } }
创建application.properties指定服务启动的Port,对zipkin提供的名称以及zipkin服务的地址和其他设置信息:
com.zipkin.serviceName=service-start com.zipkin.url=http://******:9411 com.zipkin.connectTimeout=6000 com.zipkin.readTimeout=6000 com.zipkin.flushInterval=1 com.zipkin.compressionEnabled=true server.port=8080
创建配置实体:
@Configuration @ConfigurationProperties(prefix = "com.zipkin") public class ZipkinProperties { private String serviceName; private String url; private int connectTimeout; private int readTimeout; private int flushInterval; private boolean compressionEnabled; public String getUrl() { return url; } public void setUrl(String url) { this.url = url; } public int getConnectTimeout() { return connectTimeout; } public void setConnectTimeout(int connectTimeout) { this.connectTimeout = connectTimeout; } public int getReadTimeout() { return readTimeout; } public void setReadTimeout(int readTimeout) { this.readTimeout = readTimeout; } public int getFlushInterval() { return flushInterval; } public void setFlushInterval(int flushInterval) { this.flushInterval = flushInterval; } public boolean isCompressionEnabled() { return compressionEnabled; } public void setCompressionEnabled(boolean compressionEnabled) { this.compressionEnabled = compressionEnabled; } public String getServiceName() { return serviceName; } public void setServiceName(String serviceName) { this.serviceName = serviceName; } }
创建brave处理类:
@Configuration public class ZipkinConfig { @Autowired private ZipkinProperties properties; @Bean public SpanCollector spanCollector() { HttpSpanCollector.Config config = HttpSpanCollector.Config.builder().connectTimeout(properties.getConnectTimeout()).readTimeout(properties.getReadTimeout()) .compressionEnabled(properties.isCompressionEnabled()).flushInterval(properties.getFlushInterval()).build(); return HttpSpanCollector.create(properties.getUrl(), config, new EmptySpanCollectorMetricsHandler()); } @Bean public Brave brave(SpanCollector spanCollector){ Brave.Builder builder = new Brave.Builder(properties.getServiceName()); //指定state builder.spanCollector(spanCollector); builder.traceSampler(Sampler.ALWAYS_SAMPLE); Brave brave = builder.build(); return brave; } @Bean public BraveServletFilter braveServletFilter(Brave brave){ BraveServletFilter filter = new BraveServletFilter(brave.serverRequestInterceptor(),brave.serverResponseInterceptor(),new DefaultSpanNameProvider()); return filter; } @Bean public OkHttpClient okHttpClient(Brave brave){ OkHttpClient client = new OkHttpClient.Builder() .addInterceptor(new BraveOkHttpRequestResponseInterceptor(brave.clientRequestInterceptor(), brave.clientResponseInterceptor(), new DefaultSpanNameProvider())) .build(); return client; } }
启动服务,即可启动服务并将本服务注册到zipkin
其他服务9090:
重新创建spring-boot工程,修改application.properties指定服务启动的Port,对zipkin提供的名称以及zipkin服务的地址和其他设置信息:
com.zipkin.serviceName=service-foo com.zipkin.url=http://******:9411 com.zipkin.connectTimeout=6000 com.zipkin.readTimeout=6000 com.zipkin.flushInterval=1 com.zipkin.compressionEnabled=true server.port=9090
修改Controller,增加对/foo的处理:
@RestController public class HomeController { @Autowired private OkHttpClient client;
private Random random = new Random(); @RequestMapping(value = "/foo") public String foo() throws InterruptedException, IOException { Random random = new Random(); int sleep= random.nextInt(100); TimeUnit.MILLISECONDS.sleep(sleep); Request request = new Request.Builder().url("http://localhost:9091/bar").get().build(); //service3 Response response = client.newCall(request).execute(); String result = response.body().string(); request = new Request.Builder().url("http://localhost:9092/tar").get().build(); //service4 response = client.newCall(request).execute(); result += response.body().string(); return " [service2 sleep " + sleep+" ms]" + result; } }
其他服务9091/9092 application.properties修改端口即可,Controller添加对应的处理方法:
@RequestMapping(value = "/bar") public String bar() throws InterruptedException, IOException { Random random = new Random(); int sleep= random.nextInt(100); TimeUnit.MILLISECONDS.sleep(sleep); return " [service3 sleep " + sleep+" ms]"; }
@RequestMapping(value = "/tar") public String tar() throws InterruptedException, IOException { Random random = new Random(); int sleep= random.nextInt(1000); TimeUnit.MILLISECONDS.sleep(sleep); return " [service4 sleep " + sleep+" ms]"; }
分别启动各个spring-boot工程,访问http://localhost:8080/start启动调用关系,查看zipkin:
点击进入查看具体的调用关系:
再次进入详情,看到具体的到达和处理时间:
至此可以清楚看到每次调用链的关系。
Zipkin-dubbo
在dubbo中引入zipkin是非常方便的,因为无非就是写filter,在请求处理前后发送日志数据,让zipkin生成调用链数据