SkyWalking: the JVM metrics collection flow

 
The JVM monitoring dashboard
 
A rough diagram of how JVM metrics are collected

@Override
public void boot() throws Throwable {
    collectMetricFuture = Executors.newSingleThreadScheduledExecutor(
        new DefaultNamedThreadFactory("JVMService-produce"))
                                   .scheduleAtFixedRate(new RunnableWithExceptionProtection(
                                       this,
                                       new RunnableWithExceptionProtection.CallbackWhenException() {
                                           @Override
                                           public void handle(Throwable t) {
                                               LOGGER.error("JVMService produces metrics failure.", t);
                                           }
                                       }
                                   ), 0, 1, TimeUnit.SECONDS);
    sendMetricFuture = Executors.newSingleThreadScheduledExecutor(
        new DefaultNamedThreadFactory("JVMService-consume"))
                                .scheduleAtFixedRate(new RunnableWithExceptionProtection(
                                    sender,
                                    new RunnableWithExceptionProtection.CallbackWhenException() {
                                        @Override
                                        public void handle(Throwable t) {
                                            LOGGER.error("JVMService consumes and upload failure.", t);
                                        }
                                    }
                                ), 0, 1, TimeUnit.SECONDS);
}
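Both tasks in boot() are wrapped in RunnableWithExceptionProtection. This matters because scheduleAtFixedRate suppresses all subsequent executions once a task throws, so one failed collection would otherwise stop reporting for good. The following is a minimal sketch of such a wrapper, not SkyWalking's actual class: ProtectedRunnableSketch and OnError are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

class ProtectedRunnableSketch implements Runnable {
    /** Hypothetical callback type, playing the role of CallbackWhenException. */
    interface OnError {
        void handle(Throwable t);
    }

    private final Runnable delegate;
    private final OnError onError;

    ProtectedRunnableSketch(Runnable delegate, OnError onError) {
        this.delegate = delegate;
        this.onError = onError;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (Throwable t) {
            // Report instead of rethrowing, so a scheduler keeps invoking this task.
            onError.handle(t);
        }
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Runnable failing = () -> {
            throw new IllegalStateException("boom");
        };
        new ProtectedRunnableSketch(failing, seen::set).run();
        System.out.println("reported: " + seen.get().getMessage());
    }
}
```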
 
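  1. The thread pool executes JVMService's run method once per second. It collects the JVM metrics through the tools in java.lang.management, then calls sender.offer to put the resulting JVMMetric into an in-memory blocking queue (LinkedBlockingQueue<JVMMetric>).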

@Override
public void run() {
    long currentTimeMillis = System.currentTimeMillis();
    try {
        JVMMetric.Builder jvmBuilder = JVMMetric.newBuilder();
        jvmBuilder.setTime(currentTimeMillis);
        jvmBuilder.setCpu(CPUProvider.INSTANCE.getCpuMetric());
        jvmBuilder.addAllMemory(MemoryProvider.INSTANCE.getMemoryMetricList());
        jvmBuilder.addAllMemoryPool(MemoryPoolProvider.INSTANCE.getMemoryPoolMetricsList());
        jvmBuilder.addAllGc(GCProvider.INSTANCE.getGCList());
        jvmBuilder.setThread(ThreadProvider.INSTANCE.getThreadMetrics());
        jvmBuilder.setClazz(ClassProvider.INSTANCE.getClassMetrics());

        JVMMetric jvmMetric = jvmBuilder.build();
        sender.offer(jvmMetric);

        // refresh cpu usage percent
        cpuUsagePercent = jvmMetric.getCpu().getUsagePercent();
    } catch (Exception e) {
        LOGGER.error(e, "Collect JVM info fail.");
    }
}
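The producer side can be tried out without the agent. The sketch below reads a few values from the java.lang.management MXBeans and hands them to a bounded LinkedBlockingQueue with a non-blocking offer. It is only an illustration under assumptions: Sample is a hypothetical stand-in for the protobuf JVMMetric, and the capacity of 600 is an arbitrary value chosen for the example.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.LinkedBlockingQueue;

class JvmSampleSketch {
    /** Hypothetical stand-in for the protobuf JVMMetric message. */
    static final class Sample {
        final long time;
        final long heapUsed;
        final int liveThreads;
        final int loadedClasses;

        Sample(long time, long heapUsed, int liveThreads, int loadedClasses) {
            this.time = time;
            this.heapUsed = heapUsed;
            this.liveThreads = liveThreads;
            this.loadedClasses = loadedClasses;
        }
    }

    // Bounded in-memory buffer; the capacity is an assumption of this sketch.
    static final LinkedBlockingQueue<Sample> QUEUE = new LinkedBlockingQueue<>(600);

    /** Takes one snapshot of the current JVM through the platform MXBeans. */
    static Sample collect() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return new Sample(
            System.currentTimeMillis(),
            memory.getHeapMemoryUsage().getUsed(),
            threads.getThreadCount(),
            classes.getLoadedClassCount());
    }

    /** Non-blocking hand-off: returns false instead of waiting when the buffer is full. */
    static boolean produce() {
        return QUEUE.offer(collect());
    }

    public static void main(String[] args) {
        produce();
        Sample s = QUEUE.peek();
        System.out.println("heapUsed=" + s.heapUsed + " liveThreads=" + s.liveThreads);
    }
}
```

Using offer rather than put keeps the collecting thread from blocking when the consumer falls behind.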
 

Sending the data

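  1. The thread pool also executes JVMMetricsSender's run method once per second. Once the RPC connection is established, it sends the data to the server over gRPC and receives the returned commands.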
@Override
public void run() {
    if (status == GRPCChannelStatus.CONNECTED) {
        try {
            JVMMetricCollection.Builder builder = JVMMetricCollection.newBuilder();
            LinkedList<JVMMetric> buffer = new LinkedList<>();
            queue.drainTo(buffer);
            if (buffer.size() > 0) {
                builder.addAllMetrics(buffer);
                builder.setService(Config.Agent.SERVICE_NAME);
                builder.setServiceInstance(Config.Agent.INSTANCE_NAME);
                // send the data to the server
                Commands commands = stub.withDeadlineAfter(GRPC_UPSTREAM_TIMEOUT, TimeUnit.SECONDS)
                                        .collect(builder.build());
                // handle the returned commands
                ServiceManager.INSTANCE.findService(CommandService.class).receiveCommand(commands);
            }
        } catch (Throwable t) {
            LOGGER.error(t, "send JVM metrics to Collector fail.");
            ServiceManager.INSTANCE.findService(GRPCChannelManager.class).reportError(t);
        }
    }
}
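The core of the sender is the drain-then-upload pattern: take everything that is currently buffered in one call, and only touch the network when the batch is non-empty. A minimal sketch of that pattern follows. The flush method, the connected flag, and the uploader callback are hypothetical stand-ins for the channel status check and the gRPC stub call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class DrainAndSendSketch {
    /** Drains the queue into one batch and uploads it; returns the batch size. */
    static <T> int flush(BlockingQueue<T> queue, boolean connected, Consumer<List<T>> uploader) {
        if (!connected) {
            return 0; // keep the samples buffered until the channel is back
        }
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch); // moves every queued element without blocking
        if (!batch.isEmpty()) {
            uploader.accept(batch);
        }
        return batch.size();
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.offer("m1");
        queue.offer("m2");
        int sent = flush(queue, true, batch -> System.out.println("uploading " + batch));
        System.out.println(sent + " sent, " + queue.size() + " left");
    }
}
```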
 

Receiving and processing data on the server

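While debugging the project, I found that the server-side classes that process the JVM metrics reported by the Java agent are loaded dynamically. Digging further, I found that they are generated from OAL scripts and predefined code templates, with Antlr4 parsing the scripts. This section therefore starts with the dynamically generated classes in the SkyWalking server.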

Dynamically generated classes

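OAL (Observability Analysis Language) is a custom language built with Antlr4. It is used to run streaming analysis over the metrics collected from services, service instances, endpoints, and so on.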
 
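  1. The server's startup also uses Java's SPI mechanism. At startup, ModuleManager's init method loads every class that extends ModuleDefine (the module definition class) or ModuleProvider (the module provider class) into the JVM and runs their prepare and start methods. These classes include JVMModule.java and JVMModuleProvider.java.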

public void init(
    ApplicationConfiguration applicationConfiguration) throws ModuleNotFoundException, ProviderNotFoundException, ServiceNotProvidedException, CycleDependencyException, ModuleConfigException, ModuleStartException {
    String[] moduleNames = applicationConfiguration.moduleList();
    ServiceLoader<ModuleDefine> moduleServiceLoader = ServiceLoader.load(ModuleDefine.class);
    ServiceLoader<ModuleProvider> moduleProviderLoader = ServiceLoader.load(ModuleProvider.class);

    HashSet<String> moduleSet = new HashSet<>(Arrays.asList(moduleNames));
    for (ModuleDefine module : moduleServiceLoader) {
        // iterate over all module definitions and let the provider of each configured module run prepare
        if (moduleSet.contains(module.name())) {
            module.prepare(this, applicationConfiguration.getModuleConfiguration(module.name()), moduleProviderLoader);
            loadedModules.put(module.name(), module);
            moduleSet.remove(module.name());
        }
    }
    // Finish prepare stage
    isInPrepareStage = false;

    if (moduleSet.size() > 0) {
        throw new ModuleNotFoundException(moduleSet.toString() + " missing.");
    }

    BootstrapFlow bootstrapFlow = new BootstrapFlow(loadedModules);

    // call the start method of the loaded modules
    bootstrapFlow.start(this);
    bootstrapFlow.notifyAfterCompleted();
}
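The selection logic in init can be isolated in a few lines: prepare every discovered module that is named in the configuration, fail if a configured module was never discovered, then start what was loaded. The sketch below is a simplification under stated assumptions. A plain list replaces ServiceLoader (which would need META-INF/services files), Module is a hypothetical stand-in for ModuleDefine, the module names are only illustrative, and the dependency ordering that BootstrapFlow takes care of is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ModuleBootSketch {
    /** Hypothetical, much smaller stand-in for ModuleDefine. */
    static final class Module {
        final String name;
        final List<String> log;

        Module(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        void prepare() {
            log.add("prepare:" + name);
        }

        void start() {
            log.add("start:" + name);
        }
    }

    static void boot(Iterable<Module> discovered, Set<String> configured) {
        Set<String> missing = new HashSet<>(configured);
        Map<String, Module> loaded = new LinkedHashMap<>();
        for (Module module : discovered) {
            // Only modules named in the configuration are prepared.
            if (missing.remove(module.name)) {
                module.prepare();
                loaded.put(module.name, module);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException(missing + " missing.");
        }
        // The start phase begins only after every module has been prepared.
        for (Module module : loaded.values()) {
            module.start();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Module> discovered = Arrays.asList(
            new Module("core", log), new Module("receiver-jvm", log), new Module("unused", log));
        boot(discovered, new HashSet<>(Arrays.asList("core", "receiver-jvm")));
        System.out.println(log); // [prepare:core, prepare:receiver-jvm, start:core, start:receiver-jvm]
    }
}
```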
  2. JVMModuleProvider's start method obtains the OALEngineLoaderService from CoreModuleProvider and calls its load method to generate the dynamic classes:
     
    1. The start method of JVMModuleProvider
      
      
      @Override
      public void start() throws ModuleStartException {
          // load official analysis
          getManager().find(CoreModule.NAME)
                      .provider()
                      .getService(OALEngineLoaderService.class)
                      .load(JVMOALDefine.INSTANCE);
      
          GRPCHandlerRegister grpcHandlerRegister = getManager().find(SharingServerModule.NAME)
                                                                .provider()
                                                                .getService(GRPCHandlerRegister.class);
          JVMMetricReportServiceHandler jvmMetricReportServiceHandler = new JVMMetricReportServiceHandler(getManager());
          grpcHandlerRegister.addHandler(jvmMetricReportServiceHandler);
          grpcHandlerRegister.addHandler(new JVMMetricReportServiceHandlerCompat(jvmMetricReportServiceHandler));
      }
      
      
      public class JVMOALDefine extends OALDefine {
          public static final JVMOALDefine INSTANCE = new JVMOALDefine();
      
          private JVMOALDefine() {
              super(
                  "oal/java-agent.oal",
                  "org.apache.skywalking.oap.server.core.source"
              );
          }
      }
    2. OALEngineLoaderService's load method mainly does the following:
       
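      1. Uses reflection and the configFile, sourcePackage, and other values defined in OALDefine to obtain an OALRuntime instance
      2. Sets the corresponding StreamListener, DispatcherListener, and StorageBuilderFactory
      3. Calls OALRuntime's start method to generate the dynamic classes
      4. Notifies all listeners: the generated Metrics classes are added to MetricsStreamProcessor, which creates the related tables, workers, and work flows, and the SourceDispatcher implementations are added to DispatcherManager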
        public void load(OALDefine define) throws ModuleStartException {
            if (oalDefineSet.contains(define)) {
                // each oal define will only be activated once
                return;
            }
            try {
                // obtain an OALRuntime instance via reflection and the configFile, sourcePackage, etc. defined in OALDefine
                OALEngine engine = loadOALEngine(define);
                // set the corresponding StreamListener, DispatcherListener, and StorageBuilderFactory
                StreamAnnotationListener streamAnnotationListener = new StreamAnnotationListener(moduleManager);
                engine.setStreamListener(streamAnnotationListener);
                engine.setDispatcherListener(moduleManager.find(CoreModule.NAME)
                                                          .provider()
                                                          .getService(SourceReceiver.class)
                                                          .getDispatcherDetectorListener());
                engine.setStorageBuilderFactory(moduleManager.find(StorageModule.NAME)
                                                             .provider()
                                                             .getService(StorageBuilderFactory.class));
                // call OALRuntime's start method to generate the dynamic classes
                engine.start(OALEngineLoaderService.class.getClassLoader());
                // notify all listeners: generated Metrics go to MetricsStreamProcessor, SourceDispatchers to DispatcherManager
                engine.notifyAllListeners();
        
                oalDefineSet.add(define);
            } catch (ReflectiveOperationException | OALCompileException e) {
                throw new ModuleStartException(e.getMessage(), e);
            }
        }
        
        
        @Override
        public void notifyAllListeners() throws ModuleStartException {
            for (Class metricsClass : metricsClasses) {
                try {
                    streamAnnotationListener.notify(metricsClass);
                } catch (StorageException e) {
                    throw new ModuleStartException(e.getMessage(), e);
                }
            }
            for (Class dispatcherClass : dispatcherClasses) {
                try {
                    dispatcherDetectorListener.addIfAsSourceDispatcher(dispatcherClass);
                } catch (Exception e) {
                    throw new ModuleStartException(e.getMessage(), e);
                }
            }
        }
    3. OALRuntime's start method mainly does the following:
       
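      1. Reads the java-agent.oal file from the location given in JVMOALDefine
      2. Obtains the OAL script parser and the OALScripts
      3. Calls generateClassAtRuntime(oalScripts) to generate the metrics classes and dispatcher classes from the code templates (oap-server/oal-rt/src/main/resources/code-templates)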
        
        public void start(ClassLoader currentClassLoader) throws ModuleStartException, OALCompileException {
            if (!IS_RT_TEMP_FOLDER_INIT_COMPLETED) {
                prepareRTTempFolder();
                IS_RT_TEMP_FOLDER_INIT_COMPLETED = true;
            }
        
            this.currentClassLoader = currentClassLoader;
            Reader read;
        
            try {
                // read the OAL file
                read = ResourceUtils.read(oalDefine.getConfigFile());
            } catch (FileNotFoundException e) {
                throw new ModuleStartException("Can't locate " + oalDefine.getConfigFile(), e);
            }
        
            OALScripts oalScripts;
            try {
                // obtain the OAL script parser
                ScriptParser scriptParser = ScriptParser.createFromFile(read, oalDefine.getSourcePackage());
                // parse the script into OALScripts
                oalScripts = scriptParser.parse();
            } catch (IOException e) {
                throw new ModuleStartException("OAL script parse analysis failure.", e);
            }
        
            // generate the classes at runtime
            this.generateClassAtRuntime(oalScripts);
        }
        
        // obtain the OAL script parser
        public static ScriptParser createFromFile(Reader scriptReader, String sourcePackage) throws IOException {
            ScriptParser parser = new ScriptParser();
            parser.lexer = new OALLexer(CharStreams.fromReader(scriptReader));
            parser.sourcePackage = sourcePackage;
            return parser;
        }
        
        // parse the script into OALScripts
        public OALScripts parse() throws IOException {
            OALScripts scripts = new OALScripts();
        
            CommonTokenStream tokens = new CommonTokenStream(lexer);
        
            OALParser parser = new OALParser(tokens);
        
            ParseTree tree = parser.root();
            ParseTreeWalker walker = new ParseTreeWalker();
        
            walker.walk(new OALListener(scripts, sourcePackage), tree);
        
            return scripts;
        }
        
        
        private void generateClassAtRuntime(OALScripts oalScripts) throws OALCompileException {
            List<AnalysisResult> metricsStmts = oalScripts.getMetricsStmts();
            metricsStmts.forEach(this::buildDispatcherContext);
        
            for (AnalysisResult metricsStmt : metricsStmts) {
                metricsClasses.add(generateMetricsClass(metricsStmt));
                generateMetricsBuilderClass(metricsStmt);
            }
        
            for (Map.Entry<String, DispatcherContext> entry : allDispatcherContext.getAllContext().entrySet()) {
                dispatcherClasses.add(generateDispatcherClass(entry.getKey(), entry.getValue()));
            }
        
            oalScripts.getDisableCollection().getAllDisableSources().forEach(disable -> {
                DisableRegister.INSTANCE.add(disable);
            });
        }
        Examples of the dynamically generated classes (dumped by setting an environment variable)
        ServiceInstanceJVMGCDispatcher.class
        public class ServiceInstanceJVMGCDispatcher implements SourceDispatcher<ServiceInstanceJVMGC> {
            private void doInstanceJvmYoungGcTime(ServiceInstanceJVMGC var1) {
                if ((new StringMatch()).match(var1.getPhase(), GCPhase.NEW)) {
                    InstanceJvmYoungGcTimeMetrics var2 = new InstanceJvmYoungGcTimeMetrics();
                    var2.setTimeBucket(var1.getTimeBucket());
                    var2.setEntityId(var1.getEntityId());
                    var2.setServiceId(var1.getServiceId());
                    var2.combine(var1.getTime());
                    MetricsStreamProcessor.getInstance().in(var2);
                }
            }
        
            private void doInstanceJvmOldGcTime(ServiceInstanceJVMGC var1) {
                if ((new StringMatch()).match(var1.getPhase(), GCPhase.OLD)) {
                    InstanceJvmOldGcTimeMetrics var2 = new InstanceJvmOldGcTimeMetrics();
                    var2.setTimeBucket(var1.getTimeBucket());
                    var2.setEntityId(var1.getEntityId());
                    var2.setServiceId(var1.getServiceId());
                    var2.combine(var1.getTime());
                    MetricsStreamProcessor.getInstance().in(var2);
                }
            }
        
            private void doInstanceJvmNormalGcTime(ServiceInstanceJVMGC var1) {
                if ((new StringMatch()).match(var1.getPhase(), GCPhase.NORMAL)) {
                    InstanceJvmNormalGcTimeMetrics var2 = new InstanceJvmNormalGcTimeMetrics();
                    var2.setTimeBucket(var1.getTimeBucket());
                    var2.setEntityId(var1.getEntityId());
                    var2.setServiceId(var1.getServiceId());
                    var2.combine(var1.getTime());
                    MetricsStreamProcessor.getInstance().in(var2);
                }
            }
        
            private void doInstanceJvmYoungGcCount(ServiceInstanceJVMGC var1) {
                if ((new StringMatch()).match(var1.getPhase(), GCPhase.NEW)) {
                    InstanceJvmYoungGcCountMetrics var2 = new InstanceJvmYoungGcCountMetrics();
                    var2.setTimeBucket(var1.getTimeBucket());
                    var2.setEntityId(var1.getEntityId());
                    var2.setServiceId(var1.getServiceId());
                    var2.combine(var1.getCount());
                    MetricsStreamProcessor.getInstance().in(var2);
                }
            }
        
            private void doInstanceJvmOldGcCount(ServiceInstanceJVMGC var1) {
                if ((new StringMatch()).match(var1.getPhase(), GCPhase.OLD)) {
                    InstanceJvmOldGcCountMetrics var2 = new InstanceJvmOldGcCountMetrics();
                    var2.setTimeBucket(var1.getTimeBucket());
                    var2.setEntityId(var1.getEntityId());
                    var2.setServiceId(var1.getServiceId());
                    var2.combine(var1.getCount());
                    MetricsStreamProcessor.getInstance().in(var2);
                }
            }
        
            private void doInstanceJvmNormalGcCount(ServiceInstanceJVMGC var1) {
                if ((new StringMatch()).match(var1.getPhase(), GCPhase.NORMAL)) {
                    InstanceJvmNormalGcCountMetrics var2 = new InstanceJvmNormalGcCountMetrics();
                    var2.setTimeBucket(var1.getTimeBucket());
                    var2.setEntityId(var1.getEntityId());
                    var2.setServiceId(var1.getServiceId());
                    var2.combine(var1.getCount());
                    MetricsStreamProcessor.getInstance().in(var2);
                }
            }
        
            public void dispatch(ISource var1) {
                ServiceInstanceJVMGC var2 = (ServiceInstanceJVMGC)var1;
                this.doInstanceJvmYoungGcTime(var2);
                this.doInstanceJvmOldGcTime(var2);
                this.doInstanceJvmNormalGcTime(var2);
                this.doInstanceJvmYoungGcCount(var2);
                this.doInstanceJvmOldGcCount(var2);
                this.doInstanceJvmNormalGcCount(var2);
            }
        
            public ServiceInstanceJVMGCDispatcher() {
            }
        }
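The generated dispatcher follows a visible naming pattern: a metric named instance_jvm_old_gc_count yields the class InstanceJvmOldGcCountMetrics and the method doInstanceJvmOldGcCount. The sketch below reproduces that pattern with plain string handling to show the idea of template-driven generation. It is an illustration only: the helper names are hypothetical, and the real engine works from the code templates and produces classes rather than source text.

```java
class OalNamingSketch {
    /** instance_jvm_old_gc_count -> InstanceJvmOldGcCount */
    static String toCamel(String metricName) {
        StringBuilder out = new StringBuilder();
        for (String part : metricName.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            out.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return out.toString();
    }

    static String metricsClassName(String metricName) {
        return toCamel(metricName) + "Metrics";
    }

    static String dispatchMethodName(String metricName) {
        return "do" + toCamel(metricName);
    }

    /** Fills a fixed text template for one dispatcher method (simplified, no filter). */
    static String renderDispatchMethod(String metricName, String sourceClass, String valueGetter) {
        String cls = metricsClassName(metricName);
        return "private void " + dispatchMethodName(metricName) + "(" + sourceClass + " source) {\n"
            + "    " + cls + " metrics = new " + cls + "();\n"
            + "    metrics.combine(source." + valueGetter + "());\n"
            + "    MetricsStreamProcessor.getInstance().in(metrics);\n"
            + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(renderDispatchMethod("instance_jvm_old_gc_count", "ServiceInstanceJVMGC", "getCount"));
    }
}
```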
        InstanceJvmOldGcCountMetrics.class
        @Stream(
            name = "instance_jvm_old_gc_count",
            scopeId = 11,
            builder = InstanceJvmOldGcCountMetricsBuilder.class,
            processor = MetricsStreamProcessor.class
        )
        public class InstanceJvmOldGcCountMetrics extends SumMetrics implements WithMetadata {
            @Column(
                columnName = "entity_id",
                length = 512
            )
            private String entityId;
            @Column(
                columnName = "service_id",
                length = 256
            )
            private String serviceId;
        
            public InstanceJvmOldGcCountMetrics() {
            }
        
            public String getEntityId() {
                return this.entityId;
            }
        
            public void setEntityId(String var1) {
                this.entityId = var1;
            }
        
            public String getServiceId() {
                return this.serviceId;
            }
        
            public void setServiceId(String var1) {
                this.serviceId = var1;
            }
        
            protected String id0() {
                StringBuilder var1 = new StringBuilder(String.valueOf(this.getTimeBucket()));
                var1.append("_").append(this.entityId);
                return var1.toString();
            }
        
            public int hashCode() {
                byte var1 = 17;
                int var2 = 31 * var1 + this.entityId.hashCode();
                var2 = 31 * var2 + (int)this.getTimeBucket();
                return var2;
            }
        
            public int remoteHashCode() {
                byte var1 = 17;
                int var2 = 31 * var1 + this.entityId.hashCode();
                return var2;
            }
        
            public boolean equals(Object var1) {
                if (this == var1) {
                    return true;
                } else if (var1 == null) {
                    return false;
                } else if (this.getClass() != var1.getClass()) {
                    return false;
                } else {
                    InstanceJvmOldGcCountMetrics var2 = (InstanceJvmOldGcCountMetrics)var1;
                    if (!this.entityId.equals(var2.entityId)) {
                        return false;
                    } else {
                        return this.getTimeBucket() == var2.getTimeBucket();
                    }
                }
            }
        
            public Builder serialize() {
                Builder var1 = RemoteData.newBuilder();
                var1.addDataStrings(this.getEntityId());
                var1.addDataStrings(this.getServiceId());
                var1.addDataLongs(this.getValue());
                var1.addDataLongs(this.getTimeBucket());
                return var1;
            }
        
            public void deserialize(RemoteData var1) {
                this.setEntityId(var1.getDataStrings(0));
                this.setServiceId(var1.getDataStrings(1));
                this.setValue(var1.getDataLongs(0));
                this.setTimeBucket(var1.getDataLongs(1));
            }
        
            public MetricsMetaInfo getMeta() {
                return new MetricsMetaInfo("instance_jvm_old_gc_count", 11, this.entityId);
            }
        
            public Metrics toHour() {
                InstanceJvmOldGcCountMetrics var1 = new InstanceJvmOldGcCountMetrics();
                var1.setEntityId(this.getEntityId());
                var1.setServiceId(this.getServiceId());
                var1.setValue(this.getValue());
                var1.setTimeBucket(this.toTimeBucketInHour());
                return var1;
            }
        
            public Metrics toDay() {
                InstanceJvmOldGcCountMetrics var1 = new InstanceJvmOldGcCountMetrics();
                var1.setEntityId(this.getEntityId());
                var1.setServiceId(this.getServiceId());
                var1.setValue(this.getValue());
                var1.setTimeBucket(this.toTimeBucketInDay());
                return var1;
            }
        }
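toHour and toDay copy the metric into a coarser time bucket, and id0 builds the row id from the time bucket and the entity id. The sketch below shows the arithmetic, assuming that a minute bucket is a yyyyMMddHHmm number so that dropping two digits gives the hour bucket and dropping four gives the day bucket. The method names are hypothetical, and the time zone is passed explicitly here to keep the result reproducible.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimeBucketSketch {
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Epoch millis -> minute bucket such as 202205171512 (assumed layout). */
    static long minuteBucket(long epochMillis, ZoneId zone) {
        LocalDateTime time = LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
        return Long.parseLong(time.format(MINUTE));
    }

    /** Drops the two minute digits. */
    static long toHourBucket(long minuteBucket) {
        return minuteBucket / 100;
    }

    /** Drops the minute and hour digits. */
    static long toDayBucket(long minuteBucket) {
        return minuteBucket / 10000;
    }

    /** Same shape as the generated id0(): time bucket, underscore, entity id. */
    static String id(long timeBucket, String entityId) {
        return timeBucket + "_" + entityId;
    }

    public static void main(String[] args) {
        // 1652800359303 is the timestamp from the sample request in this article.
        long minute = minuteBucket(1652800359303L, ZoneOffset.UTC);
        System.out.println(minute);               // 202205171512
        System.out.println(toHourBucket(minute)); // 2022051715
        System.out.println(toDayBucket(minute));  // 20220517
    }
}
```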
 

Receiving data

The client and the server communicate over gRPC. The corresponding proto file is JVMMetric.proto:
syntax = "proto3";

package skywalking.v3;

option java_multiple_files = true;
option java_package = "org.apache.skywalking.apm.network.language.agent.v3";
option csharp_namespace = "SkyWalking.NetworkProtocol.V3";
option go_package = "skywalking.apache.org/repo/goapi/collect/language/agent/v3";

import "common/Common.proto";

// Define the JVM metrics report service.
service JVMMetricReportService {
    rpc collect (JVMMetricCollection) returns (Commands) {
    }
}

message JVMMetricCollection {
    repeated JVMMetric metrics = 1;
    string service = 2;
    string serviceInstance = 3;
}

message JVMMetric {
    int64 time = 1;
    CPU cpu = 2;
    repeated Memory memory = 3;
    repeated MemoryPool memoryPool = 4;
    repeated GC gc = 5;
    Thread thread = 6;
    Class clazz = 7;
}

message Memory {
    bool isHeap = 1;
    int64 init = 2;
    int64 max = 3;
    int64 used = 4;
    int64 committed = 5;
}

message MemoryPool {
    PoolType type = 1;
    int64 init = 2;
    int64 max = 3;
    int64 used = 4;
    int64 committed = 5;
}

enum PoolType {
    CODE_CACHE_USAGE = 0;
    NEWGEN_USAGE = 1;
    OLDGEN_USAGE = 2;
    SURVIVOR_USAGE = 3;
    PERMGEN_USAGE = 4;
    METASPACE_USAGE = 5;
}

message GC {
    GCPhase phase = 1;
    int64 count = 2;
    int64 time = 3;
}

enum GCPhase {
    NEW = 0;
    OLD = 1;
    NORMAL = 2; // The type of GC doesn't have new and old phases, like Z Garbage Collector (ZGC)
}

// See: https://docs.oracle.com/javase/8/docs/api/java/lang/management/ThreadMXBean.html
message Thread {
    int64 liveCount = 1;
    int64 daemonCount = 2;
    int64 peakCount = 3;
    int64 runnableStateThreadCount = 4;
    int64 blockedStateThreadCount = 5;
    int64 waitingStateThreadCount = 6;
    int64 timedWaitingStateThreadCount = 7;
}

// See: https://docs.oracle.com/javase/8/docs/api/java/lang/management/ClassLoadingMXBean.html
message Class {
    int64 loadedClassCount = 1;
    int64 totalUnloadedClassCount = 2;
    int64 totalLoadedClassCount = 3;
}
A sample request
metrics {
    time: 1652800359303
    cpu {
        usagePercent: 0.03829805239617419
    }
    memory {
        isHeap: true
        init: 536870912
        max: 7635730432
        used: 301977016
        committed: 850395136
    }
    memory {
        init: 2555904
        max: -1
        used: 81238712
        committed: 84606976
    }
    memoryPool {
        init: 2555904
        max: 251658240
        used: 22219264
        committed: 22478848
    }
    memoryPool {
        type: METASPACE_USAGE
        max: -1
        used: 52060512
        committed: 54657024
    }
    memoryPool {
        type: PERMGEN_USAGE
        max: 1073741824
        used: 6958936
        committed: 7471104
    }
    memoryPool {
        type: NEWGEN_USAGE
        init: 134742016
        max: 2845310976
        used: 276025736
        committed: 588251136
    }
    memoryPool {
        type: SURVIVOR_USAGE
        init: 22020096
        max: 11010048
        used: 10771496
        committed: 11010048
    }
    memoryPool {
        type: OLDGEN_USAGE
        init: 358088704
        max: 5726797824
        used: 15179784
        committed: 251133952
    }
    gc {
    }
    gc {
        phase: OLD
    }
    thread {
        liveCount: 45
        daemonCount: 41
        peakCount: 46
        runnableStateThreadCount: 17
        waitingStateThreadCount: 14
        timedWaitingStateThreadCount: 14
    }
    clazz {
        loadedClassCount: 9685
        totalLoadedClassCount: 9685
    }
}
service: "Your_ApplicationName"
serviceInstance: "d7a7de5f385149dfb49b8d23e8b6fbc9@10.4.77.148"

  1. The entry point for receiving data is org.apache.skywalking.oap.server.receiver.jvm.provider.handler.JVMMetricReportServiceHandler#collect. This method receives the JVMMetricCollection, converts it to a Builder, iterates over its metrics, and calls jvmSourceDispatcher.sendMetric to push the data into the in-memory queue.
    @Override
    public void collect(JVMMetricCollection request, StreamObserver<Commands> responseObserver) {
        if (log.isDebugEnabled()) {
            log.debug(
                "receive the jvm metrics from service instance, name: {}, instance: {}",
                request.getService(),
                request.getServiceInstance()
            );
        }
        final JVMMetricCollection.Builder builder = request.toBuilder();
        builder.setService(namingControl.formatServiceName(builder.getService()));
        builder.setServiceInstance(namingControl.formatInstanceName(builder.getServiceInstance()));
    
        builder.getMetricsList().forEach(jvmMetric -> {
            jvmSourceDispatcher.sendMetric(builder.getService(), builder.getServiceInstance(), jvmMetric);
        });
    
        responseObserver.onNext(Commands.newBuilder().build());
        responseObserver.onCompleted();
    }
  2. jvmSourceDispatcher.sendMetric calls SourceReceiverImpl's receive method, which looks up the matching Dispatcher in dispatcherMap by the Source's scope (the Dispatchers for the JVM metrics are generated dynamically from OAL, as described above).
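  3. As ServiceInstanceJVMGCDispatcher.class above shows, the dispatcher ends by calling the in method of MetricsStreamProcessor.java, which passes the data into a custom blocking queue, org.apache.skywalking.oap.server.library.datacarrier.buffer.Channels (a wrapper built on ArrayBlockingQueue). At this point the server has essentially finished receiving the data.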
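The lookup in step 2 amounts to a map from scope id to the dispatchers registered for that scope. A minimal sketch of that idea follows. Source and Dispatcher here are hypothetical simplified types, not the SkyWalking interfaces, and scope id 11 is taken from the @Stream annotation on the generated metrics class shown earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DispatcherMapSketch {
    /** Hypothetical minimal source: it only carries its scope id and one value. */
    static final class Source {
        final int scope;
        final long value;

        Source(int scope, long value) {
            this.scope = scope;
            this.value = value;
        }
    }

    interface Dispatcher {
        void dispatch(Source source);
    }

    private final Map<Integer, List<Dispatcher>> dispatcherMap = new HashMap<>();

    void register(int scope, Dispatcher dispatcher) {
        dispatcherMap.computeIfAbsent(scope, key -> new ArrayList<>()).add(dispatcher);
    }

    /** Forwards the source to every dispatcher of its scope; returns how many handled it. */
    int receive(Source source) {
        List<Dispatcher> dispatchers = dispatcherMap.get(source.scope);
        if (dispatchers == null) {
            return 0; // no analysis is defined for this scope
        }
        for (Dispatcher dispatcher : dispatchers) {
            dispatcher.dispatch(source);
        }
        return dispatchers.size();
    }

    public static void main(String[] args) {
        DispatcherMapSketch receiver = new DispatcherMapSketch();
        receiver.register(11, source -> System.out.println("GC dispatcher got " + source.value));
        System.out.println(receiver.receive(new Source(11, 5)) + " dispatcher(s) invoked");
        System.out.println(receiver.receive(new Source(12, 1)) + " dispatcher(s) invoked");
    }
}
```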
 

Processing data

The server-side processing stage mainly persists the data held in the in-memory queues to the storage system.

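  1. After OAL has generated the classes as described above, MetricsStreamProcessor's create method is called to create the workers and the work flow for each metric. These include three kinds of MetricsPersistentWorker, which persist data at minute, hour, and day granularity. create also calls modelSetter.add, which notifies the table-creation listeners so that the database tables get created (from the fields of the generated class and its parent classes).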
    public void create(ModuleDefineHolder moduleDefineHolder,
                       StreamDefinition stream,
                       Class<? extends Metrics> metricsClass) throws StorageException {
        final StorageBuilderFactory storageBuilderFactory = moduleDefineHolder.find(StorageModule.NAME)
                                                                              .provider()
                                                                              .getService(StorageBuilderFactory.class);
        final Class<? extends StorageBuilder> builder = storageBuilderFactory.builderOf(
            metricsClass, stream.getBuilder());
    
        StorageDAO storageDAO = moduleDefineHolder.find(StorageModule.NAME).provider().getService(StorageDAO.class);
        IMetricsDAO metricsDAO;
        try {
            metricsDAO = storageDAO.newMetricsDao(builder.getDeclaredConstructor().newInstance());
        } catch (InstantiationException | IllegalAccessException | NoSuchMethodException | InvocationTargetException e) {
            throw new UnexpectedException("Create " + stream.getBuilder().getSimpleName() + " metrics DAO failure.", e);
        }
    
        ModelCreator modelSetter = moduleDefineHolder.find(CoreModule.NAME).provider().getService(ModelCreator.class);
        DownSamplingConfigService configService = moduleDefineHolder.find(CoreModule.NAME)
                                                                    .provider()
                                                                    .getService(DownSamplingConfigService.class);
    
        MetricsPersistentWorker hourPersistentWorker = null;
        MetricsPersistentWorker dayPersistentWorker = null;
    
        MetricsTransWorker transWorker = null;
    
        final MetricsExtension metricsExtension = metricsClass.getAnnotation(MetricsExtension.class);
        /**
         * All metrics default are `supportDownSampling` and `insertAndUpdate`, unless it has explicit definition.
         */
        boolean supportDownSampling = true;
        boolean supportUpdate = true;
        boolean timeRelativeID = true;
        if (metricsExtension != null) {
            supportDownSampling = metricsExtension.supportDownSampling();
            supportUpdate = metricsExtension.supportUpdate();
            timeRelativeID = metricsExtension.timeRelativeID();
        }
        if (supportDownSampling) {
            if (configService.shouldToHour()) {
                Model model = modelSetter.add(
                    metricsClass, stream.getScopeId(), new Storage(stream.getName(), timeRelativeID, DownSampling.Hour),
                    false
                );
                hourPersistentWorker = downSamplingWorker(moduleDefineHolder, metricsDAO, model, supportUpdate);
            }
            if (configService.shouldToDay()) {
                Model model = modelSetter.add(
                    metricsClass, stream.getScopeId(), new Storage(stream.getName(), timeRelativeID, DownSampling.Day),
                    false
                );
                dayPersistentWorker = downSamplingWorker(moduleDefineHolder, metricsDAO, model, supportUpdate);
            }
    
            transWorker = new MetricsTransWorker(
                moduleDefineHolder, hourPersistentWorker, dayPersistentWorker);
        }
    
        Model model = modelSetter.add(
            metricsClass, stream.getScopeId(), new Storage(stream.getName(), timeRelativeID, DownSampling.Minute),
            false
        );
        MetricsPersistentWorker minutePersistentWorker = minutePersistentWorker(
            moduleDefineHolder, metricsDAO, model, transWorker, supportUpdate);
    
        String remoteReceiverWorkerName = stream.getName() + "_rec";
        IWorkerInstanceSetter workerInstanceSetter = moduleDefineHolder.find(CoreModule.NAME)
                                                                       .provider()
                                                                       .getService(IWorkerInstanceSetter.class);
        workerInstanceSetter.put(remoteReceiverWorkerName, minutePersistentWorker, metricsClass);
    
        MetricsRemoteWorker remoteWorker = new MetricsRemoteWorker(moduleDefineHolder, remoteReceiverWorkerName);
        MetricsAggregateWorker aggregateWorker = new MetricsAggregateWorker(
            moduleDefineHolder, remoteWorker, stream.getName(), l1FlushPeriod);
    
        entryWorkers.put(metricsClass, aggregateWorker);
    }
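The workers wired together in create() aggregate before they persist: metrics that share an id (time bucket plus entity id, as in the generated id0()) are combined instead of being written as separate rows. The sketch below illustrates that merge step for a sum-type metric. It is an assumption-laden simplification, with MergeBufferSketch as a hypothetical name and plain long values in place of Metrics objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MergeBufferSketch {
    // id -> running sum; insertion order is kept only to make the output predictable.
    private final Map<String, Long> buffer = new LinkedHashMap<>();

    /** Combines the value into the entry with the same id, like combine() on a sum metric. */
    void accept(String id, long value) {
        buffer.merge(id, value, Long::sum);
    }

    /** Hands the merged entries to the next stage and starts a fresh buffer. */
    Map<String, Long> flush() {
        Map<String, Long> out = new LinkedHashMap<>(buffer);
        buffer.clear();
        return out;
    }

    public static void main(String[] args) {
        MergeBufferSketch worker = new MergeBufferSketch();
        worker.accept("202205171512_instanceA", 2);
        worker.accept("202205171512_instanceA", 3);
        worker.accept("202205171512_instanceB", 1);
        System.out.println(worker.flush()); // {202205171512_instanceA=5, 202205171512_instanceB=1}
    }
}
```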
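  2. MetricsPersistentWorker registers a consumer pool with ConsumerPoolFactory for each aggregation type (for example METRICS_L2_AGGREGATION), and creates a PersistentConsumer and a DataCarrier<Metrics> (which wraps the in-memory queue of buffering Channels) for each metric.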

  3. DataCarrier's consume method attaches a ConsumerPool to the Channels inside the DataCarrier. It calls the begin method of BulkConsumePool, the ConsumerPool implementation, which starts all the Consumers.
    MetricsPersistentWorker(ModuleDefineHolder moduleDefineHolder, Model model, IMetricsDAO metricsDAO,
                            AbstractWorker<Metrics> nextAlarmWorker, AbstractWorker<ExportEvent> nextExportWorker,
                            MetricsTransWorker transWorker, boolean enableDatabaseSession, boolean supportUpdate,
                            long storageSessionTimeout, int metricsDataTTL) {
        super(moduleDefineHolder, new ReadWriteSafeCache<>(new MergableBufferedData(), new MergableBufferedData()));
        this.model = model;
        this.context = new HashMap<>(100);
        this.enableDatabaseSession = enableDatabaseSession;
        this.metricsDAO = metricsDAO;
        this.nextAlarmWorker = Optional.ofNullable(nextAlarmWorker);
        this.nextExportWorker = Optional.ofNullable(nextExportWorker);
        this.transWorker = Optional.ofNullable(transWorker);
        this.supportUpdate = supportUpdate;
        this.sessionTimeout = storageSessionTimeout;
        this.persistentCounter = 0;
        this.persistentMod = 1;
        this.metricsDataTTL = metricsDataTTL;
        this.skipDefaultValueMetric = true;
    
        String name = "METRICS_L2_AGGREGATION";
        int size = BulkConsumePool.Creator.recommendMaxSize() / 8;
        if (size == 0) {
            size = 1;
        }
        BulkConsumePool.Creator creator = new BulkConsumePool.Creator(name, size, 20);
        try {
            ConsumerPoolFactory.INSTANCE.createIfAbsent(name, creator);
        } catch (Exception e) {
            throw new UnexpectedException(e.getMessage(), e);
        }
    
        this.dataCarrier = new DataCarrier<>("MetricsPersistentWorker." + model.getName(), name, 1, 2000);
        this.dataCarrier.consume(ConsumerPoolFactory.INSTANCE.get(name), new PersistentConsumer());
    
        MetricsCreator metricsCreator = moduleDefineHolder.find(TelemetryModule.NAME)
                                                          .provider()
                                                          .getService(MetricsCreator.class);
        aggregationCounter = metricsCreator.createCounter(
            "metrics_aggregation", "The number of rows in aggregation",
            new MetricsTag.Keys("metricName", "level", "dimensionality"),
            new MetricsTag.Values(model.getName(), "2", model.getDownsampling().getName())
        );
        skippedMetricsCounter = metricsCreator.createCounter(
            "metrics_persistence_skipped", "The number of metrics skipped in persistence due to be in default value",
            new MetricsTag.Keys("metricName", "dimensionality"),
            new MetricsTag.Values(model.getName(), model.getDownsampling().getName())
        );
        SESSION_TIMEOUT_OFFSITE_COUNTER++;
    }
    
    
    /**
     * set consumeDriver to this Carrier. consumer begin to run when {@link DataCarrier#produce} begin to work.
     *
     * @param consumer single instance of consumer, all consumer threads will all use this instance.
     * @param num      number of consumer threads
     */
    public DataCarrier consume(IConsumer<T> consumer, int num, long consumeCycle) {
        if (driver != null) {
            driver.close(channels);
        }
        driver = new ConsumeDriver<T>(this.name, this.channels, consumer, num, consumeCycle);
        driver.begin(channels);
        return this;
    }
  4. PersistentConsumer consumes the data in its Channel and buffers it in the ReadWriteSafeCache.
  5. When CoreModule starts it loads PersistenceTimer, which starts a thread pool. The task run by that pool:
     
    1. Fetches every PersistentWorker from MetricsStreamProcessor.getInstance().getPersistentWorkers(), including the MetricsPersistentWorker created above.
    2. Calls worker.buildBatchRequests() on the MetricsPersistentWorker to build the batch persistence requests (innerPrepareRequests); buildBatchRequests reads the data held in the ReadWriteSafeCache.
    3. Calls flush(innerPrepareRequests) on batchDAO, an H2BatchDAO instance, to complete persistence.
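
Steps 4 and 5 describe a hand-off between a consumer thread that keeps writing and a timer thread that periodically reads everything and flushes it. The sketch below shows one way such a two-buffer cache can work. It is a simplified illustration, not SkyWalking's ReadWriteSafeCache: the class DoubleBufferSketch, its Row type, and the merge rule (adding summation and count for rows with the same id) are all hypothetical names and assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-buffer cache; not a SkyWalking class. */
final class DoubleBufferSketch {

    /** One metric row; rows with the same id are merged (assumed rule). */
    static final class Row {
        final String id;
        long summation;
        long count;

        Row(String id, long summation, long count) {
            this.id = id;
            this.summation = summation;
            this.count = count;
        }
    }

    private Map<String, Row> writeBuffer = new HashMap<>();
    private Map<String, Row> readBuffer = new HashMap<>();

    /** Consumer side: merge an incoming row into the current write buffer. */
    synchronized void write(Row row) {
        writeBuffer.merge(row.id, row, (existing, incoming) -> {
            existing.summation += incoming.summation;
            existing.count += incoming.count;
            return existing;
        });
    }

    /** Timer side: swap the two buffers, then return and clear what was accumulated. */
    synchronized List<Row> read() {
        Map<String, Row> filled = writeBuffer;
        writeBuffer = readBuffer;   // the (empty) old read buffer now receives writes
        readBuffer = filled;
        List<Row> batch = new ArrayList<>(readBuffer.values());
        readBuffer.clear();
        return batch;
    }

    public static void main(String[] args) {
        DoubleBufferSketch cache = new DoubleBufferSketch();
        cache.write(new Row("202205182041_e1", 10, 1));
        cache.write(new Row("202205182041_e1", 20, 1));
        // A periodic task would call read() and pass the batch to the storage layer.
        for (Row row : cache.read()) {
            System.out.println(row.id + " summation=" + row.summation + " count=" + row.count);
        }
    }
}
```

Swapping the buffers keeps the time spent under the lock short: the timer only exchanges two references and copies the batch, while the slower database write can happen afterwards outside the cache.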
 

Data storage model

Taking the count of Java threads in the RUNNABLE state as an example, the MySQL CREATE TABLE statement is:
CREATE TABLE `instance_jvm_thread_runnable_state_thread_count` (
  `id` varchar(512) NOT NULL, /* id = time_bucket + "_" + entity_id */
  `entity_id` varchar(512) DEFAULT NULL, /* generated from service_id and service_instance_id */
  `service_id` varchar(256) DEFAULT NULL, /* generated from the service name configured on the client */
  `summation` bigint DEFAULT NULL,
  `count` bigint DEFAULT NULL,
  `value_` bigint DEFAULT NULL, /* value = (int)summation/count */
  `time_bucket` bigint DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `INSTANCE_JVM_THREAD_RUNNABLE_STATE_THREAD_COUNT_0_IDX` (`value_`),
  KEY `INSTANCE_JVM_THREAD_RUNNABLE_STATE_THREAD_COUNT_1_IDX` (`time_bucket`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
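
The column comments above state two derivations: id is time_bucket joined to entity_id with an underscore, and value_ is summation divided by count, truncated to a whole number. A minimal sketch of both follows. MetricRowSketch and its method names are hypothetical, the entity id is treated as an opaque string, and returning 0 when count is 0 is an assumption of this example rather than something the table definition states.

```java
/** Hypothetical helper that mirrors the column comments; not a SkyWalking class. */
final class MetricRowSketch {

    /** id = time_bucket + "_" + entity_id */
    static String buildId(long timeBucket, String entityId) {
        return timeBucket + "_" + entityId;
    }

    /** value_ = summation / count using integer division; 0 when there are no samples (assumed). */
    static long average(long summation, long count) {
        return count == 0 ? 0 : summation / count;
    }

    public static void main(String[] args) {
        // Sample values only: a minute-level bucket and a placeholder entity id.
        System.out.println(buildId(202205182041L, "exampleEntityId"));
        System.out.println(average(95, 4));
    }
}
```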
 

Frontend data queries

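The browser's developer tools show that the frontend fetches JVM metrics through the /graphql endpoint. Reading the code further shows that the server side is built on the Armeria framework and GraphQL. After a first look at the introductions to both technologies, the frontend query flow is roughly:
  1. When the Server starts, once CoreModule and its dependencies have been initialized, it calls UITemplateInitializer(getManager()).initAll() to load the UI templates, which live in oap-server/server-starter/src/main/resources/ui-initialized-templates.
  2. At the same time it loads GraphQLQueryProvider through the SPI mechanism and calls its prepare and start methods to initialize the query module.
  3. The GraphQL sent by the frontend is resolved to a specific service, for example the readMetricsValues method of MetricsQuery. That method eventually calls the DAO-layer code for the configured database. With MySQL, it ends up in readMetricsValue of H2MetricsQueryDAO, which assembles and executes the SQL statement.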
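
To make step 3 more concrete, the sketch below shows how a minute-step duration and an entity id could be turned into row ids and a parameterized SQL statement. It is an illustration built only on the id rule from the table definition (id = time_bucket + "_" + entity_id). Whether H2MetricsQueryDAO queries by an id list in exactly this way is an assumption, and MetricsQuerySketch, buildIds and buildSql are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical query builder; not the SkyWalking DAO. */
final class MetricsQuerySketch {
    // Format of the "start"/"end" fields in a MINUTE-step duration, e.g. "2022-05-18 2041".
    private static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("yyyy-MM-dd HHmm");
    // Assumed minute-level time_bucket format, e.g. 202205182041.
    private static final DateTimeFormatter BUCKET = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** One id per minute between start and end, both inclusive. */
    static List<String> buildIds(String start, String end, String entityId) {
        LocalDateTime from = LocalDateTime.parse(start, INPUT);
        LocalDateTime to = LocalDateTime.parse(end, INPUT);
        List<String> ids = new ArrayList<>();
        for (LocalDateTime t = from; !t.isAfter(to); t = t.plusMinutes(1)) {
            ids.add(t.format(BUCKET) + "_" + entityId);
        }
        return ids;
    }

    /** Parameterized statement with one placeholder per id; values are bound separately. */
    static String buildSql(String table, int idCount) {
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT id, value_ FROM " + table + " WHERE id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> ids = buildIds("2022-05-18 2041", "2022-05-18 2045", "exampleEntityId");
        System.out.println(ids);
        System.out.println(buildSql("instance_jvm_memory_heap", ids.size()));
    }
}
```

Minutes that have no row in the table would simply be missing from the result set, so a caller following this approach has to fill those points with a default value before returning the series to the frontend.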
 
Frontend query example:
// Request
{
  "query": "query queryData($duration: Duration!,$condition0: MetricsCondition!,$condition1: MetricsCondition!,$condition2: MetricsCondition!,$condition3: MetricsCondition!) {instance_jvm_memory_noheap_max0: readMetricsValues(condition: $condition0, duration: $duration){\n    label\n    values {\n      values {value}\n    }\n  },instance_jvm_memory_noheap1: readMetricsValues(condition: $condition1, duration: $duration){\n    label\n    values {\n      values {value}\n    }\n  },instance_jvm_memory_heap2: readMetricsValues(condition: $condition2, duration: $duration){\n    label\n    values {\n      values {value}\n    }\n  },instance_jvm_memory_heap_max3: readMetricsValues(condition: $condition3, duration: $duration){\n    label\n    values {\n      values {value}\n    }\n  }}",
  "variables": {
    "duration": {
      "start": "2022-05-18 2041",
      "end": "2022-05-18 2111",
      "step": "MINUTE"
    },
    "condition0": {
      "name": "instance_jvm_memory_noheap_max",
      "entity": {
        "scope": "ServiceInstance",
        "serviceName": "Your_ApplicationName",
        "normal": true,
        "serviceInstanceName": "d7a7de5f385149dfb49b8d23e8b6fbc9@10.4.77.148"
      }
    },
    "condition1": {
      "name": "instance_jvm_memory_noheap",
      "entity": {
        "scope": "ServiceInstance",
        "serviceName": "Your_ApplicationName",
        "normal": true,
        "serviceInstanceName": "d7a7de5f385149dfb49b8d23e8b6fbc9@10.4.77.148"
      }
    },
    "condition2": {
      "name": "instance_jvm_memory_heap",
      "entity": {
        "scope": "ServiceInstance",
        "serviceName": "Your_ApplicationName",
        "normal": true,
        "serviceInstanceName": "d7a7de5f385149dfb49b8d23e8b6fbc9@10.4.77.148"
      }
    },
    "condition3": {
      "name": "instance_jvm_memory_heap_max",
      "entity": {
        "scope": "ServiceInstance",
        "serviceName": "Your_ApplicationName",
        "normal": true,
        "serviceInstanceName": "d7a7de5f385149dfb49b8d23e8b6fbc9@10.4.77.148"
      }
    }
  }
}

// Response
{
  "data": {
    "instance_jvm_memory_noheap_max0": {
      "label": null,
      "values": {
        "values": [
          {
            "value": 0
          }
          // omitted...
         
        ]
      }
    },
    "instance_jvm_memory_noheap1": {
      "label": null,
      "values": {
        "values": [
          {
            "value": 0
          },
         // omitted...
        ]
      }
    },
    "instance_jvm_memory_heap2": {
      "label": null,
      "values": {
        "values": [
          {
            "value": 0
          },
         // omitted...
        ]
      }
    },
    "instance_jvm_memory_heap_max3": {
      "label": null,
      "values": {
        "values": [
          {
            "value": 0
          } // omitted...
        ]
      }
    }
  }
}
 

Summary

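  1. Both the Agent and the Server use the SPI mechanism to load their modules, which makes the system highly extensible.
  2. On the Agent, collecting metrics and reporting them are decoupled by an in-memory LinkedBlockingQueue, so a stalled network call does not block collection and leave gaps in the collected data.
  3. The Agent and the Server communicate over gRPC (go2sky reports trace data the same way).
  4. The Server also creates a large number of thread pools and in-memory queues to receive, process, and persist data.
  5. The Server processes data with OAL, a language defined with Antlr4, and handles frontend requests with GraphQL. Both areas need deeper study later.
  6. The SkyWalking source uses many design patterns, such as the observer pattern (the various listeners) and the singleton pattern (implemented with enums). Further patterns can be noted as they are found and summarized later.
  7. A more detailed picture of the flow requires further reading and debugging of the code; this document will continue to be updated.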
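
Point 2 can be illustrated with a small sketch of the queue-based decoupling. It assumes a bounded queue, a non-blocking offer on the collecting side, and a drainTo on the sending side. QueueDecouplingSketch and its methods are hypothetical names, and the capacity and the drop-when-full behaviour are assumptions of this example, not a copy of the Agent's sender.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical producer/consumer buffer; not the SkyWalking sender. */
final class QueueDecouplingSketch {
    private final LinkedBlockingQueue<String> queue;

    QueueDecouplingSketch(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Collector side: never blocks; returns false if the queue is full and the sample is dropped. */
    boolean offer(String metric) {
        return queue.offer(metric);
    }

    /** Sender side: take everything buffered so far in one batch. */
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        QueueDecouplingSketch buffer = new QueueDecouplingSketch(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("sample-" + i + " accepted: " + buffer.offer("sample-" + i));
        }
        System.out.println("batch to upload: " + buffer.drain());
    }
}
```

With a bounded queue the collector keeps its one-second cadence even when uploads stall, at the price of dropping samples once the buffer is full; an unbounded queue would avoid drops but let memory grow while the network is down.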

 

posted @ 2022-08-22 18:14  easonyi