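Self-hosted APM tracing with SkyWalking (containerized cluster deployment)
1. SkyWalking is an APM distributed-tracing tool that originated in China and fits today's popular microservice architectures well. We have moved fully to microservices running on k8s, so I set it up myself to evaluate it. I am no expert; please point out any mistakes, and discussion is welcome.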
Official site: https://skywalking.apache.org/
GitHub: https://github.com/apache/skywalking
Docker Hub: https://hub.docker.com/search?q=skywalking&type=image
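2. Environment preparation: SkyWalking can run in standalone or cluster mode; only cluster mode needs ZooKeeper or Nacos as the registry.
2.1 k8s cluster  # Tencent Cloud TKE cluster
2.2 Elasticsearch cluster (version 7)  # set up on your own; not covered here
2.3 ZooKeeper cluster (version 3.5+)  # set up on your own; not covered here
2.4 skywalking-oap (8.8.1), skywalking-ui (8.5.0), skywalking-agent (Java) (8.8.0)  # When I used OAP and UI versions newer than 8.5.0, the UI could not fetch any data; if you know how to fix this, please leave a comment.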
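3. Prepare the YAML files: if any of the configuration variables are unclear, download the non-container package and read SkyWalking's configuration file. SkyWalking download: https://archive.apache.org/dist/skywalking/8.5.0/ Configuration file: config/application.yml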
3.1 ConfigMap for SkyWalking
apiVersion: v1
kind: ConfigMap
metadata:
  name: skywalking-cm
  namespace: skywalking
data:
  SW_CLUSTER: zookeeper
  SW_CLUSTER_ZK_HOST_PORT: '10.2.0.10:2181,10.2.0.11:2181,10.2.0.12:2181'  # ZooKeeper cluster addresses; a single address also works
  SW_CORE_GRPC_PORT: "11800"  # port the agents report data to
  SW_CORE_REST_PORT: "12800"  # port the UI queries data from
  SW_NAMESPACE: sw-dev  # namespace used in ES and ZK; ES index names start with it
  SW_STORAGE: elasticsearch  # storage type; the default is h2, most deployments use ES
  SW_STORAGE_ES_CLUSTER_NODES: '10.2.0.10:9200,10.2.0.11:9200,10.2.0.12:9200'  # ES cluster addresses; one or two are enough
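Each key in this ConfigMap reaches the OAP container as an environment variable and is picked up by a placeholder of the form ${NAME:default} in config/application.yml. The abridged excerpt below is only a sketch of how those placeholders look, written from memory of the 8.x file layout. Key names and defaults can differ between releases, so treat it as an assumption and check the file shipped with the version you run.

```yaml
# Abridged sketch of config/application.yml (assumed layout; verify against your release)
cluster:
  selector: ${SW_CLUSTER:standalone}        # "zookeeper" in this article
  zookeeper:
    nameSpace: ${SW_NAMESPACE:""}
    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:2181}
core:
  selector: ${SW_CORE:default}
  default:
    restPort: ${SW_CORE_REST_PORT:12800}    # queried by the UI
    gRPCPort: ${SW_CORE_GRPC_PORT:11800}    # agents report here
storage:
  selector: ${SW_STORAGE:h2}                # "elasticsearch" in this article
  elasticsearch:
    nameSpace: ${SW_NAMESPACE:""}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
```

A variable that is not set in the ConfigMap falls back to the value after the colon.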
3.2 YAML for skywalking-oap
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: skywalking-oap
  name: skywalking-oap
  namespace: skywalking
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: skywalking-oap
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: skywalking-oap
    spec:
      containers:
      - envFrom:
        - configMapRef:
            name: skywalking-cm
        image: apache/skywalking-oap-server:8.8.1
        imagePullPolicy: IfNotPresent
        name: skywalking-oap
        ports:
        - containerPort: 12800
          name: http
          protocol: TCP
        - containerPort: 11800
          name: grpc
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: "1"
            memory: 2Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/localtime
          name: volume-localtime
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: qcloudregistrykey
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /etc/localtime
          type: ""
        name: volume-localtime
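The agent configuration in 3.7 points at skywalking-oap-svc:11800, but no Service manifest for the OAP appears in this section. The following is a minimal sketch of what such a Service could look like. The Service name comes from the agent setting, and the ports and selector come from the Deployment in 3.2; the label on the Service itself is an assumption modelled on the UI Service.

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: skywalking-oap-svc   # assumed label, mirrors the naming of the UI Service
  name: skywalking-oap-svc    # name referenced by the agent in 3.7
  namespace: skywalking
spec:
  ports:
  - name: grpc                # agents report data here
    port: 11800
    protocol: TCP
    targetPort: 11800
  - name: http                # the UI queries data here
    port: 12800
    protocol: TCP
    targetPort: 12800
  selector:
    app: skywalking-oap       # matches the pod label of the OAP Deployment
  type: ClusterIP
```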
3.3 YAML for skywalking-ui
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: skywalking-ui-svc
  name: skywalking-ui-svc
  namespace: skywalking
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: skywalking-ui
  sessionAffinity: None
  type: ClusterIP
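The Service above selects pods labelled app: skywalking-ui, so a matching UI Deployment is needed as well. The sketch below is an assumption, not the author's manifest: the image tag follows the UI version listed in 2.4, and SW_OAP_ADDRESS points at the OAP REST port through the skywalking-oap-svc name used by the agent in 3.7. As far as I know, the 8.5.0 image takes a plain host:port here while newer UI images expect a full URL starting with http://, so check the documentation for the tag you deploy.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: skywalking-ui
  name: skywalking-ui          # assumed name
  namespace: skywalking
spec:
  replicas: 1
  selector:
    matchLabels:
      app: skywalking-ui
  template:
    metadata:
      labels:
        app: skywalking-ui     # must match the selector of skywalking-ui-svc
    spec:
      containers:
      - name: skywalking-ui
        image: apache/skywalking-ui:8.5.0
        imagePullPolicy: IfNotPresent
        env:
        - name: SW_OAP_ADDRESS
          value: skywalking-oap-svc:12800   # assumed host:port form; verify for your image tag
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
```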
3.4 The UI can be exposed to external traffic through a NodePort or an Ingress, whichever you prefer; I use an Ingress here
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: skywalking-dev-ui-ing
  namespace: skywalking
spec:
  rules:
  - host: skywalking.xxx.com
    http:
      paths:
      - backend:
          serviceName: skywalking-ui-svc
          servicePort: 8080
        path: /
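The networking.k8s.io/v1beta1 Ingress API was removed in Kubernetes 1.22, so on newer clusters the manifest above is rejected. A sketch of the same rule in networking.k8s.io/v1 follows. The pathType value is my choice, and some clusters also require spec.ingressClassName, which is omitted here because it depends on the installed ingress controller.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: skywalking-dev-ui-ing
  namespace: skywalking
spec:
  rules:
  - host: skywalking.xxx.com
    http:
      paths:
      - path: /
        pathType: Prefix            # required in v1; Prefix is an assumed choice
        backend:
          service:
            name: skywalking-ui-svc
            port:
              number: 8080
```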
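3.5 Apply the YAML
kubectl apply -f .  # applies every manifest in the current directory; -f accepts one path per flag, so a shell glob that expands to several files fails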
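Every manifest in this article sets namespace: skywalking, and applying them fails if that namespace does not exist yet. A minimal Namespace manifest is sketched below. The file name 00-namespace.yaml is a hypothetical choice meant to sort ahead of the other files; if the ordering is in doubt, apply this file on its own first.

```yaml
# 00-namespace.yaml (hypothetical file name)
apiVersion: v1
kind: Namespace
metadata:
  name: skywalking
```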
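3.6 Access the UI: (data shows up only after the agent is configured and the instrumented service has handled some requests)
3.7 Configure the Java agent
My services run in containers, so this setup is tailored to them. There are currently two ways to deploy the agent: sidecar mode, or baking the agent into the service's base image. Because we plan to use this in production later, I chose the second approach and packaged the agent into the base image.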
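# Dockerfile
# Our existing JDK base image; replace it with your own
FROM ccr.ccs.tencentyun.com/pet-dev/openjdk8:312
ENV TZ=Asia/Shanghai
# The downloaded SkyWalking Java agent
ADD ./skywalking-agent/ /skywalkingAgent
RUN ln -sf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
WORKDIR /start
ADD ./start-0.0.1-SNAPSHOT.jar ./app.jar
# JVM options such as -server must come before -jar; anything after the jar path is passed to the application.
# The exec form does not expand environment variables, so a JAVA_OPTS variable set on the Pod is not applied here.
ENTRYPOINT ["java", "-javaagent:/skywalkingAgent/skywalking-agent.jar", "-server", "-jar", "/start/app.jar"]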
Build the image, then deploy it to the cluster:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: sky-test
    qcloud-app: sky-test
  name: sky-test
  namespace: skywalking
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: sky-test
      qcloud-app: sky-test
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: sky-test
        qcloud-app: sky-test
    spec:
      containers:
      - env:
        - name: PATH
          value: /opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        - name: LANG
          value: en_US.UTF-8
        - name: LANGUAGE
          value: en_US:en
        - name: LC_ALL
          value: en_US.UTF-8
        - name: JAVA_VERSION
          value: jdk8u312-b07
        - name: JAVA_HOME
          value: /opt/java/openjdk
        - name: TZ
          value: Asia/Shanghai
        - name: JAVA_OPTS
          value: -Xms128m -Xmx256m -Djava.security.egd=file:/dev/./urandom
        - name: SW_AGENT_NAME  # required; the service name shown in the UI
          value: demo-swdev
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES  # required; address of the OAP service
          value: skywalking-oap-svc:11800  # OAP Service name + port
        image: ccr.ccs.tencentyun.com/pet-dev/demo:1.1  # Tencent Cloud image registry
        imagePullPolicy: IfNotPresent
        name: sky-test
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 256Mi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: qcloudregistrykey
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
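For comparison, the sidecar-style approach mentioned in 3.7 is often done with an init container that copies the agent into a shared volume, so the application image stays untouched. The sketch below is not what this article deploys, and several parts are assumptions: the Deployment name, the application image, and the apache/skywalking-java-agent tag are placeholders, and the agent is assumed to live under /skywalking/agent inside that image. JAVA_TOOL_OPTIONS is read by the JVM itself, so the application's own start command does not need to change.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sky-test-init-agent            # hypothetical name
  namespace: skywalking
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: sky-test-init-agent
  template:
    metadata:
      labels:
        k8s-app: sky-test-init-agent
    spec:
      volumes:
      - name: skywalking-agent         # shared between init container and app
        emptyDir: {}
      initContainers:
      - name: agent-copy
        image: apache/skywalking-java-agent:8.8.0-alpine   # assumed tag; verify it exists
        # Copies the agent directory to /agent/agent on the shared volume
        command: ["/bin/sh", "-c", "cp -R /skywalking/agent /agent/"]
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
      containers:
      - name: app
        image: registry.example.com/your-app:1.0           # placeholder application image
        env:
        - name: JAVA_TOOL_OPTIONS      # picked up automatically by the JVM
          value: -javaagent:/skywalking/agent/skywalking-agent.jar
        - name: SW_AGENT_NAME
          value: demo-swdev
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap-svc:11800
        volumeMounts:
        - name: skywalking-agent
          mountPath: /skywalking       # agent ends up at /skywalking/agent
```

The trade-off is an extra container start per pod in exchange for upgrading the agent without rebuilding every service image.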