微服务组件-----Spring Cloud Alibaba 注册中心 Nacos源码(1.4.x版本)分析
核心功能点
【1】服务注册:Nacos Client会通过发送REST请求的方式向Nacos Server注册自己的服务,提供自身的元数据,比如ip地址、端口等信息。Nacos Server接收到注册请求后,就会把这些元数据信息存储在一个双层的内存Map中。
【2】服务心跳:在服务注册后,Nacos Client会维护一个定时心跳来持续通知Nacos Server,说明服务一直处于可用状态,防止被剔除。默认5s发送一次心跳。
【3】服务同步:Nacos Server集群之间会互相同步服务实例,用来保证服务信息的一致性。
【4】服务发现:服务消费者(Nacos Client)在调用服务提供者的服务时,会发送一个REST请求给Nacos Server,获取上面注册的服务清单,并且缓存在Nacos Client本地,同时会在Nacos Client本地开启一个定时任务定时拉取服务端最新的注册表信息更新到本地缓存
【5】服务健康检查:Nacos Server会开启一个定时任务用来检查注册服务实例的健康情况,对于超过15s没有收到客户端心跳的实例会将它的healthy属性置为false(客户端服务发现时不会发现),如果某个实例超过30秒没有收到心跳,直接剔除该实例(被剔除的实例如果恢复发送心跳则会重新注册)
源码精髓总结
【1】注册表的结构说明(这个仅是记录):
//Map<namespaceId, Map<service_name, Service>【ConcurrentSkipListMap】> private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>(); //再分析里面的Service,Map<clusterName, Cluster> private Map<String, Cluster> clusterMap = new HashMap<>(); //再分析Cluster private Set<Instance> persistentInstances = new HashSet<>(); private Set<Instance> ephemeralInstances = new HashSet<>();
【2】分析注册表为何要这么设计
1.注册表是基于第一层ConcurrentHashMap,第二层ConcurrentSkipListMap,第三层HashMap,然后定位到对应的Cluster。 2.至于为什么要这样设计,一方面是将粒度划分的更细,通过源码分析可知,nacos更新注册表是进行小范围的更新,如定位到Cluster的临时列表ephemeralInstances或者持久列表persistentInstances【这两个都是set集合,所以排除了会有重复的数据】。因为粒度小所以更新速度会更快。 3.其次采用的是 写时复制思想,也就是说,不会影响读取的效率,因为是新开一个副本,将新旧的数据合并到一个新数据里面,然后将引用指向新数据。 4.其次是为了高扩展,对namespace进行划分【对开发环境隔离】,对service进行划分【对服务进行隔离】,对Cluster进行划分【多机房部署,加快访问速度】 5.为了解决并发读写问题,采用的是ConcurrentHashMap与ConcurrentSkipListMap的分段锁,加上Cluster里面的写时复制。其次Cluster里面是不加锁的,因为是单线程进行修改,不存在冲突。 6.虽说牺牲了,一定的实时性,但是大大提高了并发的性能。
【3】分析AP架构下为什么高性能的原因
1.因为采用的是异步任务加队列的形式来实现注册的,所以响应很快,然后任务是慢慢做的。 2.Notifier 是在DistroConsistencyServiceImpl类中初始化,默认单线程,而且队列为ArrayBlockingQueue<>(1024 * 1024)。 3.缩小了变更数据的粒度,单线程避免了线程安全问题【不用加锁】。 4.这种方式毫无疑问是会存在问题的,就是响应了但是没有注册上。但是对于这个问题,在客户端里面做了心跳机制,如果检测不到会重新注册。
【4】分析Nacos为什么感知快的原因
采用的是客户端定时进行一次拉取,兼服务端采用异步的形式使用UDP发送更新的数据到客户端;
虽然UDP存在通知丢失的情况,但是每隔1s的拉取依旧能很好的保持数据的最终一致性。
源码分析
验证服务端
【1】在启动的时候我们一般是调用shell脚本启动,查看startup.sh脚本
从以下看实际上是调用了java命令启动了个java的项目(-jar ${BASE_DIR}/target/${SERVER}.jar 将参数对应替换后 -jar ${BASE_DIR}/target/nacos-server.jar)
去寻找启动入口的时候会发现,它其实是SpringBoot搭建的一个WEB服务。
cygwin=false darwin=false os400=false case "`uname`" in CYGWIN*) cygwin=true;; Darwin*) darwin=true;; OS400*) os400=true;; esac error_exit () { echo "ERROR: $1 !!" exit 1 } [ ! -e "$JAVA_HOME/bin/java" ] && JAVA_HOME=$HOME/jdk/java [ ! -e "$JAVA_HOME/bin/java" ] && JAVA_HOME=/usr/java [ ! -e "$JAVA_HOME/bin/java" ] && JAVA_HOME=/opt/taobao/java [ ! -e "$JAVA_HOME/bin/java" ] && unset JAVA_HOME if [ -z "$JAVA_HOME" ]; then if $darwin; then if [ -x '/usr/libexec/java_home' ] ; then export JAVA_HOME=`/usr/libexec/java_home` elif [ -d "/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home" ]; then export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home" fi else JAVA_PATH=`dirname $(readlink -f $(which javac))` if [ "x$JAVA_PATH" != "x" ]; then export JAVA_HOME=`dirname $JAVA_PATH 2>/dev/null` fi fi if [ -z "$JAVA_HOME" ]; then error_exit "Please set the JAVA_HOME variable in your environment, We need java(x64)! jdk8 or later is better!" fi fi export SERVER="nacos-server" export MODE="cluster" export FUNCTION_MODE="all" export MEMBER_LIST="" export EMBEDDED_STORAGE="" while getopts ":m:f:s:c:p:" opt do case $opt in m) MODE=$OPTARG;; f) FUNCTION_MODE=$OPTARG;; s) SERVER=$OPTARG;; c) MEMBER_LIST=$OPTARG;; p) EMBEDDED_STORAGE=$OPTARG;; ?) echo "Unknown parameter" exit 1;; esac done export JAVA_HOME export JAVA="$JAVA_HOME/bin/java" export BASE_DIR=`cd $(dirname $0)/..; pwd` export CUSTOM_SEARCH_LOCATIONS=file:${BASE_DIR}/conf/ #=========================================================================================== # JVM Configuration #=========================================================================================== if [[ "${MODE}" == "standalone" ]]; then JAVA_OPT="${JAVA_OPT} -Xms512m -Xmx512m -Xmn256m" JAVA_OPT="${JAVA_OPT} -Dnacos.standalone=true" else if [[ "${EMBEDDED_STORAGE}" == "embedded" ]]; then JAVA_OPT="${JAVA_OPT} -DembeddedStorage=true" fi JAVA_OPT="${JAVA_OPT} -server -Xms2g -Xmx2g -Xmn1g -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=320m" JAVA_OPT="${JAVA_OPT} -XX:-OmitStackTraceInFastThrow -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${BASE_DIR}/logs/java_heapdump.hprof" JAVA_OPT="${JAVA_OPT} -XX:-UseLargePages" fi if [[ "${FUNCTION_MODE}" == "config" ]]; then JAVA_OPT="${JAVA_OPT} -Dnacos.functionMode=config" elif [[ "${FUNCTION_MODE}" == "naming" ]]; then JAVA_OPT="${JAVA_OPT} -Dnacos.functionMode=naming" fi JAVA_OPT="${JAVA_OPT} -Dnacos.member.list=${MEMBER_LIST}" JAVA_MAJOR_VERSION=$($JAVA -version 2>&1 | sed -E -n 's/.* version "([0-9]*).*$/\1/p') if [[ "$JAVA_MAJOR_VERSION" -ge "9" ]] ; then JAVA_OPT="${JAVA_OPT} -Xlog:gc*:file=${BASE_DIR}/logs/nacos_gc.log:time,tags:filecount=10,filesize=102400" else JAVA_OPT="${JAVA_OPT} -Djava.ext.dirs=${JAVA_HOME}/jre/lib/ext:${JAVA_HOME}/lib/ext" JAVA_OPT="${JAVA_OPT} -Xloggc:${BASE_DIR}/logs/nacos_gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M" fi JAVA_OPT="${JAVA_OPT} -Dloader.path=${BASE_DIR}/plugins/health,${BASE_DIR}/plugins/cmdb" JAVA_OPT="${JAVA_OPT} -Dnacos.home=${BASE_DIR}" JAVA_OPT="${JAVA_OPT} -jar ${BASE_DIR}/target/${SERVER}.jar" JAVA_OPT="${JAVA_OPT} ${JAVA_OPT_EXT}" JAVA_OPT="${JAVA_OPT} --spring.config.additional-location=${CUSTOM_SEARCH_LOCATIONS}" JAVA_OPT="${JAVA_OPT} --logging.config=${BASE_DIR}/conf/nacos-logback.xml" JAVA_OPT="${JAVA_OPT} --server.max-http-header-size=524288" if [ ! -d "${BASE_DIR}/logs" ]; then mkdir ${BASE_DIR}/logs fi echo "$JAVA ${JAVA_OPT}" if [[ "${MODE}" == "standalone" ]]; then echo "nacos is starting with standalone" else echo "nacos is starting with cluster" fi # check the start.out log output file if [ ! -f "${BASE_DIR}/logs/start.out" ]; then touch "${BASE_DIR}/logs/start.out" fi # start echo "$JAVA ${JAVA_OPT}" > ${BASE_DIR}/logs/start.out 2>&1 & nohup $JAVA ${JAVA_OPT} nacos.nacos >> ${BASE_DIR}/logs/start.out 2>&1 & echo "nacos is starting,you can check the ${BASE_DIR}/logs/start.out"
从客户端开始分析
【1】根据自动装配原理(寻找spring.factories文件配置)
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\ com.alibaba.cloud.nacos.NacosDiscoveryAutoConfiguration,\ com.alibaba.cloud.nacos.ribbon.RibbonNacosAutoConfiguration,\ com.alibaba.cloud.nacos.endpoint.NacosDiscoveryEndpointAutoConfiguration,\ com.alibaba.cloud.nacos.discovery.NacosDiscoveryClientAutoConfiguration,\ com.alibaba.cloud.nacos.discovery.configclient.NacosConfigServerAutoConfiguration org.springframework.cloud.bootstrap.BootstrapConfiguration=\ com.alibaba.cloud.nacos.discovery.configclient.NacosDiscoveryClientConfigServiceBootstrapConfiguration
【2】分析NacosDiscoveryAutoConfiguration类自动装配了什么
@Configuration @EnableConfigurationProperties @ConditionalOnNacosDiscoveryEnabled @ConditionalOnProperty(value = "spring.cloud.service-registry.auto-registration.enabled", matchIfMissing = true) @AutoConfigureAfter({ AutoServiceRegistrationConfiguration.class, AutoServiceRegistrationAutoConfiguration.class }) public class NacosDiscoveryAutoConfiguration { @Bean public NacosServiceRegistry nacosServiceRegistry( NacosDiscoveryProperties nacosDiscoveryProperties) { return new NacosServiceRegistry(nacosDiscoveryProperties); } @Bean @ConditionalOnBean(AutoServiceRegistrationProperties.class) public NacosRegistration nacosRegistration( NacosDiscoveryProperties nacosDiscoveryProperties, ApplicationContext context) { return new NacosRegistration(nacosDiscoveryProperties, context); } //可以看出是将上面两个Bean当做参数传入了这个Bean @Bean @ConditionalOnBean(AutoServiceRegistrationProperties.class) public NacosAutoServiceRegistration nacosAutoServiceRegistration( NacosServiceRegistry registry, AutoServiceRegistrationProperties autoServiceRegistrationProperties, NacosRegistration registration) { return new NacosAutoServiceRegistration(registry, autoServiceRegistrationProperties, registration); } }
【3】分析NacosAutoServiceRegistration类有什么重要性
利用监听机制,达到注册服务的目的。监听WebServer初始化事件
//class NacosAutoServiceRegistration extends AbstractAutoServiceRegistration<Registration> //abstract class AbstractAutoServiceRegistration<R extends Registration> implements AutoServiceRegistration, ApplicationContextAware, ApplicationListener<WebServerInitializedEvent> //因为继承了ApplicationListener,必然会有监听方法 public void onApplicationEvent(WebServerInitializedEvent event) { bind(event); } @Deprecated public void bind(WebServerInitializedEvent event) { ApplicationContext context = event.getApplicationContext(); if (context instanceof ConfigurableWebServerApplicationContext) { if ("management".equals(((ConfigurableWebServerApplicationContext) context).getServerNamespace())) { return; } } this.port.compareAndSet(0, event.getWebServer().getPort()); this.start(); } public void start() { if (!isEnabled()) {return; } // only initialize if nonSecurePort is greater than 0 and it isn't already running // because of containerPortInitializer below if (!this.running.get()) { this.context.publishEvent(new InstancePreRegisteredEvent(this, getRegistration())); register(); if (shouldRegisterManagement()) { registerManagement(); } this.context.publishEvent(new InstanceRegisteredEvent<>(this, getConfiguration())); this.running.compareAndSet(false, true); } } protected void register() { this.serviceRegistry.register(getRegistration()); } @Override public void register(Registration registration) { if (StringUtils.isEmpty(registration.getServiceId())) { return; } NamingService namingService = namingService(); String serviceId = registration.getServiceId(); String group = nacosDiscoveryProperties.getGroup(); Instance instance = getNacosInstanceFromRegistration(registration); try { namingService.registerInstance(serviceId, group, instance); } catch (Exception e) { // rethrow a RuntimeException if the registration is failed. // issue : https://github.com/alibaba/spring-cloud-alibaba/issues/1132 rethrowRuntimeException(e); } }
【4】分析如何注册的【服务注册】
//NacosNamingService类的registerInstance方法 @Override public void registerInstance(String serviceName, Instance instance) throws NacosException { registerInstance(serviceName, Constants.DEFAULT_GROUP, instance); } //NacosNamingService类#registerInstance方法 @Override public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException { NamingUtils.checkInstanceIsLegal(instance); String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName); if (instance.isEphemeral()) { BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
//添加一个延时执行的心跳任务 beatReactor.addBeatInfo(groupedServiceName, beatInfo); }
//进行服务注册 serverProxy.registerService(groupedServiceName, groupName, instance); } //NamingProxy类#registerService方法 public void registerService(String serviceName, String groupName, Instance instance) throws NacosException { //构建注册参数 final Map<String, String> params = new HashMap<String, String>(16); params.put(CommonParams.NAMESPACE_ID, namespaceId); params.put(CommonParams.SERVICE_NAME, serviceName); params.put(CommonParams.GROUP_NAME, groupName); params.put(CommonParams.CLUSTER_NAME, instance.getClusterName()); params.put("ip", instance.getIp()); params.put("port", String.valueOf(instance.getPort())); params.put("weight", String.valueOf(instance.getWeight())); params.put("enable", String.valueOf(instance.isEnabled())); params.put("healthy", String.valueOf(instance.isHealthy())); params.put("ephemeral", String.valueOf(instance.isEphemeral())); params.put("metadata", JacksonUtils.toJson(instance.getMetadata())); //向服务端发送请求 //UtilAndComs.nacosUrlInstance=/nacos/v1/ns/instance 也就是官网所示的注册接口地址 reqApi(UtilAndComs.nacosUrlInstance, params, HttpMethod.POST); } public String reqApi(String api, Map<String, String> params, String method) throws NacosException { return reqApi(api, params, Collections.EMPTY_MAP, method); } public String reqApi(String api, Map<String, String> params, Map<String, String> body, String method) throws NacosException { return reqApi(api, params, body, getServerList(), method); } public String reqApi(String api, Map<String, String> params, Map<String, String> body, List<String> servers, String method) throws NacosException { params.put(CommonParams.NAMESPACE_ID, getNamespaceId()); if (CollectionUtils.isEmpty(servers) && StringUtils.isBlank(nacosDomain)) { throw new NacosException(...); } NacosException exception = new NacosException(); if (StringUtils.isNotBlank(nacosDomain)) { for (int i = 0; i < maxRetry; i++) { try { return callServer(api, params, body, nacosDomain, method); } catch (NacosException e) { exception = e; } } } else { Random random = new Random(System.currentTimeMillis()); int index = random.nextInt(servers.size()); for (int i = 0; i < servers.size(); i++) { String server = servers.get(index); try { return callServer(api, params, body, server, method); } catch (NacosException e) { exception = e; } index = (index + 1) % servers.size(); } } throw new NacosException(...); } public String callServer(String api, Map<String, String> params, Map<String, String> body, String curServer, String method) throws NacosException { long start = System.currentTimeMillis(); long end = 0; injectSecurityInfo(params); Header header = builderHeader(); String url; if (curServer.startsWith(UtilAndComs.HTTPS) || curServer.startsWith(UtilAndComs.HTTP)) { url = curServer + api; } else { if (!IPUtil.containsPort(curServer)) { curServer = curServer + IPUtil.IP_PORT_SPLITER + serverPort; } url = NamingHttpClientManager.getInstance().getPrefix() + curServer + api; } try { //真正远程调用 HttpRestResult<String> restResult = nacosRestTemplate .exchangeForm(url, header, Query.newInstance().initParams(params), body, method, String.class); end = System.currentTimeMillis(); MetricsMonitor.getNamingRequestMonitor(method, url, String.valueOf(restResult.getCode())).observe(end - start); if (restResult.ok()) { return restResult.getData(); } if (HttpStatus.SC_NOT_MODIFIED == restResult.getCode()) { return StringUtils.EMPTY; } throw new NacosException(restResult.getCode(), restResult.getMessage()); } catch (Exception e) { throw new NacosException(NacosException.SERVER_ERROR, e); } } public <T> HttpRestResult<T> exchangeForm(String url, Header header, Query query, Map<String, String> bodyValues, String httpMethod, Type responseType) throws Exception { RequestHttpEntity requestHttpEntity = new RequestHttpEntity( header.setContentType(MediaType.APPLICATION_FORM_URLENCODED), query, bodyValues); return execute(url, httpMethod, requestHttpEntity, responseType); } private <T> HttpRestResult<T> execute(String url, String httpMethod, RequestHttpEntity requestEntity, Type responseType) throws Exception { URI uri = HttpUtils.buildUri(url, requestEntity.getQuery()); ResponseHandler<T> responseHandler = super.selectResponseHandler(responseType); HttpClientResponse response = null; try { //使用JdkHttpClientRequest去发起请求 response = this.requestClient().execute(uri, httpMethod, requestEntity); return responseHandler.handle(response); } finally { if (response != null) { response.close(); } } } //JdkHttpClientRequest类#execute方法 @Override public HttpClientResponse execute(URI uri, String httpMethod, RequestHttpEntity requestHttpEntity) throws Exception { final Object body = requestHttpEntity.getBody(); final Header headers = requestHttpEntity.getHeaders(); replaceDefaultConfig(requestHttpEntity.getHttpClientConfig()); HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection(); Map<String, String> headerMap = headers.getHeader(); if (headerMap != null && headerMap.size() > 0) { for (Map.Entry<String, String> entry : headerMap.entrySet()) { conn.setRequestProperty(entry.getKey(), entry.getValue()); } } conn.setConnectTimeout(this.httpClientConfig.getConTimeOutMillis()); conn.setReadTimeout(this.httpClientConfig.getReadTimeOutMillis()); conn.setRequestMethod(httpMethod); if (body != null && !"".equals(body)) { String contentType = headers.getValue(HttpHeaderConsts.CONTENT_TYPE); String bodyStr = JacksonUtils.toJson(body); if (MediaType.APPLICATION_FORM_URLENCODED.equals(contentType)) { Map<String, String> map = JacksonUtils.toObj(bodyStr, HashMap.class); bodyStr = HttpUtils.encodingParams(map, headers.getCharset()); } if (bodyStr != null) { conn.setDoOutput(true); byte[] b = bodyStr.getBytes(); conn.setRequestProperty("Content-Length", String.valueOf(b.length)); OutputStream outputStream = conn.getOutputStream(); outputStream.write(b, 0, b.length); outputStream.flush(); IoUtils.closeQuietly(outputStream); } } conn.connect(); return new JdkHttpClientResponse(conn); }
【5】beatReactor.addBeatInfo 心跳任务的流程【服务心跳】
//BeatReactor类#构造方法 public BeatReactor(NamingProxy serverProxy, int threadCount) { this.serverProxy = serverProxy; //定义延迟的线程池 this.executorService = new ScheduledThreadPoolExecutor(threadCount, new ThreadFactory() { @Override public Thread newThread(Runnable r) { Thread thread = new Thread(r); thread.setDaemon(true); thread.setName("com.alibaba.nacos.naming.beat.sender"); return thread; } }); } //添加任务方法 public void addBeatInfo(String serviceName, BeatInfo beatInfo) { NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo); String key = buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort()); BeatInfo existBeat = null; //fix #1733 if ((existBeat = dom2Beat.remove(key)) != null) { existBeat.setStopped(true); } dom2Beat.put(key, beatInfo); //实际上就是往延迟的线程池添加任务 executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS); MetricsMonitor.getDom2BeatSizeMonitor().set(dom2Beat.size()); } //分析心跳任务类,主要都是run方法 //这种调用方式eureka中也是 class BeatTask implements Runnable { BeatInfo beatInfo; public BeatTask(BeatInfo beatInfo) { this.beatInfo = beatInfo; } @Override public void run() { if (beatInfo.isStopped()) { return; } long nextTime = beatInfo.getPeriod(); try { //调用server代理实例发送心跳接口 JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled); long interval = result.get("clientBeatInterval").asLong(); boolean lightBeatEnabled = false; if (result.has(CommonParams.LIGHT_BEAT_ENABLED)) { lightBeatEnabled = result.get(CommonParams.LIGHT_BEAT_ENABLED).asBoolean(); } BeatReactor.this.lightBeatEnabled = lightBeatEnabled; if (interval > 0) { nextTime = interval; } int code = NamingResponseCode.OK; if (result.has(CommonParams.CODE)) { code = result.get(CommonParams.CODE).asInt(); } //服务返回没有,则再次注册 if (code == NamingResponseCode.RESOURCE_NOT_FOUND) { Instance instance = new Instance(); instance.setPort(beatInfo.getPort()); instance.setIp(beatInfo.getIp()); instance.setWeight(beatInfo.getWeight()); instance.setMetadata(beatInfo.getMetadata()); instance.setClusterName(beatInfo.getCluster()); instance.setServiceName(beatInfo.getServiceName()); instance.setInstanceId(instance.getInstanceId()); instance.setEphemeral(true); try { //又是一个注册方法的调用 serverProxy.registerService(beatInfo.getServiceName(), NamingUtils.getGroupName(beatInfo.getServiceName()), instance); } catch (Exception ignore) { } } } catch (NacosException ex) {...} //方法内再次将任务塞入,形成循环调用 executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS); } } public JsonNode sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException { Map<String, String> params = new HashMap<String, String>(8); Map<String, String> bodyMap = new HashMap<String, String>(2); if (!lightBeatEnabled) { bodyMap.put("beat", JacksonUtils.toJson(beatInfo)); } params.put(CommonParams.NAMESPACE_ID, namespaceId); params.put(CommonParams.SERVICE_NAME, beatInfo.getServiceName()); params.put(CommonParams.CLUSTER_NAME, beatInfo.getCluster()); params.put("ip", beatInfo.getIp()); params.put("port", String.valueOf(beatInfo.getPort())); //地址为/nacos/v1/ns/instance/beat String result = reqApi(UtilAndComs.nacosUrlBase + "/instance/beat", params, bodyMap, HttpMethod.PUT); return JacksonUtils.toObj(result); }
【6】分析如何引入服务的【服务发现】
//NacosNamingService类#getAllInstances方法 @Override public List<Instance> getAllInstances(String serviceName, String groupName, List<String> clusters, boolean subscribe) throws NacosException { ServiceInfo serviceInfo; // 是否是订阅模式,默认是true if (subscribe) { // 先从客户端缓存获取服务信息 serviceInfo = hostReactor.getServiceInfo(NamingUtils.getGroupedName(serviceName, groupName), StringUtils.join(clusters, ",")); } else { // 如果本地缓存不存在服务信息,则进行订阅 serviceInfo = hostReactor.getServiceInfoDirectlyFromServer(NamingUtils.getGroupedName(serviceName, groupName), StringUtils.join(clusters, ",")); } List<Instance> list; // 从服务信息中获取实例列表 if (serviceInfo == null || CollectionUtils.isEmpty(list = serviceInfo.getHosts())) { return new ArrayList<Instance>(); } return list; }
【6.1】分析先从缓存中拿的hostReactor.getServiceInfo方法
//获取服务信息 public ServiceInfo getServiceInfo(final String serviceName, final String clusters) { NAMING_LOGGER.debug("failover-mode: " + failoverReactor.isFailoverSwitch()); String key = ServiceInfo.getKey(serviceName, clusters); if (failoverReactor.isFailoverSwitch()) { return failoverReactor.getService(key); } //获取服务的信息 ServiceInfo serviceObj = getServiceInfo0(serviceName, clusters); //客户端第一次获取这个注册表信息为空 if (null == serviceObj) { serviceObj = new ServiceInfo(serviceName, clusters); serviceInfoMap.put(serviceObj.getKey(), serviceObj); updatingMap.put(serviceName, new Object()); //会去拉取这个注册中心里面的注册表信息 updateServiceNow(serviceName, clusters); updatingMap.remove(serviceName); } //如果本地缓存里面已有这个注册表信息 else if (updatingMap.containsKey(serviceName)) { if (UPDATE_HOLD_INTERVAL > 0) { // hold a moment waiting for update finish synchronized (serviceObj) { try { serviceObj.wait(UPDATE_HOLD_INTERVAL); } catch (InterruptedException e) {...} } } } //客户端会开启一个定时任务,每隔几秒会去拉取注册中心里面的全部实例的信息 scheduleUpdateIfAbsent(serviceName, clusters); return serviceInfoMap.get(serviceObj.getKey()); } //HostReactor类# Map<String, ServiceInfo> serviceInfoMap属性【这个便是客户端保存实例数据的缓存所在】 //实际上是先从serviceInfoMap属性里面拿的 private ServiceInfo getServiceInfo0(String serviceName, String clusters) { String key = ServiceInfo.getKey(serviceName, clusters); return serviceInfoMap.get(key); }
【6.1.1】分析远程拉取流程updateServiceNow方法
private void updateServiceNow(String serviceName, String clusters) { try { updateService(serviceName, clusters); } catch (NacosException e) {...} } public void updateService(String serviceName, String clusters) throws NacosException { ServiceInfo oldService = getServiceInfo0(serviceName, clusters); try { //远程调用 String result = serverProxy.queryList(serviceName, clusters, pushReceiver.getUdpPort(), false); if (StringUtils.isNotEmpty(result)) { //处理并塞入serviceInfoMap,还会发送一个InstancesChangeEvent事件 processServiceJson(result); } } finally { if (oldService != null) { synchronized (oldService) { oldService.notifyAll(); } } } } public String queryList(String serviceName, String clusters, int udpPort, boolean healthyOnly) throws NacosException { final Map<String, String> params = new HashMap<String, String>(8); params.put(CommonParams.NAMESPACE_ID, namespaceId); params.put(CommonParams.SERVICE_NAME, serviceName); params.put("clusters", clusters); params.put("udpPort", String.valueOf(udpPort)); params.put("clientIP", NetUtils.localIP()); params.put("healthyOnly", String.valueOf(healthyOnly)); //调用服务的API,获取服务注册中心里面的全部实例 return reqApi(UtilAndComs.nacosUrlBase + "/instance/list", params, HttpMethod.GET); }
【6.1.1.1】分析定时任务scheduleUpdateIfAbsent方法做了什么
public void scheduleUpdateIfAbsent(String serviceName, String clusters) { if (futureMap.get(ServiceInfo.getKey(serviceName, clusters)) != null) { return; } synchronized (futureMap) { if (futureMap.get(ServiceInfo.getKey(serviceName, clusters)) != null) { return; } ScheduledFuture<?> future = addTask(new UpdateTask(serviceName, clusters)); futureMap.put(ServiceInfo.getKey(serviceName, clusters), future); } } //DEFAULT_DELAY = 1000L,也就是说是1s public synchronized ScheduledFuture<?> addTask(UpdateTask task) { return executor.schedule(task, DEFAULT_DELAY, TimeUnit.MILLISECONDS); } //分析UpdateTask类的run方法 @Override public void run() { long delayTime = DEFAULT_DELAY; try { // 根据serviceName获取到当前服务的信息,包括服务器地址列表 ServiceInfo serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters)); // 如果为空,则重新拉取最新的服务列表 if (serviceObj == null) { updateService(serviceName, clusters); return; } // 如果时间戳<=上次更新的时间,则进行更新操作 if (serviceObj.getLastRefTime() <= lastRefTime) { updateService(serviceName, clusters); serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters)); } else { // 如果serviceObj的refTime更晚, // 则表示服务通过主动push机制已被更新,这时我们只进行刷新操作 refreshOnly(serviceName, clusters); } // 刷新服务的更新时间 lastRefTime = serviceObj.getLastRefTime(); // 如果订阅被取消,则停止更新任务 if (!notifier.isSubscribed(serviceName, clusters) && !futureMap.containsKey(ServiceInfo.getKey(serviceName, clusters))) { return; } // 如果没有可供调用的服务列表,则统计失败次数+1 if (CollectionUtils.isEmpty(serviceObj.getHosts())) { incFailCount(); return; } // 设置延迟一段时间后进行查询 delayTime = serviceObj.getCacheMillis(); // 将失败查询次数重置为0 resetFailCount(); } catch (Throwable e) { incFailCount(); } finally { // 设置下一次查询任务的触发时间 // 默认是1s,按照失败次数翻倍,最大60s // 也就是【1,2,4,8,16,32,60】 executor.schedule(this, Math.min(delayTime << failCount, DEFAULT_DELAY * 60), TimeUnit.MILLISECONDS); } }
服务端分析
【1】分析nacos.naming.controllers包下的InstanceController
【1.1】分析注册方法
//RESTful的接口规范 @CanDistro @PostMapping @Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE) public String register(HttpServletRequest request) throws Exception { // 尝试获取namespaceId final String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID); // 尝试获取serviceName,其格式为 group_name@@service_name final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME); NamingUtils.checkServiceNameFormat(serviceName); // 解析出实例信息,封装为Instance对象 final Instance instance = parseInstance(request); // 注册实例 serviceManager.registerInstance(namespaceId, serviceName, instance); return "ok"; } //ServiceManager类#registerInstance方法 public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException { //判断本地缓存中是否存在该命名空间,如果不存在就创建,之后判断该命名空间下是否 //存在该服务,如果不存在就创建空的服务 //注意这里并没有更新服务的实例信息 createEmptyService(namespaceId, serviceName, instance.isEphemeral()); //从本地缓存中获取服务信息 Service service = getService(namespaceId, serviceName); if (service == null) { throw new NacosException(...); } //服务注册,这一步才会把服务的实例信息和服务绑定起来 addInstance(namespaceId, serviceName, instance.isEphemeral(), instance); }
【1.1.1】分析createEmptyService方法
public void createEmptyService(String namespaceId, String serviceName, boolean local) throws NacosException { createServiceIfAbsent(namespaceId, serviceName, local, null); } public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster) throws NacosException { Service service = getService(namespaceId, serviceName); //没有才会去创建 if (service == null) { Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName); service = new Service(); service.setName(serviceName); service.setNamespaceId(namespaceId); service.setGroupName(NamingUtils.getGroupName(serviceName)); // now validate the service. if failed, exception will be thrown service.setLastModifiedMillis(System.currentTimeMillis()); service.recalculateChecksum(); if (cluster != null) { cluster.setService(service); service.getClusterMap().put(cluster.getName(), cluster); } service.validate(); //将创建的空的服务插入缓存,并初始化 putServiceAndInit(service); if (!local) { addOrReplaceService(service); } } } //从Map中取出,这个Map的定义 //private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>(); //感觉是不是和eureka的双重Map存储很相似 public Service getService(String namespaceId, String serviceName) { if (serviceMap.get(namespaceId) == null) { return null; } return chooseServiceMap(namespaceId).get(serviceName); } private void putServiceAndInit(Service service) throws NacosException { //将服务插入缓存 putService(service); //对服务进行初始化 service.init(); consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service); consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service); Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJson()); } //这里面采用的是双重检查+锁(也就是DCL【Double Check Lock】) public void putService(Service service) { if (!serviceMap.containsKey(service.getNamespaceId())) { synchronized (putServiceLock) { if (!serviceMap.containsKey(service.getNamespaceId())) { serviceMap.put(service.getNamespaceId(), new ConcurrentSkipListMap<>()); } } } serviceMap.get(service.getNamespaceId()).put(service.getName(), service); }
【1.1.1.1】分析服务初始化流程【这里面分为两种,一种是持久实例,一种是临时实例】【服务心跳】
//初始化过程做了什么 public void init() { //健康检查的线程添加一个心跳任务 HealthCheckReactor.scheduleCheck(clientBeatCheckTask); for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) { entry.getValue().setService(this); entry.getValue().init(); } }
【1.1.1.1.1】持久实例的处理【对于单个cluster】
public void init() { if (inited) { //这样每个集群都只会开启一次 return; } // 创建健康检测的任务 checkTask = new HealthCheckTask(this); // 这里会开启对 非临时实例的 定时健康检测 HealthCheckReactor.scheduleCheck(checkTask); inited = true; } //checkRtNormalized = 2000 + RandomUtils.nextInt(0, RandomUtils.nextInt(0, switchDomain.getTcpHealthParams().getMax())); public static ScheduledFuture<?> scheduleCheck(HealthCheckTask task) { task.setStartTime(System.currentTimeMillis()); //也就是延迟2000 + 5000毫秒内的随机数 return GlobalExecutor.scheduleNamingHealth(task, task.getCheckRtNormalized(), TimeUnit.MILLISECONDS); } //HealthCheckTask类#run方法 @Override public void run() { try {if (distroMapper.responsible(cluster.getService().getName()) && switchDomain.isHealthCheckEnabled(cluster.getService().getName())) { healthCheckProcessor.process(this); } } catch (Throwable e) {...} finally { if (!cancelled) { // 结束后,再次进行任务调度,一定延迟后执行 HealthCheckReactor.scheduleCheck(this); // worst == 0 means never checked if (this.getCheckRtWorst() > 0 && switchDomain.isHealthCheckEnabled(cluster.getService().getName()) && distroMapper.responsible(cluster.getService().getName())) { // TLog doesn't support float so we must convert it into long long diff = ((this.getCheckRtLast() - this.getCheckRtLastLast()) * 10000) / this.getCheckRtLastLast(); this.setCheckRtLastLast(this.getCheckRtLast()); Cluster cluster = this.getCluster(); } } } } //TcpSuperSenseProcessor类#process方法 @Override public void process(HealthCheckTask task) { //拿出集群的持久实例 List<Instance> ips = task.getCluster().allIPs(false); if (CollectionUtils.isEmpty(ips)) { return; } for (Instance ip : ips) { if (ip.isMarked()) { continue; } if (!ip.markChecking()) { healthCheckCommon.reEvaluateCheckRT(task.getCheckRtNormalized() * 2, task, switchDomain.getTcpHealthParams()); continue; } // 封装健康检测信息到 Beat Beat beat = new Beat(ip, task); // 放入一个阻塞队列中 taskQueue.add(beat); MetricsMonitor.getTcpHealthCheckMonitor().incrementAndGet(); } } //又基于他自己本身的构造函数 public TcpSuperSenseProcessor() { try { //开启个线程池 selector = Selector.open(); GlobalExecutor.submitTcpCheck(this); } catch (Exception e) { throw new IllegalStateException(...); } } //TcpSuperSenseProcessor类#run方法 @Override public void run() { while (true) { try { processTask(); int readyCount = selector.selectNow(); if (readyCount <= 0) { continue; } Iterator<SelectionKey> iter = selector.selectedKeys().iterator(); while (iter.hasNext()) { SelectionKey key = iter.next(); iter.remove(); GlobalExecutor.executeTcpSuperSense(new PostProcessor(key)); } } catch (Throwable e) {...} } } //TcpSuperSenseProcessor类#processTask方法 private void processTask() throws Exception { Collection<Callable<Void>> tasks = new LinkedList<>(); do { Beat beat = taskQueue.poll(CONNECT_TIMEOUT_MS / 2, TimeUnit.MILLISECONDS); if (beat == null) { return; } tasks.add(new TaskProcessor(beat)); } while (taskQueue.size() > 0 && tasks.size() < NIO_THREAD_COUNT * 64); // 批量处理集合中的任务 for (Future<?> f : GlobalExecutor.invokeAllTcpSuperSenseTask(tasks)) { f.get(); } } private class TaskProcessor implements Callable<Void> { private static final int MAX_WAIT_TIME_MILLISECONDS = 500; Beat beat; public TaskProcessor(Beat beat) { this.beat = beat; } @Override public Void call() { // 获取检测任务已经等待的时长 long waited = System.currentTimeMillis() - beat.getStartTime(); if (waited > MAX_WAIT_TIME_MILLISECONDS) {...} SocketChannel channel = null; try { // 获取实例信息 Instance instance = beat.getIp(); BeatKey beatKey = keyMap.get(beat.toString()); if (beatKey != null && beatKey.key.isValid()) { if (System.currentTimeMillis() - beatKey.birthTime < TCP_KEEP_ALIVE_MILLIS) { instance.setBeingChecked(false); return null; } beatKey.key.cancel(); beatKey.key.channel().close(); } // 通过NIO建立TCP连接 channel = SocketChannel.open(); channel.configureBlocking(false); // only by setting this can we make the socket close event asynchronous channel.socket().setSoLinger(false, -1); channel.socket().setReuseAddress(true); channel.socket().setKeepAlive(true); channel.socket().setTcpNoDelay(true); Cluster cluster = beat.getTask().getCluster(); int port = cluster.isUseIPPort4Check() ? instance.getPort() : cluster.getDefCkport(); channel.connect(new InetSocketAddress(instance.getIp(), port)); // 注册连接、读取事件 SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT | SelectionKey.OP_READ); key.attach(beat); keyMap.put(beat.toString(), new BeatKey(key)); beat.setStartTime(System.currentTimeMillis()); //构建一个延迟500毫秒的延迟 GlobalExecutor.scheduleTcpSuperSenseTask(new TimeOutTask(key), CONNECT_TIMEOUT_MS, TimeUnit.MILLISECONDS); } catch (Exception e) { beat.finishCheck(false, false, switchDomain.getTcpHealthParams().getMax(), "tcp:error:" + e.getMessage()); if (channel != null) { try { channel.close(); } catch (Exception ignore) { } } } return null; } }
【1.1.1.1.2】临时实例的处理【对于整个service】
//心跳延迟5s,下一次还是5s public static void scheduleCheck(ClientBeatCheckTask task) { futureMap.putIfAbsent(task.taskKey(), GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS)); } //分析clientBeatCheckTask类,既然是task必然是要研究一下run方法的 //对Service开启心跳检测【但是你会发现只是对临时实例】 @Override public void run() { try {
//hash取模,也就是只允许它在一台机器上进行检查 if (!getDistroMapper().responsible(service.getName())) { return; } if (!getSwitchDomain().isHealthCheckEnabled()) { return; } //拿到该服务下面的所有IP【单指临时实例】 List<Instance> instances = service.allIPs(true); // 当前时间距离上次心跳时间超过15s,则将实例健康改为false for (Instance instance : instances) { //如果没有设置,默认值DEFAULT_HEART_BEAT_TIMEOUT = TimeUnit.SECONDS.toMillis(15); if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) { if (!instance.isMarked()) { if (instance.isHealthy()) { instance.setHealthy(false); getPushService().serviceChanged(service); ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance)); } } } } if (!getGlobalConfig().isExpireInstance()) { return; } //当前时间距离上次心跳时间超过30s,则将实例从内存中删除 for (Instance instance : instances) { if (instance.isMarked()) { continue; } //如果没有设置,默认值DEFAULT_IP_DELETE_TIMEOUT = TimeUnit.SECONDS.toMillis(30); if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) { // delete instance,自己向自己发起删除请求 deleteIp(instance); } } } catch (Exception e) {...} }
【1.1.1.1.3】汇总一波
//Nacos的健康检测有两种模式: //临时实例: 采用客户端心跳检测模式,心跳周期5秒 心跳间隔超过15秒则标记为不健康 心跳间隔超过30秒则从服务列表删除 //永久实例: 采用服务端主动健康检测方式 周期为2000 + 5000毫秒内的随机数 检测异常只会标记为不健康,不会删除
【1.1.2】分析addInstance注册方法
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips) throws NacosException { String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral); Service service = getService(namespaceId, serviceName); synchronized (service) { //将注册表中已经存在的实例与当前注册过来的实例给合并为list List<Instance> instanceList = addIpAddresses(service, ephemeral, ips); Instances instances = new Instances(); instances.setInstanceList(instanceList); //将上面合并后的实例列表更新到注册表中 //实现类为DelegateConsistencyServiceImpl consistencyService.put(key, instances); } } //构建KEY,ephemeral默认是true,也就是一般认为都是临时实例 public static String buildInstanceListKey(String namespaceId, String serviceName, boolean ephemeral) { return ephemeral ? buildEphemeralInstanceListKey(namespaceId, serviceName) : buildPersistentInstanceListKey(namespaceId, serviceName); } //临时实例 private static String buildEphemeralInstanceListKey(String namespaceId, String serviceName) { //com.alibaba.nacos.naming.iplist.ephemeral.{namespaceId}##{serviceName} return INSTANCE_LIST_KEY_PREFIX + EPHEMERAL_KEY_PREFIX + namespaceId + NAMESPACE_KEY_CONNECTOR + serviceName; } //持久实例 private static String buildPersistentInstanceListKey(String namespaceId, String serviceName) { //com.alibaba.nacos.naming.iplist.{namespaceId}##{serviceName} return INSTANCE_LIST_KEY_PREFIX + namespaceId + NAMESPACE_KEY_CONNECTOR + serviceName; } private List<Instance> addIpAddresses(Service service, boolean ephemeral, Instance... ips) throws NacosException { return updateIpAddresses(service, UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD, ephemeral, ips); } public List<Instance> updateIpAddresses(Service service, String action, boolean ephemeral, Instance... ips) throws NacosException { Datum datum = consistencyService.get(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), ephemeral)); List<Instance> currentIPs = service.allIPs(ephemeral); Map<String, Instance> currentInstances = new HashMap<>(currentIPs.size()); Set<String> currentInstanceIds = Sets.newHashSet(); for (Instance instance : currentIPs) { currentInstances.put(instance.toIpAddr(), instance); currentInstanceIds.add(instance.getInstanceId()); } Map<String, Instance> instanceMap; if (datum != null && null != datum.value) { instanceMap = setValid(((Instances) datum.value).getInstanceList(), currentInstances); } else { instanceMap = new HashMap<>(ips.length); } for (Instance instance : ips) { if (!service.getClusterMap().containsKey(instance.getClusterName())) { Cluster cluster = new Cluster(instance.getClusterName(), service); cluster.init(); service.getClusterMap().put(instance.getClusterName(), cluster); } //我们这次进来的action是add,并不是remove,所以不走这里 if (UtilsAndCommons.UPDATE_INSTANCE_ACTION_REMOVE.equals(action)) { instanceMap.remove(instance.getDatumKey()); } else { //最后就是实例放到map中,以ip+port等信息为key,value为当前实例 Instance oldInstance = instanceMap.get(instance.getDatumKey()); if (oldInstance != null) { //覆盖 instance.setInstanceId(oldInstance.getInstanceId()); } else { //新增 instance.setInstanceId(instance.generateInstanceId(currentInstanceIds)); } instanceMap.put(instance.getDatumKey(), instance); } } if (instanceMap.size() <= 0 && UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD.equals(action)) { throw new IllegalArgumentException(...); } return new ArrayList<>(instanceMap.values()); } //DelegateConsistencyServiceImpl类#put方法 @Override public void put(String key, Record value) throws NacosException { mapConsistencyService(key).put(key, value); } //private final PersistentConsistencyServiceDelegateImpl persistentConsistencyService; //private final EphemeralConsistencyService ephemeralConsistencyService; //实际上的实现是DistroConsistencyServiceImpl //根据key值决定临时节点的方式【AP架构】还会持久节点的方式【AP架构】 private ConsistencyService mapConsistencyService(String key) { return KeyBuilder.matchEphemeralKey(key) ? ephemeralConsistencyService : persistentConsistencyService; }
【1.1.2.1】分析临时节点的方式【AP架构模式】
//DistroConsistencyServiceImpl类#put方法 @Override public void put(String key, Record value) throws NacosException { //添加任务以及添加到内存中 onPut(key, value); //同步数据到所有其他节点 distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE, globalConfig.getTaskDispatchPeriod() / 2); } public void onPut(String key, Record value) { if (KeyBuilder.matchEphemeralInstanceListKey(key)) { Datum<Instances> datum = new Datum<>(); //当前服务的所有实例 datum.value = (Instances) value; datum.key = key; datum.timestamp.incrementAndGet(); //添加到dataStore里面,实际上是放入里面的dataMap里面 dataStore.put(key, datum); } if (!listeners.containsKey(key)) { return; } //添加异步任务 notifier.addTask(key, DataOperation.CHANGE); } //分析notifier类,既然继承了Runnable接口必然有run方法
//GlobalExecutor.submitDistroNotifyTask(notifier); 在DistroConsistencyServiceImpl类中初始化,默认单线程 public class Notifier implements Runnable { private ConcurrentHashMap<String, String> services = new ConcurrentHashMap<>(10 * 1024); private BlockingQueue<Pair<String, DataOperation>> tasks = new ArrayBlockingQueue<>(1024 * 1024); public void addTask(String datumKey, DataOperation action) { if (services.containsKey(datumKey) && action == DataOperation.CHANGE) { return; } if (action == DataOperation.CHANGE) { services.put(datumKey, StringUtils.EMPTY); } //存入阻塞队列里面 tasks.offer(Pair.with(datumKey, action)); } public int getTaskSize() { return tasks.size(); } @Override public void run() { //死循环不断处理队列的数据 for (; ; ) { try { //从队列里面拿出数据进行处理 Pair<String, DataOperation> pair = tasks.take(); handle(pair); } catch (Throwable e) {..} } } private void handle(Pair<String, DataOperation> pair) { try { //将数据还原 String datumKey = pair.getValue0(); DataOperation action = pair.getValue1(); services.remove(datumKey); int count = 0; if (!listeners.containsKey(datumKey)) { return; } for (RecordListener listener : listeners.get(datumKey)) { count++; try { if (action == DataOperation.CHANGE) { //会调用Service类#onChange方法 listener.onChange(datumKey, dataStore.get(datumKey).value); continue; } if (action == DataOperation.DELETE) { listener.onDelete(datumKey); continue; } } catch (Throwable e) {...} } } catch (Throwable e) {...} } }
【1.1.2.1.1】分析Service类#onChange方法
//Service类#onChange方法 @Override public void onChange(String key, Instances value) throws Exception { //对权重的处理,不太重要 for (Instance instance : value.getInstanceList()) { if (instance == null) { // Reject this abnormal instance list: throw new RuntimeException("got null instance " + key); } if (instance.getWeight() > 10000.0D) { instance.setWeight(10000.0D); } if (instance.getWeight() < 0.01D && instance.getWeight() > 0.0D) { instance.setWeight(0.01D); } } //注册逻辑在这里 updateIPs(value.getInstanceList(), KeyBuilder.matchEphemeralInstanceListKey(key)); //生成校验比对的哈希值码 recalculateChecksum(); }
【1.1.2.1.1.1】分析注册逻辑
//更新注册表,这里采用了写时复制思想:即将注册表拷贝一个副本出来,更新这个副本,但是服务发现的时候还是从注册表里获取,待全部更新完毕再将副本替换回注册表中,这样就避免了注册表的读写并发问题,这种方式不用加锁,从而大大提升了性能 //真正的注册逻辑 public void updateIPs(Collection<Instance> instances, boolean ephemeral) { Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size()); for (String clusterName : clusterMap.keySet()) { ipMap.put(clusterName, new ArrayList<>()); } for (Instance instance : instances) { try { if (instance == null) { continue; } if (StringUtils.isEmpty(instance.getClusterName())) { instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME); } if (!clusterMap.containsKey(instance.getClusterName())) { Cluster cluster = new Cluster(instance.getClusterName(), this); cluster.init(); getClusterMap().put(instance.getClusterName(), cluster); } List<Instance> clusterIPs = ipMap.get(instance.getClusterName()); if (clusterIPs == null) { clusterIPs = new LinkedList<>(); ipMap.put(instance.getClusterName(), clusterIPs); } clusterIPs.add(instance); } catch (Exception e) {...} } for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) { //make every ip mine List<Instance> entryIPs = entry.getValue(); //在这里进行注册 clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral); } setLastModifiedMillis(System.currentTimeMillis());
//发送监听事件 getPushService().serviceChanged(this); StringBuilder stringBuilder = new StringBuilder(); for (Instance instance : allIPs()) { stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(","); } } public void updateIps(List<Instance> ips, boolean ephemeral) { Set<Instance> toUpdateInstances = ephemeral ? ephemeralInstances : persistentInstances; HashMap<String, Instance> oldIpMap = new HashMap<>(toUpdateInstances.size()); for (Instance ip : toUpdateInstances) { oldIpMap.put(ip.getDatumKey(), ip); } //进行新旧合并 List<Instance> updatedIPs = updatedIps(ips, oldIpMap.values()); if (updatedIPs.size() > 0) { for (Instance ip : updatedIPs) { Instance oldIP = oldIpMap.get(ip.getDatumKey()); // do not update the ip validation status of updated ips // because the checker has the most precise result // Only when ip is not marked, don't we update the health status of IP: if (!ip.isMarked()) { ip.setHealthy(oldIP.isHealthy()); } if (ip.isHealthy() != oldIP.isHealthy()) {...} if (ip.getWeight() != oldIP.getWeight()) {...} } } List<Instance> newIPs = subtract(ips, oldIpMap.values()); if (newIPs.size() > 0) { for (Instance ip : newIPs) { HealthCheckStatus.reset(ip); } } List<Instance> deadIPs = subtract(oldIpMap.values(), ips); if (deadIPs.size() > 0) { for (Instance ip : deadIPs) { HealthCheckStatus.remv(ip); } } toUpdateInstances = new HashSet<>(ips); //写入注册表的cluster里面的列表,采用引用替换 if (ephemeral) { ephemeralInstances = toUpdateInstances; } else { persistentInstances = toUpdateInstances; } } //新旧合并形成新的列表 private List<Instance> updatedIps(Collection<Instance> newInstance, Collection<Instance> oldInstance) { List<Instance> intersects = (List<Instance>) CollectionUtils.intersection(newInstance, oldInstance); Map<String, Instance> stringIpAddressMap = new ConcurrentHashMap<>(intersects.size()); for (Instance instance : intersects) { stringIpAddressMap.put(instance.getIp() + ":" + instance.getPort(), instance); } Map<String, Integer> intersectMap = new ConcurrentHashMap<>(newInstance.size() + oldInstance.size()); Map<String, Instance> updatedInstancesMap = new ConcurrentHashMap<>(newInstance.size()); Map<String, Instance> newInstancesMap = new ConcurrentHashMap<>(newInstance.size()); for (Instance instance : oldInstance) { if (stringIpAddressMap.containsKey(instance.getIp() + ":" + instance.getPort())) { intersectMap.put(instance.toString(), 1); } } for (Instance instance : newInstance) { if (stringIpAddressMap.containsKey(instance.getIp() + ":" + instance.getPort())) { if (intersectMap.containsKey(instance.toString())) { intersectMap.put(instance.toString(), 2); } else { intersectMap.put(instance.toString(), 1); } } newInstancesMap.put(instance.toString(), instance); } for (Map.Entry<String, Integer> entry : intersectMap.entrySet()) { String key = entry.getKey(); Integer value = entry.getValue(); if (value == 1) { if (newInstancesMap.containsKey(key)) { updatedInstancesMap.put(key, newInstancesMap.get(key)); } } } return new ArrayList<>(updatedInstancesMap.values()); }
【1.1.2.1.1.1.1】分析getPushService().serviceChanged(this);发送的事件是如何推送给客户端的
//发送的事件 public void serviceChanged(Service service) { // merge some change events to reduce the push frequency: if (futureMap.containsKey(UtilsAndCommons.assembleFullServiceName(service.getNamespaceId(), service.getName()))) { return; } this.applicationContext.publishEvent(new ServiceChangeEvent(this, service)); } //使用IDEA的全文搜索,ctrl+shift+F,找到对应onApplicationEvent(ServiceChangeEvent event) //PushService类#onApplicationEvent方法 @Override public void onApplicationEvent(ServiceChangeEvent event) { Service service = event.getService(); String serviceName = service.getName(); String namespaceId = service.getNamespaceId(); Future future = GlobalExecutor.scheduleUdpSender(() -> { try { Loggers.PUSH.info(serviceName + " is changed, add it to push queue."); ConcurrentMap<String, PushClient> clients = clientMap.get(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName)); if (MapUtils.isEmpty(clients)) { return; } Map<String, Object> cache = new HashMap<>(16); long lastRefTime = System.nanoTime(); for (PushClient client : clients.values()) { if (client.zombie()) { Loggers.PUSH.debug("client is zombie: " + client.toString()); clients.remove(client.toString()); Loggers.PUSH.debug("client is zombie: " + client.toString()); continue; } Receiver.AckEntry ackEntry; Loggers.PUSH.debug("push serviceName: {} to client: {}", serviceName, client.toString()); String key = getPushCacheKey(serviceName, client.getIp(), client.getAgent()); byte[] compressData = null; Map<String, Object> data = null; if (switchDomain.getDefaultPushCacheMillis() >= 20000 && cache.containsKey(key)) { org.javatuples.Pair pair = (org.javatuples.Pair) cache.get(key); compressData = (byte[]) (pair.getValue0()); data = (Map<String, Object>) pair.getValue1(); Loggers.PUSH.debug("[PUSH-CACHE] cache hit: {}:{}", serviceName, client.getAddrStr()); } if (compressData != null) { ackEntry = prepareAckEntry(client, compressData, data, lastRefTime); } else { ackEntry = prepareAckEntry(client, prepareHostsData(client), lastRefTime); if (ackEntry != null) { cache.put(key, new org.javatuples.Pair<>(ackEntry.origin.getData(), ackEntry.data)); } } //发起请求 udpPush(ackEntry); } } catch (Exception e) {...} finally { futureMap.remove(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName)); } }, 1000, TimeUnit.MILLISECONDS); futureMap.put(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName), future); } private static Receiver.AckEntry udpPush(Receiver.AckEntry ackEntry) { if (ackEntry == null) { Loggers.PUSH.error("[NACOS-PUSH] ackEntry is null."); return null; } if (ackEntry.getRetryTimes() > MAX_RETRY_TIMES) { Loggers.PUSH.warn("max re-push times reached, retry times {}, key: {}", ackEntry.retryTimes, ackEntry.key); ackMap.remove(ackEntry.key); udpSendTimeMap.remove(ackEntry.key); failedPush += 1; return ackEntry; } try { if (!ackMap.containsKey(ackEntry.key)) { totalPush++; } ackMap.put(ackEntry.key, ackEntry); udpSendTimeMap.put(ackEntry.key, System.currentTimeMillis()); Loggers.PUSH.info("send udp packet: " + ackEntry.key); //UDP发送 udpSocket.send(ackEntry.origin); ackEntry.increaseRetryTime(); GlobalExecutor.scheduleRetransmitter(new Retransmitter(ackEntry),TimeUnit.NANOSECONDS.toMillis(ACK_TIMEOUT_NANOS), TimeUnit.MILLISECONDS); return ackEntry; } catch (Exception e) { ackMap.remove(ackEntry.key); udpSendTimeMap.remove(ackEntry.key); failedPush += 1; return null; } }
【1.1.2.1.1.2】分析recalculateChecksum方法如何生成比对的哈希值码
//有了解过eureka的增量更新便应该知道,如何知道你自己拉取的数据全不全,就是靠比对这个哈希值码 //如果服务端上的是【ABCDEF】,而你本地的是【ABCDF】,那么哈希值码不一样说明数据缺失了。 public synchronized void recalculateChecksum() { List<Instance> ips = allIPs(); StringBuilder ipsString = new StringBuilder(); ipsString.append(getServiceString()); if (Loggers.SRV_LOG.isDebugEnabled()) { Loggers.SRV_LOG.debug("service to json: " + getServiceString()); } if (CollectionUtils.isNotEmpty(ips)) { Collections.sort(ips); } for (Instance ip : ips) { String string = ip.getIp() + ":" + ip.getPort() + "_" + ip.getWeight() + "_" + ip.isHealthy() + "_" + ip.getClusterName(); ipsString.append(string); ipsString.append(","); } checksum = MD5Utils.md5Hex(ipsString.toString(), Constants.ENCODE); }
【1.1.2.1.2】分析集群服务新增数据同步的方法 distroProtocol.sync:
//DistroProtocol类#sync方法 public void sync(DistroKey distroKey, DataOperation action, long delay) { for (Member each : memberManager.allMembersWithoutSelf()) { DistroKey distroKeyWithTarget = new DistroKey(distroKey.getResourceKey(), distroKey.getResourceType(), each.getAddress()); DistroDelayTask distroDelayTask = new DistroDelayTask(distroKeyWithTarget, action, delay); //添加任务,采用异步的方式,会有重试功能 distroTaskEngineHolder.getDelayTaskExecuteEngine().addTask(distroKeyWithTarget, distroDelayTask); } }
【1.1.2.2】分析持久节点的方式(这块还有部分没搞懂,后面补上)
【1.2】分析服务发现的调用
@GetMapping("/list") @Secured(parser = NamingResourceParser.class, action = ActionTypes.READ) public ObjectNode list(HttpServletRequest request) throws Exception { String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID); String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME); NamingUtils.checkServiceNameFormat(serviceName); String agent = WebUtils.getUserAgent(request); String clusters = WebUtils.optional(request, "clusters", StringUtils.EMPTY); String clientIP = WebUtils.optional(request, "clientIP", StringUtils.EMPTY); int udpPort = Integer.parseInt(WebUtils.optional(request, "udpPort", "0")); String env = WebUtils.optional(request, "env", StringUtils.EMPTY); boolean isCheck = Boolean.parseBoolean(WebUtils.optional(request, "isCheck", "false")); String app = WebUtils.optional(request, "app", StringUtils.EMPTY); String tenant = WebUtils.optional(request, "tid", StringUtils.EMPTY); boolean healthyOnly = Boolean.parseBoolean(WebUtils.optional(request, "healthyOnly", "false")); //上面进行参数校验,这里开始 return doSrvIpxt(namespaceId, serviceName, agent, clusters, clientIP, udpPort, env, isCheck, app, tenant,healthyOnly); } //查看是如何获取服务进行返回的 public ObjectNode doSrvIpxt(String namespaceId, String serviceName, String agent, String clusters, String clientIP, int udpPort, String env, boolean isCheck, String app, String tid, boolean healthyOnly) throws Exception { ClientInfo clientInfo = new ClientInfo(agent); ObjectNode result = JacksonUtils.createEmptyJsonNode(); Service service = serviceManager.getService(namespaceId, serviceName); long cacheMillis = switchDomain.getDefaultCacheMillis(); // now try to enable the push try { //判断是否支持UDP方式推送,不重要 if (udpPort > 0 && pushService.canEnablePush(agent)) { pushService.addClient(namespaceId, serviceName, clusters, agent, new InetSocketAddress(clientIP, udpPort), pushDataSource, tid, app); cacheMillis = switchDomain.getPushCacheMillis(serviceName); } } catch (Exception e) { cacheMillis = switchDomain.getDefaultCacheMillis(); } if (service == null) { result.put("name", serviceName); result.put("clusters", clusters); result.put("cacheMillis", cacheMillis); result.replace("hosts", JacksonUtils.createEmptyArrayNode()); return result; } checkIfDisabled(service); List<Instance> srvedIPs; //主要是在这里获取 srvedIPs = service.srvIPs(Arrays.asList(StringUtils.split(clusters, ","))); //然后下面主要就是塞数据然后返回 // filter ips using selector: if (service.getSelector() != null && StringUtils.isNotBlank(clientIP)) { srvedIPs = service.getSelector().select(clientIP, srvedIPs); } if (CollectionUtils.isEmpty(srvedIPs)) { if (clientInfo.type == ClientInfo.ClientType.JAVA && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) { result.put("dom", serviceName); } else { result.put("dom", NamingUtils.getServiceName(serviceName)); } result.put("name", serviceName); result.put("cacheMillis", cacheMillis); result.put("lastRefTime", System.currentTimeMillis()); result.put("checksum", service.getChecksum()); result.put("useSpecifiedURL", false); result.put("clusters", clusters); result.put("env", env); result.set("hosts", JacksonUtils.createEmptyArrayNode()); result.set("metadata", JacksonUtils.transferToJsonNode(service.getMetadata())); return result; } Map<Boolean, List<Instance>> ipMap = new HashMap<>(2); ipMap.put(Boolean.TRUE, new ArrayList<>()); ipMap.put(Boolean.FALSE, new ArrayList<>()); for (Instance ip : srvedIPs) { ipMap.get(ip.isHealthy()).add(ip); } if (isCheck) { result.put("reachProtectThreshold", false); } double threshold = service.getProtectThreshold(); if ((float) ipMap.get(Boolean.TRUE).size() / srvedIPs.size() <= threshold) { if (isCheck) { result.put("reachProtectThreshold", true); } ipMap.get(Boolean.TRUE).addAll(ipMap.get(Boolean.FALSE)); ipMap.get(Boolean.FALSE).clear(); } if (isCheck) { result.put("protectThreshold", service.getProtectThreshold()); result.put("reachLocalSiteCallThreshold", false); return JacksonUtils.createEmptyJsonNode(); } ArrayNode hosts = JacksonUtils.createEmptyArrayNode(); for (Map.Entry<Boolean, List<Instance>> entry : ipMap.entrySet()) { List<Instance> ips = entry.getValue(); if (healthyOnly && !entry.getKey()) { continue; } for (Instance instance : ips) { // remove disabled instance: if (!instance.isEnabled()) { continue; } ObjectNode ipObj = JacksonUtils.createEmptyJsonNode(); ipObj.put("ip", instance.getIp()); ipObj.put("port", instance.getPort()); // deprecated since nacos 1.0.0: ipObj.put("valid", entry.getKey()); ipObj.put("healthy", entry.getKey()); ipObj.put("marked", instance.isMarked()); ipObj.put("instanceId", instance.getInstanceId()); ipObj.set("metadata", JacksonUtils.transferToJsonNode(instance.getMetadata())); ipObj.put("enabled", instance.isEnabled()); ipObj.put("weight", instance.getWeight()); ipObj.put("clusterName", instance.getClusterName()); if (clientInfo.type == ClientInfo.ClientType.JAVA && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) { ipObj.put("serviceName", instance.getServiceName()); } else { ipObj.put("serviceName", NamingUtils.getServiceName(instance.getServiceName())); } ipObj.put("ephemeral", instance.isEphemeral()); hosts.add(ipObj); } } result.replace("hosts", hosts); if (clientInfo.type == ClientInfo.ClientType.JAVA && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) { result.put("dom", serviceName); } else { result.put("dom", NamingUtils.getServiceName(serviceName)); } result.put("name", serviceName); result.put("cacheMillis", cacheMillis); result.put("lastRefTime", System.currentTimeMillis()); result.put("checksum", service.getChecksum()); result.put("useSpecifiedURL", false); result.put("clusters", clusters); result.put("env", env); result.replace("metadata", JacksonUtils.transferToJsonNode(service.getMetadata())); return result; }
【1.2.1】分析service.srvIPs方法在服务发现的如何获取所有实例
public List<Instance> srvIPs(List<String> clusters) { if (CollectionUtils.isEmpty(clusters)) { clusters = new ArrayList<>(); clusters.addAll(clusterMap.keySet()); } return allIPs(clusters); } public List<Instance> allIPs(List<String> clusters) { List<Instance> result = new ArrayList<>(); for (String cluster : clusters) { Cluster clusterObj = clusterMap.get(cluster); if (clusterObj == null) { continue; } //将临时实例和持久实例一起返回 result.addAll(clusterObj.allIPs()); } return result; } public List<Instance> allIPs() { List<Instance> allInstances = new ArrayList<>(); allInstances.addAll(persistentInstances); allInstances.addAll(ephemeralInstances); return allInstances; }
【2】集群情况下
【2.1】集群节点状态同步任务
@Component("serverListManager") //自动被扫描成为Bean public class ServerListManager extends MemberChangeListener { public ServerListManager(final SwitchDomain switchDomain, final ServerMemberManager memberManager) { this.switchDomain = switchDomain; this.memberManager = memberManager; NotifyCenter.registerSubscriber(this); this.servers = new ArrayList<>(memberManager.allMembers()); } //初始化时候自动调用 @PostConstruct public void init() { //注册两个任务的方法 //集群节点状态同步任务 GlobalExecutor.registerServerStatusReporter(new ServerStatusReporter(), 2000); //本地服务更新任务 GlobalExecutor.registerServerInfoUpdater(new ServerInfoUpdater()); } } private class ServerStatusReporter implements Runnable { @Override public void run() { try { if (EnvUtil.getPort() <= 0) { return; } int weight = Runtime.getRuntime().availableProcessors() / 2; if (weight <= 0) { weight = 1; } long curTime = System.currentTimeMillis(); String status = LOCALHOST_SITE + "#" + EnvUtil.getLocalAddress() + "#" + curTime + "#" + weight + "\r\n"; //获取所有节点 List<Member> allServers = getServers(); if (!contains(EnvUtil.getLocalAddress())) { Loggers.SRV_LOG.error("local ip is not in serverlist, ip: {}, serverlist: {}", EnvUtil.getLocalAddress(), allServers); return; } //遍历 if (allServers.size() > 0 && !EnvUtil.getLocalAddress().contains(IPUtil.localHostIP())) { for (Member server : allServers) { //排除自己 if (Objects.equals(server.getAddress(), EnvUtil.getLocalAddress())) { continue; } // This metadata information exists from 1.3.0 onwards "version" if (server.getExtendVal(MemberMetaDataConstants.VERSION) != null) { continue; } Message msg = new Message(); msg.setData(status); //向接口/nacos/v1/ns/operator/server/status发送数据 synchronizer.send(server.getAddress(), msg); } } } catch (Exception e) {...} finally { //TimeUnit.SECONDS.toMillis(2),也就是每2s一次 GlobalExecutor.registerServerStatusReporter(this, switchDomain.getServerStatusSynchronizationPeriodMillis()); } } }
【2.2】注册服务实例信息在集群节点间同步任务
//ServiceManager类#初始化方法 @PostConstruct public void init() { //每分钟同步一次 GlobalExecutor.scheduleServiceReporter(new ServiceReporter(), 60000, TimeUnit.MILLISECONDS); GlobalExecutor.submitServiceUpdateManager(new UpdatedServiceProcessor()); if (emptyServiceAutoClean) { // delay 60s, period 20s; // This task is not recommended to be performed frequently in order to avoid // the possibility that the service cache information may just be deleted // and then created due to the heartbeat mechanism GlobalExecutor.scheduleServiceAutoClean(new EmptyServiceAutoClean(), cleanEmptyServiceDelay, cleanEmptyServicePeriod); } try { consistencyService.listen(KeyBuilder.SERVICE_META_KEY_PREFIX, this); } catch (NacosException e) {...} } //同步服务的健康状态 private class ServiceReporter implements Runnable { @Override public void run() { try { Map<String, Set<String>> allServiceNames = getAllServiceNames(); if (allServiceNames.size() <= 0) { //ignore return; } for (String namespaceId : allServiceNames.keySet()) { ServiceChecksum checksum = new ServiceChecksum(namespaceId); for (String serviceName : allServiceNames.get(namespaceId)) { if (!distroMapper.responsible(serviceName)) { continue; } Service service = getService(namespaceId, serviceName); if (service == null || service.isEmpty()) { continue; } service.recalculateChecksum(); checksum.addItem(serviceName, service.getChecksum()); } Message msg = new Message(); msg.setData(JacksonUtils.toJson(checksum)); Collection<Member> sameSiteServers = memberManager.allMembers(); if (sameSiteServers == null || sameSiteServers.size() <= 0) { return; } for (Member server : sameSiteServers) { if (server.getAddress().equals(NetUtils.localServer())) { continue; } synchronizer.send(server.getAddress(), msg); } } } catch (Exception e) {...} finally { GlobalExecutor.scheduleServiceReporter(this, switchDomain.getServiceStatusSynchronizationPeriodMillis(), TimeUnit.MILLISECONDS); } } }
Nacos服务注册表结构:Map<namespace, map<group::servicename,="" service="">>
结构展示
示例展示
Nacos核心功能源码架构