Nacos Internals 02: Service Registry
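How a service registry works
- Service instances register with the service registry on startup and deregister on shutdown
- Service consumers query the service registry to obtain the available instances
- The service registry calls each instance's health-check API to verify that it can handle requests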
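The three responsibilities above can be illustrated with a tiny in-memory registry. This is a hypothetical sketch, not Nacos code. SimpleRegistry and its methods are invented names, and the health flag is set by the caller through markHealth rather than by a real health-check call.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical in-memory registry: register / deregister / health-filtered lookup.
// Not Nacos code: health is set by the caller instead of by real health checks.
class SimpleRegistry {
    // One registered instance: its "ip:port" address and the registry's view of its health.
    static final class Instance {
        final String address;
        final boolean healthy;

        Instance(String address, boolean healthy) {
            this.address = address;
            this.healthy = healthy;
        }
    }

    // serviceName -> (address -> instance)
    private final Map<String, Map<String, Instance>> table = new ConcurrentHashMap<>();

    // Provider startup: add the instance, initially considered healthy.
    void register(String service, String address) {
        table.computeIfAbsent(service, s -> new ConcurrentHashMap<>())
             .put(address, new Instance(address, true));
    }

    // Provider shutdown: remove the instance.
    void deregister(String service, String address) {
        Map<String, Instance> instances = table.get(service);
        if (instances != null) {
            instances.remove(address);
        }
    }

    // Outcome of a health check performed by the registry.
    void markHealth(String service, String address, boolean healthy) {
        Map<String, Instance> instances = table.get(service);
        if (instances != null) {
            instances.computeIfPresent(address, (a, old) -> new Instance(a, healthy));
        }
    }

    // Consumer lookup: only healthy instances, sorted for a stable order.
    List<String> healthyInstances(String service) {
        return table.getOrDefault(service, Map.of()).values().stream()
                .filter(i -> i.healthy)
                .map(i -> i.address)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

After markHealth(..., false), consumers no longer see the failing instance. That is the purpose of the health check in the third point.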
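Service registration and the heartbeat mechanism
When registration happens
The spring-cloud-commons package contains the interface org.springframework.cloud.client.serviceregistry.ServiceRegistry, which is Spring Cloud's standard contract for service registration. Every component that integrates with Spring Cloud to provide service registration implements this interface.
For Nacos, the implementation is NacosServiceRegistry.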
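This is a plain service-provider contract: framework code depends only on the interface, and each registry ships its own implementation. The sketch below uses a deliberately simplified, hypothetical interface, MiniServiceRegistry<R>, modeled on ServiceRegistry. The real interface also declares close(), setStatus() and getStatus(). LoggingRegistry stands in for an implementation such as NacosServiceRegistry and only records the calls it receives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified version of the ServiceRegistry contract.
interface MiniServiceRegistry<R> {
    void register(R registration);

    void deregister(R registration);
}

// Hypothetical registration payload: service id, host, port.
final class MiniRegistration {
    final String serviceId;
    final String host;
    final int port;

    MiniRegistration(String serviceId, String host, int port) {
        this.serviceId = serviceId;
        this.host = host;
        this.port = port;
    }
}

// Stand-in implementation; a real one would call the registry server instead of recording calls.
final class LoggingRegistry implements MiniServiceRegistry<MiniRegistration> {
    final List<String> calls = new ArrayList<>();

    @Override
    public void register(MiniRegistration r) {
        calls.add("register " + r.serviceId + " " + r.host + ":" + r.port);
    }

    @Override
    public void deregister(MiniRegistration r) {
        calls.add("deregister " + r.serviceId + " " + r.host + ":" + r.port);
    }
}
```

Code that only holds a MiniServiceRegistry reference works unchanged when the implementation is swapped, which is why every registry integrates through the same interface.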
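Next, let's look at how Spring Cloud integrates Nacos. Start with the auto-configuration file spring.factories in the META-INF directory of the spring-cloud-commons package:
[Figure: contents of spring.factories]
Among the classes listed there, AutoServiceRegistrationAutoConfiguration is the configuration class responsible for service registration.
Here is its code: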
@Configuration(proxyBeanMethods = false)
@Import(AutoServiceRegistrationConfiguration.class)
@ConditionalOnProperty(value = "spring.cloud.service-registry.auto-registration.enabled", matchIfMissing = true)
public class AutoServiceRegistrationAutoConfiguration {

    @Autowired(required = false)
    private AutoServiceRegistration autoServiceRegistration;

    @Autowired
    private AutoServiceRegistrationProperties properties;

    @PostConstruct
    protected void init() {
        // Fail fast if auto-registration is enabled but no AutoServiceRegistration bean exists
        if (this.autoServiceRegistration == null && this.properties.isFailFast()) {
            throw new IllegalStateException("Auto Service Registration has "
                    + "been requested, but there is no AutoServiceRegistration bean");
        }
    }
}
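Following the autoServiceRegistration field leads to one of its implementations: NacosAutoServiceRegistration.
Start with its inheritance hierarchy:
[Figure: class diagram of NacosAutoServiceRegistration]
The class diagram includes the familiar EventListener interface, which is the parent of Spring's ApplicationListener. This suggests that Nacos performs registration through Spring's event mechanism.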
AbstractAutoServiceRegistration implements onApplicationEvent(WebServerInitializedEvent) and so listens for WebServerInitializedEvent. Spring publishes this event once the web server is ready. The handler records the server's port and then calls start().
public abstract class AbstractAutoServiceRegistration<R extends Registration>
        implements AutoServiceRegistration, ApplicationContextAware,
        ApplicationListener<WebServerInitializedEvent> {

    private AtomicInteger port = new AtomicInteger(0);

    @Override
    @SuppressWarnings("deprecation")
    public void onApplicationEvent(WebServerInitializedEvent event) {
        bind(event);
    }

    @Deprecated
    public void bind(WebServerInitializedEvent event) {
        ApplicationContext context = event.getApplicationContext();
        if (context instanceof ConfigurableWebServerApplicationContext) {
            // Ignore the separate management (actuator) web server
            if ("management".equals(((ConfigurableWebServerApplicationContext) context).getServerNamespace())) {
                return;
            }
        }
        // Record the port the web server listens on (only the first value is kept)
        this.port.compareAndSet(0, event.getWebServer().getPort());
        this.start();
    }
}
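Without Spring, the binding logic above comes down to three rules. Ignore the management server. Keep only the first port, via compareAndSet(0, port). Trigger start(), which registers at most once. In this hypothetical sketch, ServerReadyEvent stands in for WebServerInitializedEvent and MiniAutoRegistration stands in for AbstractAutoServiceRegistration. start() is reduced to a counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for WebServerInitializedEvent: which server started, on which port.
final class ServerReadyEvent {
    final String serverNamespace; // null for the main application server
    final int port;

    ServerReadyEvent(String serverNamespace, int port) {
        this.serverNamespace = serverNamespace;
        this.port = port;
    }
}

final class MiniAutoRegistration {
    private final AtomicInteger port = new AtomicInteger(0);
    private final AtomicBoolean running = new AtomicBoolean(false);
    int registerCalls = 0;

    // Mirrors onApplicationEvent -> bind -> start.
    void onEvent(ServerReadyEvent event) {
        // Skip the management server, as bind() does.
        if ("management".equals(event.serverNamespace)) {
            return;
        }
        // Only the first port is kept; later events cannot overwrite it.
        port.compareAndSet(0, event.port);
        start();
    }

    void start() {
        // Register at most once, even if several events arrive.
        if (running.compareAndSet(false, true)) {
            registerCalls++;
        }
    }

    int port() {
        return port.get();
    }
}
```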
The registration flow
Let's look at the registration flow in start():
public void start() {
    if (!isEnabled()) {
        if (logger.isDebugEnabled()) {
            logger.debug("Discovery Lifecycle disabled. Not starting");
        }
        return;
    }

    // Only initialize if nonSecurePort is greater than 0 and it isn't already running
    if (!this.running.get()) {
        // Before registering, publish InstancePreRegisteredEvent (used to cache some Nacos metadata properties)
        this.context.publishEvent(new InstancePreRegisteredEvent(this, getRegistration()));
        // Core registration method
        register();
        if (shouldRegisterManagement()) {
            // Register the local management service
            registerManagement();
        }
        // After registering, publish InstanceRegisteredEvent (logs "Discovery Client has been initialized" at debug level)
        this.context.publishEvent(new InstanceRegisteredEvent<>(this, getConfiguration()));
        this.running.compareAndSet(false, true);
    }
}

/**
 * Register the local service
 */
protected void register() {
    this.serviceRegistry.register(getRegistration());
}
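A small model makes the ordering inside start() explicit by recording each step in a list. The steps are the pre-registration event, register(), the optional management registration, the post-registration event, and finally flipping running. LifecycleSketch is hypothetical and records step names instead of publishing real events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of AbstractAutoServiceRegistration.start(); each step appends to a trace.
final class LifecycleSketch {
    final List<String> trace = new ArrayList<>();
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final boolean enabled;
    private final boolean registerManagement;

    LifecycleSketch(boolean enabled, boolean registerManagement) {
        this.enabled = enabled;
        this.registerManagement = registerManagement;
    }

    void start() {
        if (!enabled) {
            return; // corresponds to "Discovery Lifecycle disabled. Not starting"
        }
        if (!running.get()) {
            trace.add("InstancePreRegisteredEvent");
            trace.add("register");
            if (registerManagement) {
                trace.add("registerManagement");
            }
            trace.add("InstanceRegisteredEvent");
            running.compareAndSet(false, true);
        }
    }
}
```

As in the original, the check (running.get()) and the update (compareAndSet at the end) are separate steps. The at-most-once guarantee therefore relies on start() not being called concurrently.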
Next, the serviceRegistry.register method:
public void register(Registration registration) {
    if (StringUtils.isEmpty(registration.getServiceId())) {
        log.warn("No service to register for nacos client...");
        return;
    }

    // Obtain the NamingService
    NamingService namingService = namingService();
    // Service id
    String serviceId = registration.getServiceId();
    // Group used for service discovery
    String group = nacosDiscoveryProperties.getGroup();
    // Build a Nacos Instance object from the registration
    Instance instance = getNacosInstanceFromRegistration(registration);

    try {
        namingService.registerInstance(serviceId, group, instance);
        log.info("nacos registry, {} {} {}:{} register finished", group, serviceId,
                instance.getIp(), instance.getPort());
    }
    catch (Exception e) {
        if (nacosDiscoveryProperties.isFailFast()) {
            log.error("nacos registry, {} register failed...{},", serviceId,
                    registration.toString(), e);
            rethrowRuntimeException(e);
        }
        else {
            log.warn("Failfast is false. {} register failed...{},", serviceId,
                    registration.toString(), e);
        }
    }
}
This method mainly packages the registration properties into an Instance. The actual registration is delegated to namingService.registerInstance:
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
    NamingUtils.checkInstanceIsLegal(instance);
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    if (instance.isEphemeral()) {
        // Build a heartbeat info object and put it into beatReactor's dom2Beat map
        BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
        beatReactor.addBeatInfo(groupedServiceName, beatInfo);
    }
    // Register the service
    serverProxy.registerService(groupedServiceName, groupName, instance);
}
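There are two key operations here. The first registers the heartbeat info, which is done only for ephemeral instances. The second registers the service itself. Let's first see how the heartbeat is registered and how heartbeats are then sent on a schedule.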
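NamingUtils.getGroupedName combines the group and the service name into the single key that both client and server use. As far as I know, Nacos joins them with @@, as in DEFAULT_GROUP@@order-service. The sketch below assumes that separator and also reproduces the ephemeral branch of registerInstance. RegisterFlow and the recorded step strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RegisterFlow {
    // Assumed separator between group name and service name.
    static final String SEPARATOR = "@@";

    static String groupedName(String serviceName, String groupName) {
        return groupName + SEPARATOR + serviceName;
    }

    // Inverse of groupedName; returns the input unchanged if it has no group prefix.
    static String serviceNameOf(String groupedName) {
        int idx = groupedName.indexOf(SEPARATOR);
        return idx < 0 ? groupedName : groupedName.substring(idx + SEPARATOR.length());
    }

    final List<String> steps = new ArrayList<>();

    // Mirrors registerInstance: ephemeral instances get a heartbeat before the register call.
    void registerInstance(String serviceName, String groupName, boolean ephemeral) {
        String grouped = groupedName(serviceName, groupName);
        if (ephemeral) {
            steps.add("addBeatInfo " + grouped);
        }
        steps.add("registerService " + grouped);
    }
}
```

Persistent (non-ephemeral) instances skip the heartbeat step entirely. Their health is not derived from client heartbeats.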
Sending heartbeats
/**
 * Add heartbeat info
 */
public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
    NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
    String key = buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort());
    BeatInfo existBeat = null;
    //fix #1733
    if ((existBeat = dom2Beat.remove(key)) != null) {
        // Stop the heartbeat already running for this instance
        existBeat.setStopped(true);
    }
    // Put beatInfo into the map
    dom2Beat.put(key, beatInfo);
    // Schedule the delayed heartbeat task
    executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
    MetricsMonitor.getDom2BeatSizeMonitor().set(dom2Beat.size());
}
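Adding heartbeat info therefore involves two operations. First, beatInfo is put into the dom2Beat map. Second, a delayed task is scheduled to send the heartbeat. The delay comes from beatInfo.getPeriod(), in milliseconds, and defaults to 5 seconds.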
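The fix #1733 branch ensures that re-registering the same instance does not leave two heartbeat loops running. The old BeatInfo is flagged as stopped, so its pending task returns without rescheduling. The following is a minimal sketch using ConcurrentHashMap and ScheduledExecutorService, assuming a service#ip#port key format. MiniBeatReactor and Beat are hypothetical. Only the first delayed send is scheduled here, because the self-rescheduling loop belongs to BeatTask.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class MiniBeatReactor {
    // Hypothetical heartbeat descriptor; 'stopped' is volatile because the scheduler thread reads it.
    static final class Beat {
        final long periodMillis;
        volatile boolean stopped;

        Beat(long periodMillis) {
            this.periodMillis = periodMillis;
        }
    }

    final Map<String, Beat> beats = new ConcurrentHashMap<>();

    // Daemon thread so the sketch never keeps the JVM alive.
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "beat-sketch");
        t.setDaemon(true);
        return t;
    });

    // Assumed key format: service#ip#port.
    static String key(String service, String ip, int port) {
        return service + "#" + ip + "#" + port;
    }

    void addBeat(String service, String ip, int port, Beat beat, Runnable send) {
        // put() returns the previous value, combining the original remove() + put() into one call.
        Beat previous = beats.put(key(service, ip, port), beat);
        if (previous != null) {
            previous.stopped = true; // a pending task for the old beat will now do nothing
        }
        // Only the first, delayed send is scheduled here; rescheduling is BeatTask's job.
        executor.schedule(() -> {
            if (!beat.stopped) {
                send.run();
            }
        }, beat.periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Because put() returns the previous value, the original remove() followed by put() collapses into a single atomic map operation. The stop-the-old-beat logic is otherwise unchanged.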
Next, the task itself:
class BeatTask implements Runnable {

    BeatInfo beatInfo;

    public BeatTask(BeatInfo beatInfo) {
        this.beatInfo = beatInfo;
    }

    @Override
    public void run() {
        if (beatInfo.isStopped()) {
            // The heartbeat has been stopped: return without rescheduling
            return;
        }
        long nextTime = beatInfo.getPeriod();
        try {
            // Tell the Nacos server this instance is still alive, and read the response
            JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
            long interval = result.get("clientBeatInterval").asLong();
            boolean lightBeatEnabled = false;
            if (result.has(CommonParams.LIGHT_BEAT_ENABLED)) {
                lightBeatEnabled = result.get(CommonParams.LIGHT_BEAT_ENABLED).asBoolean();
            }
            BeatReactor.this.lightBeatEnabled = lightBeatEnabled;
            if (interval > 0) {
                // Use the interval returned by the server for the next heartbeat
                nextTime = interval;
            }
            int code = NamingResponseCode.OK;
            if (result.has(CommonParams.CODE)) {
                code = result.get(CommonParams.CODE).asInt();
            }
            if (code == NamingResponseCode.RESOURCE_NOT_FOUND) {
                // The server cannot find this instance, i.e. it is not registered yet, so register it first
                Instance instance = new Instance();
                instance.setPort(beatInfo.getPort());
                instance.setIp(beatInfo.getIp());
                instance.setWeight(beatInfo.getWeight());
                instance.setMetadata(beatInfo.getMetadata());
                instance.setClusterName(beatInfo.getCluster());
                instance.setServiceName(beatInfo.getServiceName());
                instance.setInstanceId(instance.getInstanceId());
                instance.setEphemeral(true);
                try {
                    serverProxy.registerService(beatInfo.getServiceName(),
                            NamingUtils.getGroupName(beatInfo.getServiceName()), instance);
                } catch (Exception ignore) {
                }
            }
        } catch (NacosException ex) {
            NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, code: {}, msg: {}",
                    JacksonUtils.toJson(beatInfo), ex.getErrCode(), ex.getErrMsg());
        } catch (Exception unknownEx) {
            NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, unknown exception msg: {}",
                    JacksonUtils.toJson(beatInfo), unknownEx.getMessage(), unknownEx);
        } finally {
            // Schedule the next heartbeat, whether or not this one succeeded
            executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
        }
    }
}
/**
 * Send a heartbeat
 */
public JsonNode sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException {

    if (NAMING_LOGGER.isDebugEnabled()) {
        NAMING_LOGGER.debug("[BEAT] {} sending beat to server: {}", namespaceId, beatInfo.toString());
    }
    Map<String, String> params = new HashMap<String, String>(8);
    Map<String, String> bodyMap = new HashMap<String, String>(2);
    if (!lightBeatEnabled) {
        // A full (non-light) beat carries the whole BeatInfo as JSON in the body
        bodyMap.put("beat", JacksonUtils.toJson(beatInfo));
    }
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, beatInfo.getServiceName());
    params.put(CommonParams.CLUSTER_NAME, beatInfo.getCluster());
    params.put("ip", beatInfo.getIp());
    params.put("port", String.valueOf(beatInfo.getPort()));
    // Send the heartbeat request to the Nacos server
    String result = reqApi(UtilAndComs.nacosUrlBase + "/instance/beat", params, bodyMap, HttpMethod.PUT);
    return JacksonUtils.toObj(result);
}
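The control flow of BeatTask.run() reduces to a small decision. Use the interval from the server reply when it is positive, and otherwise keep the configured period. Re-register when the server reports that the instance is not found. Always schedule the next beat in finally, even after a failed request. Below is a sketch of that decision as a pure function. BeatReply, BeatDecision and the notFound flag are hypothetical; the real code compares the reply code with NamingResponseCode.RESOURCE_NOT_FOUND. A null reply models a request that threw.

```java
// Hypothetical parsed reply to a heartbeat request.
final class BeatReply {
    final long clientBeatInterval; // <= 0 means the server did not suggest an interval
    final boolean notFound;        // true when the server does not know this instance

    BeatReply(long clientBeatInterval, boolean notFound) {
        this.clientBeatInterval = clientBeatInterval;
        this.notFound = notFound;
    }
}

// What BeatTask does after one heartbeat attempt.
final class BeatDecision {
    final long nextDelayMillis;
    final boolean reRegister;

    private BeatDecision(long nextDelayMillis, boolean reRegister) {
        this.nextDelayMillis = nextDelayMillis;
        this.reRegister = reRegister;
    }

    // reply == null models a request that threw; the next beat is still scheduled.
    static BeatDecision decide(BeatReply reply, long configuredPeriodMillis) {
        long next = configuredPeriodMillis;
        boolean reRegister = false;
        if (reply != null) {
            if (reply.clientBeatInterval > 0) {
                next = reply.clientBeatInterval;
            }
            reRegister = reply.notFound;
        }
        return new BeatDecision(next, reRegister);
    }
}
```

Because the server can return clientBeatInterval, the server, not the client, has the final say over how often heartbeats arrive.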
Server-side heartbeat endpoint
(To be completed)
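On the client side, Nacos only needs to send heartbeats periodically. The server has its own health-check mechanism that periodically checks the health of instances. If it receives no heartbeat from a client for 15 seconds, it sets that instance's healthy attribute to false. A simple flowchart of the heartbeat mechanism:
[Figure: heartbeat mechanism flowchart]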
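The server-side sweep described above can be sketched as follows. Each heartbeat refreshes a timestamp, and a periodic check marks an instance unhealthy once its last heartbeat is more than 15 seconds old. The current time is passed in so the logic can be tested without waiting. HealthSweeper and its fields are hypothetical. A real server would run sweep on a timer and, beyond this sketch, may also remove instances that stay silent for longer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class HealthSweeper {
    static final long UNHEALTHY_AFTER_MILLIS = 15_000;

    static final class InstanceState {
        volatile long lastBeatMillis;
        volatile boolean healthy = true;

        InstanceState(long lastBeatMillis) {
            this.lastBeatMillis = lastBeatMillis;
        }
    }

    final Map<String, InstanceState> instances = new ConcurrentHashMap<>();

    // A heartbeat arrived: refresh the timestamp and mark the instance healthy again.
    void onBeat(String instanceKey, long nowMillis) {
        InstanceState s = instances.computeIfAbsent(instanceKey, k -> new InstanceState(nowMillis));
        s.lastBeatMillis = nowMillis;
        s.healthy = true;
    }

    // Periodic check: any instance silent for more than 15 s becomes unhealthy.
    void sweep(long nowMillis) {
        for (InstanceState s : instances.values()) {
            if (nowMillis - s.lastBeatMillis > UNHEALTHY_AFTER_MILLIS) {
                s.healthy = false;
            }
        }
    }
}
```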
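Registering the service
With heartbeats covered, we return to the registration flow. The registration call made here is the same one the heartbeat task makes when the server cannot find the instance: NamingProxy.registerService(String serviceName, String groupName, Instance instance).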
/**
 * Register an instance
 */
public void registerService(String serviceName, String groupName, Instance instance) throws NacosException {

    NAMING_LOGGER.info("[REGISTER-SERVICE] {} registering service {} with instance: {}", namespaceId, serviceName,
            instance);

    final Map<String, String> params = new HashMap<String, String>(16);
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, serviceName);
    params.put(CommonParams.GROUP_NAME, groupName);
    params.put(CommonParams.CLUSTER_NAME, instance.getClusterName());
    params.put("ip", instance.getIp());
    params.put("port", String.valueOf(instance.getPort()));
    params.put("weight", String.valueOf(instance.getWeight()));
    params.put("enable", String.valueOf(instance.isEnabled()));
    params.put("healthy", String.valueOf(instance.isHealthy()));
    params.put("ephemeral", String.valueOf(instance.isEphemeral()));
    params.put("metadata", JacksonUtils.toJson(instance.getMetadata()));

    // Call the server's service registration endpoint
    reqApi(UtilAndComs.nacosUrlInstance, params, HttpMethod.POST);
}
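registerService ends in a POST to UtilAndComs.nacosUrlInstance. For comparison, the same registration can be expressed against the Nacos Open API endpoint /nacos/v1/ns/instance, which the Nacos quick-start calls with the parameters in the query string. The sketch below only builds a java.net.http.HttpRequest from a few of the fields used above and does not send it. The server address 127.0.0.1:8848 and the sample values are assumptions, and RegisterRequestSketch is a hypothetical helper.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

final class RegisterRequestSketch {
    // URL-encode parameters as key=value pairs joined by '&', preserving map order.
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) a POST to the instance registration endpoint.
    static HttpRequest build(String server, Map<String, String> params) {
        URI uri = URI.create("http://" + server + "/nacos/v1/ns/instance?" + encode(params));
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

To actually send it, pass the request to java.net.http.HttpClient.newHttpClient().send(...) against a running Nacos server.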
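Summary of the registration mechanism
- The Nacos client starts the registration flow by listening for Spring's WebServerInitializedEvent, then calls the server API to register and starts sending periodic heartbeats
- On receiving the request, the Nacos server does three things:
  - builds a Service object and stores it in a ConcurrentHashMap
  - uses scheduled tasks to set up heartbeat-based health checks for all instances of the service
  - synchronizes the service data across the cluster through its data consistency protocol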
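The first server-side step can be pictured as a two-level concurrent map: namespace, then grouped service name, then Service. computeIfAbsent creates the Service on first registration. As far as I know, Nacos 1.x keeps a map of this shape in its ServiceManager. ServiceStore and the simplified Service class below are hypothetical, with instances keyed by ip:port. Heartbeat checks and cluster synchronization are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ServiceStore {
    static final class Service {
        final String name;
        final Map<String, String> instances = new ConcurrentHashMap<>(); // "ip:port" -> cluster name

        Service(String name) {
            this.name = name;
        }
    }

    // namespaceId -> (groupedServiceName -> Service)
    private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();

    // Create the Service on first registration, then add the instance to it.
    Service register(String namespaceId, String groupedServiceName, String ip, int port, String cluster) {
        Service service = serviceMap
                .computeIfAbsent(namespaceId, ns -> new ConcurrentHashMap<>())
                .computeIfAbsent(groupedServiceName, Service::new);
        service.instances.put(ip + ":" + port, cluster);
        return service;
    }

    Service get(String namespaceId, String groupedServiceName) {
        Map<String, Service> services = serviceMap.get(namespaceId);
        return services == null ? null : services.get(groupedServiceName);
    }
}
```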
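Dynamic awareness of service addresses
Querying addresses from the server
Nacos offers two ways to query the list of service providers, an Open API endpoint on the server and an SDK method in the client:
- Open API endpoint: /v1/ns/instance/list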
/**
 * Get all instances matching the request parameters
 */
@GetMapping("/list")
@Secured(action = ActionTypes.READ)
public Object list(HttpServletRequest request) throws Exception {

    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);

    String agent = WebUtils.getUserAgent(request);
    String clusters = WebUtils.optional(request, "clusters", StringUtils.EMPTY);
    String clientIP = WebUtils.optional(request, "clientIP", StringUtils.EMPTY);
    int udpPort = Integer.parseInt(WebUtils.optional(request, "udpPort", "0"));
    boolean healthyOnly = Boolean.parseBoolean(WebUtils.optional(request, "healthyOnly", "false"));
    String app = WebUtils.optional(request, "app", StringUtils.EMPTY);

    Subscriber subscriber = new Subscriber(clientIP + ":" + udpPort, agent, app, clientIP, namespaceId, serviceName,
            udpPort, clusters);
    // Once every node in the cluster runs 2.0, InstanceOperatorClientImpl is used; otherwise InstanceOperatorServiceImpl
    return getInstanceOperator().listInstance(namespaceId, serviceName, subscriber, clusters, healthyOnly);
}
- SDK method: com.alibaba.nacos.client.naming.NacosNamingService#selectInstances(java.lang.String, java.lang.String, java.util.List<java.lang.String>, boolean, boolean)
/**
 * Get suitable instances from the given clusters of a service
 */
public List<Instance> selectInstances(String serviceName, String groupName, List<String> clusters, boolean healthy,
        boolean subscribe) throws NacosException {

    ServiceInfo serviceInfo;
    String clusterString = StringUtils.join(clusters, ",");
    if (subscribe) {
        // Subscribed: use the local cache first, and subscribe if nothing is cached yet
        serviceInfo = serviceInfoHolder.getServiceInfo(serviceName, groupName, clusterString);
        if (null == serviceInfo) {
            serviceInfo = clientProxy.subscribe(serviceName, groupName, clusterString);
        }
    } else {
        // Not subscribed: query the server directly
        serviceInfo = clientProxy.queryInstancesOfService(serviceName, groupName, clusterString, 0, false);
    }
    return selectInstances(serviceInfo, healthy);
}

private List<Instance> selectInstances(ServiceInfo serviceInfo, boolean healthy) {
    List<Instance> list;
    if (serviceInfo == null || CollectionUtils.isEmpty(list = serviceInfo.getHosts())) {
        return new ArrayList<Instance>();
    }

    Iterator<Instance> iterator = list.iterator();
    while (iterator.hasNext()) {
        Instance instance = iterator.next();
        // Drop instances whose health does not match, that are disabled, or that have no weight
        if (healthy != instance.isHealthy() || !instance.isEnabled() || instance.getWeight() <= 0) {
            iterator.remove();
        }
    }

    return list;
}
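The private selectInstances filter keeps an instance only when three conditions hold: its health matches the requested flag, it is enabled, and its weight is positive. Because the check is healthy != instance.isHealthy(), passing healthy = false returns only the unhealthy instances, not all of them. A standalone sketch of the same predicate follows, with a hypothetical Inst class in place of Nacos's Instance. It returns a new list instead of removing from the input.

```java
import java.util.List;
import java.util.stream.Collectors;

final class InstanceFilter {
    // Hypothetical stand-in for Instance with only the fields the filter reads.
    static final class Inst {
        final String id;
        final boolean healthy;
        final boolean enabled;
        final double weight;

        Inst(String id, boolean healthy, boolean enabled, double weight) {
            this.id = id;
            this.healthy = healthy;
            this.enabled = enabled;
            this.weight = weight;
        }
    }

    // Same predicate as selectInstances(ServiceInfo, boolean), expressed as a stream filter.
    static List<String> select(List<Inst> hosts, boolean healthy) {
        return hosts.stream()
                .filter(i -> i.healthy == healthy && i.enabled && i.weight > 0)
                .map(i -> i.id)
                .collect(Collectors.toList());
    }
}
```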
Dynamic awareness
First, find where the client calls the /instance/list endpoint. The only caller is com.alibaba.nacos.client.naming.net.NamingProxy#queryList. Among the callers of that method, two have names suggesting they refresh the service list: com.alibaba.nacos.client.naming.core.HostReactor#updateService and com.alibaba.nacos.client.naming.core.HostReactor#refreshOnly. Both lead to the Runnable implementation com.alibaba.nacos.client.naming.core.HostReactor.UpdateTask. From there the trail is easy to follow. Tracing upward layer by layer eventually leads to a Spring configuration class, com.alibaba.cloud.nacos.discovery.NacosDiscoveryClientConfiguration:
@Configuration(proxyBeanMethods = false)
@ConditionalOnDiscoveryEnabled
@ConditionalOnBlockingDiscoveryEnabled
@ConditionalOnNacosDiscoveryEnabled
@AutoConfigureBefore({ SimpleDiscoveryClientAutoConfiguration.class, CommonsClientAutoConfiguration.class })
@AutoConfigureAfter(NacosDiscoveryAutoConfiguration.class)
public class NacosDiscoveryClientConfiguration {

    @Bean
    public DiscoveryClient nacosDiscoveryClient(NacosServiceDiscovery nacosServiceDiscovery) {
        return new NacosDiscoveryClient(nacosServiceDiscovery);
    }

    @Bean
    @ConditionalOnMissingBean
    @ConditionalOnProperty(value = "spring.cloud.nacos.discovery.watch.enabled", matchIfMissing = true)
    public NacosWatch nacosWatch(NacosServiceManager nacosServiceManager,
            NacosDiscoveryProperties nacosDiscoveryProperties) {
        return new NacosWatch(nacosServiceManager, nacosDiscoveryProperties);
    }
}
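This configuration class creates a NacosWatch bean when spring.cloud.nacos.discovery.watch.enabled is true, and also when the property is not set at all (matchIfMissing = true). The metadata file additional-spring-configuration-metadata.json in the spring-cloud-starter-alibaba-nacos-discovery module describes this property and gives its default as true. Dynamic awareness is therefore enabled by default.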
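Next comes NacosWatch; here is its key code. Because it implements ApplicationEventPublisherAware, it is also an event publisher. It keeps its listeners in listenerMap. It implements SmartLifecycle, so it is closely tied to the container's lifecycle. It also implements DisposableBean, so it is tied to the bean lifecycle and shuts down gracefully when the bean is destroyed.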
public class NacosWatch implements ApplicationEventPublisherAware, SmartLifecycle, DisposableBean {

    // Listeners, keyed by service
    private Map<String, EventListener> listenerMap = new ConcurrentHashMap<>(16);

    // ... other fields (running, taskScheduler, watchFuture, properties, nacosServiceManager) omitted

    /**
     * From SmartLifecycle: returning true makes Spring call start() automatically
     * once the application context has finished refreshing (all singleton beans are created)
     */
    @Override
    public boolean isAutoStartup() {
        return true;
    }

    @Override
    public void stop(Runnable callback) {
        this.stop();
        callback.run();
    }

    @Override
    public void start() {
        // CAS ensures the body runs only once
        if (this.running.compareAndSet(false, true)) {
            // Create a listener; it receives all events and, on a NamingEvent, updates the current instance
            EventListener eventListener = listenerMap.computeIfAbsent(buildKey(),
                    event -> new EventListener() {
                        @Override
                        public void onEvent(Event event) {
                            if (event instanceof NamingEvent) {
                                List<Instance> instances = ((NamingEvent) event).getInstances();
                                Optional<Instance> instanceOptional = selectCurrentInstance(instances);
                                instanceOptional.ifPresent(currentInstance -> {
                                    resetIfNeeded(currentInstance);
                                });
                            }
                        }
                    });

            NamingService namingService = nacosServiceManager.getNamingService(properties.getNacosProperties());
            try {
                // Register the listener
                namingService.subscribe(properties.getService(), properties.getGroup(),
                        Arrays.asList(properties.getClusterName()), eventListener);
            }
            catch (Exception e) {
                log.error("namingService subscribe failed, properties:{}", properties, e);
            }

            this.watchFuture = this.taskScheduler.scheduleWithFixedDelay(this::nacosServicesWatch,
                    this.properties.getWatchDelay());
        }
    }

    @Override
    public void destroy() {
        this.stop();
    }
}
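Two details of NacosWatch.start() are worth isolating. First, compareAndSet(false, true) makes repeated start() calls harmless. Second, computeIfAbsent creates at most one listener per key, and that listener ignores every event that is not a naming event. The minimal sketch below uses no Nacos or Spring types. WatchSketch, NamingEventLike and the key string are hypothetical, and subscribeCalls stands in for namingService.subscribe(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

final class WatchSketch {
    // Hypothetical event carrying the latest instance list of a service.
    static final class NamingEventLike {
        final List<String> instances;

        NamingEventLike(List<String> instances) {
            this.instances = instances;
        }
    }

    private final AtomicBoolean running = new AtomicBoolean(false);
    final Map<String, Consumer<Object>> listenerMap = new ConcurrentHashMap<>();
    final List<List<String>> seen = new ArrayList<>();
    int subscribeCalls = 0;

    // Returns the listener on the first call and null on every later call.
    Consumer<Object> start(String key) {
        if (!running.compareAndSet(false, true)) {
            return null;
        }
        Consumer<Object> listener = listenerMap.computeIfAbsent(key, k -> event -> {
            // React only to naming events; everything else is ignored.
            if (event instanceof NamingEventLike) {
                seen.add(((NamingEventLike) event).instances);
            }
        });
        subscribeCalls++; // stands in for namingService.subscribe(..., listener)
        return listener;
    }
}
```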
Stepping into the subscribe method:
@Override
public void subscribe(String serviceName, String groupName, List<String> clusters, EventListener listener)
        throws NacosException {
    hostReactor.subscribe(NamingUtils.getGroupedName(serviceName, groupName), StringUtils.join(clusters, ","),
            listener);
}
/**
 * Subscribe to instance change events
 */
public void subscribe(String serviceName, String clusters, EventListener eventListener) {
    // Put the listener into a map; each service can have several listeners
    notifier.registerListener(serviceName, clusters, eventListener);
    // Fetch the service list once now; this also starts the scheduled refresh task
    getServiceInfo(serviceName, clusters);
}

public ServiceInfo getServiceInfo(final String serviceName, final String clusters) {

    NAMING_LOGGER.debug("failover-mode: " + failoverReactor.isFailoverSwitch());
    String key = ServiceInfo.getKey(serviceName, clusters);
    if (failoverReactor.isFailoverSwitch()) {
        return failoverReactor.getService(key);
    }

    ServiceInfo serviceObj = getServiceInfo0(serviceName, clusters);

    if (null == serviceObj) {
        serviceObj = new ServiceInfo(serviceName, clusters);

        serviceInfoMap.put(serviceObj.getKey(), serviceObj);

        updatingMap.put(serviceName, new Object());
        // Update the service list immediately
        updateServiceNow(serviceName, clusters);
        updatingMap.remove(serviceName);

    } else if (updatingMap.containsKey(serviceName)) {

        if (UPDATE_HOLD_INTERVAL > 0) {
            // hold a moment waiting for update finish
            synchronized (serviceObj) {
                try {
                    serviceObj.wait(UPDATE_HOLD_INTERVAL);
                } catch (InterruptedException e) {
                    NAMING_LOGGER
                            .error("[getServiceInfo] serviceName:" + serviceName + ", clusters:" + clusters, e);
                }
            }
        }
    }
    // Start the scheduled task that keeps the service list up to date
    scheduleUpdateIfAbsent(serviceName, clusters);

    return serviceInfoMap.get(serviceObj.getKey());
}

private void updateServiceNow(String serviceName, String clusters) {
    try {
        // Update the service list
        updateService(serviceName, clusters);
    } catch (NacosException e) {
        NAMING_LOGGER.error("[NA] failed to update serviceName: " + serviceName, e);
    }
}

/**
 * Schedule periodic updates
 */
public void scheduleUpdateIfAbsent(String serviceName, String clusters) {
    if (futureMap.get(ServiceInfo.getKey(serviceName, clusters)) != null) {
        return;
    }

    synchronized (futureMap) {
        if (futureMap.get(ServiceInfo.getKey(serviceName, clusters)) != null) {
            return;
        }
        // Add the update task to the task list
        ScheduledFuture<?> future = addTask(new UpdateTask(serviceName, clusters));
        futureMap.put(ServiceInfo.getKey(serviceName, clusters), future);
    }
}

public synchronized ScheduledFuture<?> addTask(UpdateTask task) {
    // Schedule the delayed task
    return executor.schedule(task, DEFAULT_DELAY, TimeUnit.MILLISECONDS);
}
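scheduleUpdateIfAbsent uses double-checked locking on futureMap. It first does a cheap lookup without the lock, then checks again inside synchronized, so that concurrent subscribers for the same key create only one UpdateTask. Below is a self-contained sketch of the same pattern. UpdateScheduler and its key format are hypothetical, and the scheduled task is an empty placeholder for UpdateTask.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class UpdateScheduler {
    static final long DEFAULT_DELAY = 1000L;

    final Map<String, ScheduledFuture<?>> futureMap = new ConcurrentHashMap<>();
    final AtomicInteger tasksCreated = new AtomicInteger();

    // Daemon thread so the sketch never keeps the JVM alive.
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "update-sketch");
        t.setDaemon(true);
        return t;
    });

    void scheduleUpdateIfAbsent(String serviceName, String clusters) {
        String key = serviceName + "|" + clusters; // hypothetical key format
        if (futureMap.get(key) != null) {
            return; // fast path: already scheduled, no lock taken
        }
        synchronized (futureMap) {
            if (futureMap.get(key) != null) {
                return; // another thread scheduled it while we waited for the lock
            }
            tasksCreated.incrementAndGet();
            // Placeholder for UpdateTask.
            ScheduledFuture<?> future = executor.schedule(() -> { }, DEFAULT_DELAY, TimeUnit.MILLISECONDS);
            futureMap.put(key, future);
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Since futureMap is a ConcurrentHashMap here, computeIfAbsent would give the same once-per-key guarantee without an explicit lock. The sketch keeps the explicit form to mirror the original.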
updateService() is covered later, because every path eventually updates the service list through it. First, let's see what UpdateTask does.
public class UpdateTask implements Runnable {

    // Some methods and fields omitted...

    @Override
    public void run() {
        // DEFAULT_DELAY is 1000L (1 second)
        long delayTime = DEFAULT_DELAY;

        try {
            ServiceInfo serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters));

            if (serviceObj == null) {
                updateService(serviceName, clusters);
                return;
            }

            if (serviceObj.getLastRefTime() <= lastRefTime) {
                // The cached refresh time is not newer than the one this task saved, so the list
                // has not been updated since the last pull: pull it again
                updateService(serviceName, clusters);
                serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters));
            } else {
                // serviceName has already been updated by a push; don't overwrite it,
                // because the pushed data may differ from what we would pull
                refreshOnly(serviceName, clusters);
            }

            lastRefTime = serviceObj.getLastRefTime();

            if (!notifier.isSubscribed(serviceName, clusters) && !futureMap
                    .containsKey(ServiceInfo.getKey(serviceName, clusters))) {
                // abort the update task
                NAMING_LOGGER.info("update task is stopped, service:" + serviceName + ", clusters:" + clusters);
                return;
            }
            if (CollectionUtils.isEmpty(serviceObj.getHosts())) {
                incFailCount();
                return;
            }
            delayTime = serviceObj.getCacheMillis();
            resetFailCount();
        } catch (Throwable e) {
            // failCount++
            incFailCount();
            NAMING_LOGGER.warn("[NA] failed to update serviceName: " + serviceName, e);
        } finally {
            // Schedule the next run; each failure doubles the delay, capped at DEFAULT_DELAY * 60 (60 seconds)
            executor.schedule(this, Math.min(delayTime << failCount, DEFAULT_DELAY * 60), TimeUnit.MILLISECONDS);
        }
    }
}
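The delay computed in the finally block is min(delayTime << failCount, DEFAULT_DELAY * 60). After a successful run, failCount is reset and delayTime is the cache time returned by the server. After a failure, delayTime stays at DEFAULT_DELAY (1000 ms), so consecutive failures give 2 s, 4 s, 8 s and so on, up to the 60-second cap. The sketch reproduces that arithmetic. Backoff is hypothetical, and capping failCount at 6 is an assumption about incFailCount(), which the excerpt omits.

```java
final class Backoff {
    static final long DEFAULT_DELAY = 1000L;
    static final int MAX_FAIL_COUNT = 6; // assumed upper bound on failCount

    // Next delay in milliseconds: double per failure, never above DEFAULT_DELAY * 60.
    static long nextDelay(long delayTime, int failCount) {
        int shift = Math.min(failCount, MAX_FAIL_COUNT);
        return Math.min(delayTime << shift, DEFAULT_DELAY * 60);
    }
}
```

Bounding the shift also keeps delayTime << failCount from overflowing if failures pile up.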
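On each run, then, this delayed task reads the cached service info from serviceInfoMap. Depending on the situation, it either updates the service list or only refreshes it, and it finally schedules the next round. Now let's look at updateService(), the method that updates the service list.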
/**
 * Update the service list
 */
public void updateService(String serviceName, String clusters) throws NacosException {
    ServiceInfo oldService = getServiceInfo0(serviceName, clusters);
    try {
        // Call the server API and update the local service list from the JSON result
        String result = serverProxy.queryList(serviceName, clusters, pushReceiver.getUdpPort(), false);

        if (StringUtils.isNotEmpty(result)) {
            // Too long to show here: it updates serviceInfoMap, updates heartbeat info if needed, and finally
            // publishes an InstancesChangeEvent so that the listener registered earlier performs its update
            processServiceJson(result);
        }
    } finally {
        if (oldService != null) {
            synchronized (oldService) {
                // Wake up threads waiting in getServiceInfo() for this update to finish
                oldService.notifyAll();
            }
        }
    }
}
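updateService() simply calls the server API and updates the provider information from the result. Its finally block also calls notifyAll() on the old ServiceInfo, which wakes any thread that getServiceInfo() left waiting for this update.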
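The UpdateTask code above shows that when serviceObj.getLastRefTime() > lastRefTime, the task does not call updateService() but calls refreshOnly() instead. That condition means the cached list has been updated, for example by a server push, since the task last ran. refreshOnly() does not update the local service list. It only calls the server API and discards the response. Because the request carries the client's UDP port (pushReceiver.getUdpPort()), its likely purpose is to keep the server aware that this client is still subscribed and should keep receiving pushes.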
/**
 * Refresh only
 */
public void refreshOnly(String serviceName, String clusters) {
    try {
        serverProxy.queryList(serviceName, clusters, pushReceiver.getUdpPort(), false);
    } catch (Exception e) {
        NAMING_LOGGER.error("[NA] failed to update serviceName: " + serviceName, e);
    }
}
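Side by side, the branch in UpdateTask.run() is just a comparison of two timestamps. The sketch below captures that decision. RefreshDecision and its parameter names are hypothetical: cachedLastRefTime corresponds to serviceObj.getLastRefTime(), and taskLastRefTime to the task's lastRefTime field.

```java
final class RefreshDecision {
    enum Action { FULL_UPDATE, REFRESH_ONLY }

    // cachedLastRefTime: refresh time of the locally cached ServiceInfo.
    // taskLastRefTime: the value this task saved after its previous run.
    static Action decide(long cachedLastRefTime, long taskLastRefTime) {
        // Not newer than what we saw last time: nobody pushed an update, so pull a full update.
        // Newer: a push already updated the cache, so only refresh and keep the pushed data.
        return cachedLastRefTime <= taskLastRefTime ? Action.FULL_UPDATE : Action.REFRESH_ONLY;
    }
}
```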