nova evacuate code analysis

This post walks through the nova evacuate (host-failure evacuation) code path, based on the OpenStack Train (T) release.

1. The nova evacuate command line

nova evacuate
Purpose: evacuate an instance away from a failed compute node.
Usage: nova evacuate [--password <password>] [--on-shared-storage] <server> [<host>]
Arguments:
<server>  The instance on the failed node.
<host>  Name or ID of the target host. If no host is given, the nova-scheduler picks an available one.
--password <password>  Set the admin password on the evacuated instance. Not applicable when the instance is on shared storage.
--on-shared-storage  Indicates that the instance files are located on shared storage.
The --password option passes a password to the instance during the evacuation; if it is omitted, a random password is generated for the evacuated instance.

2. Walking through the nova evacuate code

Stage 1: the nova-api service
The API layer (nova/api/openstack/compute)
1) When the nova evacuate command is issued, an HTTP POST request is sent to the nova-api service; the request body carries the action "evacuate" (see the sketch after this list).
2) The handler reads the request body to obtain the host, force, password and on_shared_storage values.
3) If a host was specified, first check that it exists; if it does not, raise a "host not found" error, otherwise continue with step 4.
4) If the specified host is the same host the instance is currently on, raise an error.
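
For reference, here is a sketch of the JSON body such a POST /servers/{server_id}/action request can carry at microversion 2.29 or later (the values below are purely illustrative):

# Illustrative request body for the "evacuate" server action (microversion >= 2.29).
evacuate_request_body = {
    "evacuate": {
        "host": "compute-02",         # optional target host; omit it to let the scheduler choose
        "adminPass": "new-password",  # optional password for the rebuilt instance
        "force": False,               # only meaningful when a host is given
    }
}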

D:\tran_code\nova_v1\nova\api\openstack\compute\evacuate.py
    @extensions.expected_errors((400, 404, 409))
    @wsgi.action('evacuate')
    @validation.schema(evacuate.evacuate, "2.0", "2.13")
    @validation.schema(evacuate.evacuate_v214, "2.14", "2.28")
    @validation.schema(evacuate.evacuate_v2_29, "2.29")
    def _evacuate(self, req, id, body):
        """Permit admins to evacuate a server from a failed host
        to a new one.
        """
        context = req.environ["nova.context"]
        instance = common.get_instance(self.compute_api, context, id)
        context.can(evac_policies.BASE_POLICY_NAME,
                    target={'user_id': instance.user_id,
                            'project_id': instance.project_id})

        evacuate_body = body["evacuate"]
        host = evacuate_body.get("host")
        force = None

        on_shared_storage = self._get_on_shared_storage(req, evacuate_body)

        if api_version_request.is_supported(req, min_version='2.29'):
            force = body["evacuate"].get("force", False)
            force = strutils.bool_from_string(force, strict=True)
            if force is True and not host:
                message = _("Can't force to a non-provided destination")
                raise exc.HTTPBadRequest(explanation=message)
        if api_version_request.is_supported(req, min_version='2.14'):
            password = self._get_password_v214(req, evacuate_body)
        else:
            password = self._get_password(req, evacuate_body,
                                          on_shared_storage)

        if host is not None:
            try:
                self.host_api.service_get_by_compute_host(context, host)
            except (exception.ComputeHostNotFound,
                    exception.HostMappingNotFound):
                msg = _("Compute host %s not found.") % host
                raise exc.HTTPNotFound(explanation=msg)

        if instance.host == host:
            msg = _("The target host can't be the same one.")
            raise exc.HTTPBadRequest(explanation=msg)

        try:
            self.compute_api.evacuate(context, instance, host,
                                      on_shared_storage, password, force)
        except exception.InstanceUnknownCell as e:
            raise exc.HTTPNotFound(explanation=e.format_message())
        except exception.InstanceInvalidState as state_error:
            common.raise_http_conflict_for_instance_invalid_state(state_error,
                    'evacuate', id)
        except exception.ComputeServiceInUse as e:
            raise exc.HTTPBadRequest(explanation=e.format_message())

        if (not api_version_request.is_supported(req, min_version='2.14') and
                CONF.api.enable_instance_password):
            return {'adminPass': password}
        else:
            return None

 

Stage 2: the compute API layer (nova/compute/api.py), still running inside the nova-api service
1) Get the host the instance is currently on.
2) Check whether the nova-compute service on that host is up. If it is up, raise an exception and stop (evacuation is only allowed when the instance's host is actually down). If it is down, continue with the steps below (a sketch of this liveness check follows this list).
3) Create a Migration record for this evacuation, so that later services can look up what is happening to the instance.
4) If a target host was specified, store it in the migration's dest_compute field.
5) Load the instance's RequestSpec by its UUID.
6) If a target host was specified but force is false, set host back to None so that the scheduler still runs; the requested host is recorded in the RequestSpec's requested_destination instead.
7) Finally, call rebuild_instance on the nova-conductor API. Only instance, host, on_shared_storage, admin_password and force carry evacuate-specific values (plus recreate=True); the remaining parameters are passed with their default values.
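
The liveness check in step 2 boils down to comparing the service's last heartbeat with the configured service_down_time. A rough sketch of the idea (an approximation of the DB servicegroup driver; the exact field handling, including the forced_down flag, differs in the real code):

import datetime

def service_is_up(service, service_down_time=60):
    # A nova-compute service counts as "up" if its last heartbeat
    # (updated_at, falling back to created_at) is newer than
    # service_down_time seconds ago.
    last_heartbeat = service.updated_at or service.created_at
    elapsed = (datetime.datetime.utcnow() - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time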

D:\tran_code\nova_v1\nova\compute\api.py
    def evacuate(self, context, instance, host, on_shared_storage,
                 admin_password=None, force=None):
        # Parameters: the instance to evacuate; the target host (None if not
        # set); whether the instance is on shared storage; the password for
        # the rebuilt instance; and whether to force the evacuation onto the
        # specified target host.
        """Running evacuate to target host.

        Checking vm compute host state, if the host not in expected_state,
        raising an exception.

        :param instance: The instance to evacuate
        :param host: Target host. if not set, the scheduler will pick up one
        :param on_shared_storage: True if instance files on shared storage
        :param admin_password: password to set on rebuilt instance
        :param force: Force the evacuation to the specific host target

        """
        LOG.debug('vm evacuation scheduled', instance=instance)
        inst_host = instance.host  # the host the instance currently lives on
        # Look up the nova-compute service record for that host.
        service = objects.Service.get_by_compute_host(context, inst_host)
        if self.servicegroup_api.service_is_up(service):
            # Evacuation is only allowed while the source nova-compute is
            # down; if the service is still up, refuse with an exception.
            LOG.error('Instance compute service state on %s '
                      'expected to be down, but it was up.', inst_host)
            raise exception.ComputeServiceInUse(host=inst_host)

        instance.task_state = task_states.REBUILDING  # mark as rebuilding
        instance.save(expected_task_state=[None])  # persist the task state
        # Record the EVACUATE action for this instance.
        self._record_action_start(context, instance, instance_actions.EVACUATE)

        # NOTE(danms): Create this as a tombstone for the source compute
        # to find and cleanup. No need to pass it anywhere else.
        migration = objects.Migration(context,
                                      source_compute=instance.host,
                                      source_node=instance.node,
                                      instance_uuid=instance.uuid,
                                      status='accepted',
                                      migration_type='evacuation')
        if host:
            # If a target host was given, record it on the migration.
            migration.dest_compute = host
        migration.create()

        compute_utils.notify_about_instance_usage(
            self.notifier, context, instance, "evacuate")

        try:
            # Load the instance's RequestSpec by UUID; it may not exist for
            # old instances, in which case we fall back to None below.
            request_spec = objects.RequestSpec.get_by_instance_uuid(
                context, instance.uuid)
        except exception.RequestSpecNotFound:
            # Some old instances can still have no RequestSpec object attached
            # to them, we need to support the old way
            request_spec = None

        # NOTE(sbauza): Force is a boolean by the new related API version
        # Taken only when a target host was specified but force is False.
        if force is False and host:
            # All compute nodes belonging to the requested host.
            nodes = objects.ComputeNodeList.get_all_by_host(context, host)
            # NOTE(sbauza): Unset the host to make sure we call the scheduler
            host = None
            # FIXME(sbauza): Since only Ironic driver uses more than one
            # compute per service but doesn't support evacuations,
            # let's provide the first one.
            target = nodes[0]
            if request_spec:
                # TODO(sbauza): Hydrate a fake spec for old instances not yet
                # having a request spec attached to them (particularly true for
                # cells v1). For the moment, let's keep the same behaviour for
                # all the instances but provide the destination only if a spec
                # is found.
                destination = objects.Destination(
                    host=target.host,
                    node=target.hypervisor_hostname
                )
                request_spec.requested_destination = destination

        return self.compute_task_api.rebuild_instance(context,
                       instance=instance,
                       new_pass=admin_password,
                       injected_files=None,
                       image_ref=None,
                       orig_image_ref=None,
                       orig_sys_metadata=None,
                       bdms=None,
                       recreate=True,
                       on_shared_storage=on_shared_storage,
                       host=host,
                       request_spec=request_spec,
                       )

 

Still inside the nova-api process, the nova-conductor API module is then called, which in turn sends an RPC request to the nova-conductor service.

D:\tran_code\nova_v1\nova\conductor\api.py  (from nova/compute/api.py onwards, nova rebuild and nova evacuate go through this same function)
    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate=False, on_shared_storage=False,
                         preserve_ephemeral=False, host=None,
                         request_spec=None, kwargs=None):
        # kwargs unused but required for cell compatibility
        self.conductor_compute_rpcapi.rebuild_instance(context,
                instance=instance,
                new_pass=new_pass,
                injected_files=injected_files,
                image_ref=image_ref,
                orig_image_ref=orig_image_ref,
                orig_sys_metadata=orig_sys_metadata,
                bdms=bdms,
                recreate=recreate,
                on_shared_storage=on_shared_storage,
                preserve_ephemeral=preserve_ephemeral,
                host=host,
                request_spec=request_spec)

Stage 3: the nova-conductor service
nova-conductor, manager.py
1) Look up the instance's migration record by its UUID (status 'accepted').
2) Branch on whether a host value was passed in.
3) host is set. This happens in two scenarios: a rebuild on the instance's original host using its original image, or an evacuation to a specific host with force=True. In both of these cases node stays None.
4) host is None. This happens in three scenarios: 1) an evacuation with no host specified; 2) an evacuation with a host specified but force=False; 3) a rebuild on the instance's own host using a new image. When the scheduler runs for an evacuation, the instance's current host is excluded so that the same host cannot be picked again. Once a destination is chosen, host and node are both set: host becomes the routing target of the RPC message, node is used by the later functions.
5) Send the rebuild_instance RPC request to the nova-compute service.

D:\tran_code\nova_v1\nova\conductor\manager.py
    @targets_cell
    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate, on_shared_storage,
                         preserve_ephemeral=False, host=None,
                         request_spec=None):

        # Report a 'rebuild_server' event for this instance.
        with compute_utils.EventReporter(context, 'rebuild_server', instance.uuid):
            node = limits = None

            try:
                # Look up the migration record by instance UUID and status
                # 'accepted'; if none exists, fall back to migration = None.
                migration = objects.Migration.get_by_instance_and_status(
                    context, instance.uuid, 'accepted')
            except exception.MigrationNotFoundByStatus:
                LOG.debug("No migration record for the rebuild/evacuate "
                          "request.", instance=instance)
                migration = None

            # The host variable is passed in two cases:
            # 1. rebuild - the instance.host is passed to rebuild on the
            #       same host and bypass the scheduler *unless* a new image
            #       was specified
            # 2. evacuate with specified host and force=True - the specified
            #       host is passed and is meant to bypass the scheduler.
            # (In the remaining case - a host was specified but force is not
            # set - host is not passed down to this method.)
            # NOTE(mriedem): This could be a lot more straight-forward if we
            #       had separate methods for rebuild and evacuate...
            if host:
                # We only create a new allocation on the specified host if
                # we're doing an evacuate since that is a move operation.
                if host != instance.host:
                    # If a destination host is forced for evacuate, create
                    # allocations against it in Placement. A rebuild on the
                    # same host (host == instance.host) needs no new
                    # allocation, and an evacuation can never target the
                    # instance's own host, so it does not reach this branch.
                    # (The author notes this helper may need to be adapted.)
                    self._allocate_for_evacuate_dest_host(
                        context, instance, host, request_spec)
            else:
                # At this point, the user is either:---此时,用户要么是
                #
                # 1. Doing a rebuild on the same host (not evacuate) and-----在相同的主机上使用新的镜像进行重建
                #    specified a new image.
                # 2. Evacuating and specified a host but are not forcing it.----指定特定的主机,进行宕机疏散,但是不强制
                #
                # In either case, the API passes host=None but sets up the----在这两种情况下,API传递host=None,但是在RequestSpec.requested_destination字段设置了特定的host
                # RequestSpec.requested_destination field for the specified
                # host.
                if not request_spec:-----没有指定request_spec的情况下,根据虚机的镜像信息,来构造image元数据,并且来构造request_spec信息
                    # NOTE(sbauza): We were unable to find an original
                    # RequestSpec object - probably because the instance is old
                    # We need to mock that the old way
                    # TODO(sbauza): Provide directly the RequestSpec object
                    # when _set_vm_state_and_notify() accepts it
                    # Exclude the instance's current host so the scheduler
                    # does not pick it again.
                    filter_properties = {'ignore_hosts': [instance.host]}
                    # build_request_spec expects a primitive image dict
                    image_meta = nova_object.obj_to_primitive(
                        instance.image_meta)
                    request_spec = scheduler_utils.build_request_spec(
                            context, image_meta, [instance])
                    request_spec = objects.RequestSpec.from_primitives(
                        context, request_spec, filter_properties)
                elif recreate:
                    # This is the branch the evacuate flow takes.
                    # NOTE(sbauza): Augment the RequestSpec object by excluding
                    # the source host for avoiding the scheduler to pick it
                    request_spec.ignore_hosts = [instance.host]
                    # NOTE(sbauza): Force_hosts/nodes needs to be reset
                    # if we want to make sure that the next destination
                    # is not forced to be the original host
                    request_spec.reset_forced_destinations()
                try:
                    request_spec.ensure_project_id(instance)
                    # Ask the scheduler to pick a destination host based on
                    # the request_spec.
                    hosts = self._schedule_instances(context, request_spec,
                                                     [instance.uuid])
                    host_dict = hosts.pop(0)
                    # The destination host, node and limits are now known.
                    host, node, limits = (host_dict['host'],
                                          host_dict['nodename'],
                                          host_dict['limits'])
                except exception.NoValidHost as ex:
                    if migration:
                        migration.status = 'error'
                        migration.save()
                    # Rollback the image_ref if a new one was provided (this
                    # only happens in the rebuild case, not evacuate).
                    if orig_image_ref and orig_image_ref != image_ref:
                        instance.image_ref = orig_image_ref
                        instance.save()
                    request_spec = request_spec.to_legacy_request_spec_dict()
                    with excutils.save_and_reraise_exception():
                        self._set_vm_state_and_notify(context, instance.uuid,
                                'rebuild_server',
                                {'vm_state': vm_states.ERROR,
                                 'task_state': None}, ex, request_spec)
                        LOG.warning("No valid host found for rebuild",
                                    instance=instance)
                        compute_utils.add_instance_fault_from_exc(context,
                            instance, ex, sys.exc_info())
                except exception.UnsupportedPolicyException as ex:
                    if migration:
                        migration.status = 'error'
                        migration.save()
                    # Rollback the image_ref if a new one was provided (this
                    # only happens in the rebuild case, not evacuate).
                    if orig_image_ref and orig_image_ref != image_ref:
                        instance.image_ref = orig_image_ref
                        instance.save()
                    request_spec = request_spec.to_legacy_request_spec_dict()
                    with excutils.save_and_reraise_exception():
                        self._set_vm_state_and_notify(context, instance.uuid,
                                'rebuild_server',
                                {'vm_state': vm_states.ERROR,
                                 'task_state': None}, ex, request_spec)
                        LOG.warning("Server with unsupported policy "
                                    "cannot be rebuilt", instance=instance)
                        compute_utils.add_instance_fault_from_exc(context,
                            instance, ex, sys.exc_info())

            compute_utils.notify_about_instance_usage(
                self.notifier, context, instance, "rebuild.scheduled")

            instance.availability_zone = (
                availability_zones.get_host_availability_zone(
                    context, host))

            # Kick off the actual rebuild on the chosen compute host.
            self.compute_rpcapi.rebuild_instance(context,
                    instance=instance,
                    new_pass=new_pass,
                    injected_files=injected_files,
                    image_ref=image_ref,
                    orig_image_ref=orig_image_ref,
                    orig_sys_metadata=orig_sys_metadata,
                    bdms=bdms,
                    recreate=recreate,
                    on_shared_storage=on_shared_storage,
                    preserve_ephemeral=preserve_ephemeral,
                    migration=migration,  # the migration object created earlier is passed along
                    host=host, node=node, limits=limits)

 

The RPC call to the nova-compute service goes through nova/compute/rpcapi.py

D:\tran_code\nova_v1\nova\compute\rpcapi.py
    def rebuild_instance(self, ctxt, instance, new_pass, injected_files,
            image_ref, orig_image_ref, orig_sys_metadata, bdms,
            recreate=False, on_shared_storage=False, host=None, node=None,
            preserve_ephemeral=False, migration=None, limits=None,
            kwargs=None):
        # NOTE(danms): kwargs is only here for cells compatibility, don't
        # actually send it to compute
        extra = {'preserve_ephemeral': preserve_ephemeral,
                 'migration': migration,
                 'scheduled_node': node,
                 'limits': limits}
        version = '4.5'
        client = self.router.client(ctxt)
        if not client.can_send_version(version):
            version = '4.0'
            extra.pop('migration')
            extra.pop('scheduled_node')
            extra.pop('limits')
        # The host parameter is consumed here: it only sets the RPC routing
        # target and is not forwarded to the compute manager as an argument
        # (see the sketch of _compute_host() after this code block).
        cctxt = client.prepare(server=_compute_host(host, instance),
                version=version)
        cctxt.cast(ctxt, 'rebuild_instance',
                   instance=instance, new_pass=new_pass,
                   injected_files=injected_files, image_ref=image_ref,
                   orig_image_ref=orig_image_ref,
                   orig_sys_metadata=orig_sys_metadata, bdms=bdms,
                   recreate=recreate, on_shared_storage=on_shared_storage,
                   **extra)
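
The routing target in prepare() comes from the small _compute_host() helper defined in the same file; conceptually it behaves like the sketch below (simplified, not the verbatim upstream helper):

def _compute_host(host, instance):
    # Prefer an explicitly requested destination host; otherwise route the
    # message to the host currently recorded on the instance.
    if host:
        return host
    return instance.host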

 

Stage 4: the nova-compute service on the target node
The main call chain in this stage is:
rebuild_instance -> _do_rebuild_instance_with_claim -> _do_rebuild_instance -> _rebuild_default_impl
nova-compute, manager.py
1) The recreate flag distinguishes a nova evacuate from a plain nova rebuild.
2) recreate is True for an evacuation and False for a rebuild.
3) Claim resources on the target node for the scheduled node.
4) Get the instance's image reference.
5) Load the instance's block device mappings (block_device_mapping table) by its UUID.
6) Get the instance's network information.
7) Detach the instance's block devices.
8) Because the libvirt driver does not implement a rebuild() method, _rebuild_default_impl() is used for both evacuate and rebuild.
9) For an evacuation, the driver's spawn() is called on the target node to build the instance anew.
10) For a rebuild, the instance is first destroyed on the node and then spawn() is called.
D:\tran_code\nova_v1\nova\compute\manager.py
    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate, on_shared_storage=None,
                         preserve_ephemeral=False, migration=None,
                         scheduled_node=None, limits=None):
        """Destroy and re-make this instance.-----销毁这台虚机并且重新制作这台虚机

        A 'rebuild' effectively purges all existing data from the system and-----重建操作有效的清除来自系统的所有数据,使用给定的metadata和个性信息来重建虚机
        remakes the VM with given 'metadata' and 'personalities'.

        :param context: `nova.RequestContext` object
        :param instance: Instance object
        :param orig_image_ref: Original image_ref before rebuild
        :param image_ref: New image_ref for rebuild
        :param injected_files: Files to inject
        :param new_pass: password to set on rebuilt instance
        :param orig_sys_metadata: instance system metadata from pre-rebuild
        :param bdms: block-device-mappings to use for rebuild
        :param recreate: True if the instance is being recreated (e.g. the
            hypervisor it was on failed) - cleanup of old state will be
            skipped.
        :param on_shared_storage: True if instance files on shared storage.
                                  If not provided then information from the
                                  driver will be used to decide if the instance
                                  files are available or not on the target host
        :param preserve_ephemeral: True if the default ephemeral storage
                                   partition must be preserved on rebuild
        :param migration: a Migration object if one was created for this
                          rebuild operation (if it's a part of evacuate)
        :param scheduled_node: A node of the host chosen by the scheduler
                               (the hypervisor node). If a host was specified
                               by the user, this will be None
        :param limits: Overcommit limits set by the scheduler. If a host was
                       specified by the user, this will be None
        """
        context = context.elevated()

        LOG.info("Rebuilding instance", instance=instance)

        if recreate:
            # This is an evacuation to a new host, so we need to perform a
            # resource claim.
            rt = self._get_resource_tracker()  # resource tracker for this (target) host
            rebuild_claim = rt.rebuild_claim
        else:
            # This is a rebuild to the same host, so we don't need to make
            # a claim since the instance is already on this host.
            rebuild_claim = claims.NopClaim

        image_meta = {}
        if image_ref:
            image_meta = self.image_api.get(context, image_ref)

        # NOTE(mriedem): On a recreate (evacuate), we need to update
        # the instance's host and node properties to reflect it's
        # destination node for the recreate.
        if not scheduled_node:
            # scheduled_node is None in two scenarios: a rebuild on the
            # instance's own host with the same image, or a forced
            # evacuation to a user-specified host.
            if recreate:
                try:
                    # (Author's note: this spot needs adapting for the
                    # forced evacuation-to-a-specified-host case.)
                    compute_node = self._get_compute_info(context, self.host)
                    scheduled_node = compute_node.hypervisor_hostname
                except exception.ComputeHostNotFound:
                    LOG.exception('Failed to get compute_info for %s',
                                  self.host)
            else:
                # Plain rebuild: keep the node the instance is already on.
                scheduled_node = instance.node

        with self._error_out_instance_on_exception(context, instance):
            try:
                claim_ctxt = rebuild_claim(
                    context, instance, scheduled_node,
                    limits=limits, image_meta=image_meta,
                    migration=migration)
                # This is the key call; the helpers it invokes are examined
                # in detail below.
                self._do_rebuild_instance_with_claim(
                    claim_ctxt, context, instance, orig_image_ref,
                    image_ref, injected_files, new_pass, orig_sys_metadata,
                    bdms, recreate, on_shared_storage, preserve_ephemeral,
                    migration)
            except exception.ComputeResourcesUnavailable as e:
                LOG.debug("Could not rebuild instance on this host, not "
                          "enough resources available.", instance=instance)

                # NOTE(ndipanov): We just abort the build for now and leave a
                # migration record for potential cleanup later
                self._set_migration_status(migration, 'failed')
                # Since the claim failed, we need to remove the allocation
                # created against the destination node. Note that we can only
                # get here when evacuating to a destination node. Rebuilding
                # on the same host (not evacuate) uses the NopClaim which will
                # not raise ComputeResourcesUnavailable.
                rt.delete_allocation_for_evacuated_instance(
                    instance, scheduled_node, node_type='destination')
                self._notify_instance_rebuild_error(context, instance, e)

                raise exception.BuildAbortException(
                    instance_uuid=instance.uuid, reason=e.format_message())
            except (exception.InstanceNotFound,
                    exception.UnexpectedDeletingTaskStateError) as e:
                LOG.debug('Instance was deleted while rebuilding',
                          instance=instance)
                self._set_migration_status(migration, 'failed')
                self._notify_instance_rebuild_error(context, instance, e)
            except Exception as e:
                self._set_migration_status(migration, 'failed')
                self._notify_instance_rebuild_error(context, instance, e)
                raise
            else:
                # This else branch only runs when the try block raised no
                # exception; if one was caught above, it is skipped.
                instance.apply_migration_context()
                # NOTE (ndipanov): This save will now update the host and node
                # attributes making sure that next RT pass is consistent since
                # it will be based on the instance and not the migration DB
                # entry.
                instance.host = self.host
                instance.node = scheduled_node
                instance.save()
                instance.drop_migration_context()

                # NOTE (ndipanov): Mark the migration as done only after we
                # mark the instance as belonging to this host.
                self._set_migration_status(migration, 'done')

    def _do_rebuild_instance_with_claim(self, claim_context, *args, **kwargs):
        """Helper to avoid deep nesting in the top-level method."""

        with claim_context:
            self._do_rebuild_instance(*args, **kwargs)


    def _do_rebuild_instance(self, context, instance, orig_image_ref,
                             image_ref, injected_files, new_pass,
                             orig_sys_metadata, bdms, recreate,
                             on_shared_storage, preserve_ephemeral,
                             migration):
        orig_vm_state = instance.vm_state

        if recreate:
            # Evacuate path. The author notes that the 'supports_recreate'
            # capability flag has to be added to the virt driver (a sketch of
            # this flag follows at the end of this section).
            if not self.driver.capabilities["supports_recreate"]:
                raise exception.InstanceRecreateNotSupported

            # Make sure the instance does not already exist on this driver;
            # if it does, an exception is raised, otherwise continue.
            self._check_instance_exists(context, instance)

            if on_shared_storage is None:
                # on_shared_storage was not provided, so ask the driver
                # whether the instance files are on shared storage.
                LOG.debug('on_shared_storage is not provided, using driver'
                            'information to decide if the instance needs to'
                            'be recreated')
                on_shared_storage = self.driver.instance_on_disk(instance)

            elif (on_shared_storage !=
                    self.driver.instance_on_disk(instance)):
                # To cover case when admin expects that instance files are
                # on shared storage, but not accessible and vice versa
                raise exception.InvalidSharedStorage(
                        _("Invalid state of instance files on shared"
                            " storage"))

            if on_shared_storage:
                # Shared storage (a shared volume or a shared directory):
                # reuse the existing disk.
                LOG.info('disk on shared storage, recreating using'
                         ' existing disk')
            else:
                # Not on shared storage: fall back to the instance's image.
                image_ref = orig_image_ref = instance.image_ref
                LOG.info("disk not on shared storage, rebuilding from:"
                         " '%s'", str(image_ref))

        if image_ref:
            # An image ref is available: fetch its metadata from glance.
            image_meta = objects.ImageMeta.from_image_ref(
                context, self.image_api, image_ref)
        else:
            # No image ref: keep using the instance's original image metadata.
            image_meta = instance.image_meta

        # This instance.exists message should contain the original
        # image_ref, not the new one.  Since the DB has been updated
        # to point to the new one... we have to override it.
        # TODO(jaypipes): Move generate_image_url() into the nova.image.api
        orig_image_ref_url = glance.generate_image_url(orig_image_ref)
        extra_usage_info = {'image_ref_url': orig_image_ref_url}
        # Usage notification; not essential to the evacuate flow.
        compute_utils.notify_usage_exists(
                self.notifier, context, instance,
                current_period=True, system_metadata=orig_sys_metadata,
                extra_usage_info=extra_usage_info)

        # This message should contain the new image_ref
        extra_usage_info = {'image_name': self._get_image_name(image_meta)}
        self._notify_about_instance_usage(context, instance,
                "rebuild.start", extra_usage_info=extra_usage_info)
        # NOTE: image_name is not included in the versioned notification
        # because we already provide the image_uuid in the notification
        # payload and the image details can be looked up via the uuid.
        compute_utils.notify_about_instance_action(
            context, instance, self.host,
            action=fields.NotificationAction.REBUILD,
            phase=fields.NotificationPhase.START)

        instance.power_state = self._get_power_state(context, instance)
        instance.task_state = task_states.REBUILDING
        instance.save(expected_task_state=[task_states.REBUILDING])

        if recreate:
            # Evacuate path: set up the instance's networking on this new
            # host, since we are now running on a different host.
            self.network_api.setup_networks_on_host(
                    context, instance, self.host)
            # For nova-network this is needed to move floating IPs
            # For neutron this updates the host in the port binding
            # TODO(cfriesen): this network_api call and the one above
            # are so similar, we should really try to unify them.
            self.network_api.setup_instance_network_on_host(
                    context, instance, self.host, migration)
            # TODO(mriedem): Consider decorating setup_instance_network_on_host
            # with @base_api.refresh_cache and then we wouldn't need this
            # explicit call to get_instance_nw_info.
            network_info = self.network_api.get_instance_nw_info(context,
                                                                 instance)
        else:
            # Plain rebuild on the same host: the cached network info on the
            # instance is good enough.
            network_info = instance.get_network_info()

        if bdms is None:
            # No bdms passed in: load the block device mappings from the DB.
            bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
                    context, instance.uuid)

        # Build the block device info structure from the bdms.
        block_device_info = \
            self._get_instance_block_device_info(
                    context, instance, bdms=bdms)

        def detach_block_devices(context, bdms):
            # Volume-backed bdms have to be detached before the rebuild.
            for bdm in bdms:
                if bdm.is_volume:
                    self._detach_volume(context, bdm, instance,
                                        destroy_bdm=False)

        files = self._decode_files(injected_files)  # decode the injected files

        kwargs = dict(
            context=context,
            instance=instance,
            image_meta=image_meta,
            injected_files=files,
            admin_password=new_pass,
            bdms=bdms,
            detach_block_devices=detach_block_devices,
            attach_block_devices=self._prep_block_device,
            block_device_info=block_device_info,
            network_info=network_info,
            preserve_ephemeral=preserve_ephemeral,
            recreate=recreate)
        try:
            with instance.mutated_migration_context():
                # The drivers do not implement rebuild(), so this raises
                # NotImplementedError and the default implementation is used.
                self.driver.rebuild(**kwargs)
        except NotImplementedError:
            # NOTE(rpodolyaka): driver doesn't provide specialized version
            # of rebuild, fall back to the default implementation
            self._rebuild_default_impl(**kwargs)
        # Update the instance record after the spawn (power state, etc.).
        self._update_instance_after_spawn(context, instance)
        instance.save(expected_task_state=[task_states.REBUILD_SPAWNING])

        if orig_vm_state == vm_states.STOPPED:
            LOG.info("bringing vm to original state: '%s'",
                     orig_vm_state, instance=instance)
            instance.vm_state = vm_states.ACTIVE
            instance.task_state = task_states.POWERING_OFF
            instance.progress = 0
            instance.save()
            self.stop_instance(context, instance, False)
        # Push the updated instance information to the scheduler.
        self._update_scheduler_instance_info(context, instance)
        self._notify_about_instance_usage(
                context, instance, "rebuild.end",
                network_info=network_info,
                extra_usage_info=extra_usage_info)
        compute_utils.notify_about_instance_action(
            context, instance, self.host,
            action=fields.NotificationAction.REBUILD,
            phase=fields.NotificationPhase.END)


    def _rebuild_default_impl(self, context, instance, image_meta,
                              injected_files, admin_password, bdms,
                              detach_block_devices, attach_block_devices,
                              network_info=None,
                              recreate=False, block_device_info=None,
                              preserve_ephemeral=False):
        if preserve_ephemeral:
            # The default code path does not support preserving ephemeral
            # partitions.
            raise exception.PreserveEphemeralNotSupported()

        if recreate:
            # Evacuate: we are on a new host, so there is nothing local to
            # destroy; just detach the block devices and spawn the instance
            # fresh below.
            detach_block_devices(context, bdms)
        else:
            # Rebuild on the same host: power off, detach the block devices
            # and destroy the existing instance before respawning it.
            self._power_off_instance(context, instance, clean_shutdown=True)
            detach_block_devices(context, bdms)
            self.driver.destroy(context, instance,
                                network_info=network_info,
                                block_device_info=block_device_info)

        instance.task_state = task_states.REBUILD_BLOCK_DEVICE_MAPPING
        instance.save(expected_task_state=[task_states.REBUILDING])

        # Attach the block devices on the new (current) host.
        new_block_device_info = attach_block_devices(context, instance, bdms)

        instance.task_state = task_states.REBUILD_SPAWNING
        instance.save(
            expected_task_state=[task_states.REBUILD_BLOCK_DEVICE_MAPPING])

        with instance.mutated_migration_context():
            # Finally spawn the new instance through the virt driver.
            self.driver.spawn(context, instance, image_meta, injected_files,
                              admin_password, network_info=network_info,
                              block_device_info=new_block_device_info)
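
As noted in _do_rebuild_instance(), an evacuation is only attempted when the virt driver advertises the supports_recreate capability. A minimal sketch of how a driver declares it (illustrative only; real drivers subclass nova.virt.driver.ComputeDriver and define many more capability flags):

class MyVirtDriver(object):
    # Capability flags checked by the compute manager; without
    # "supports_recreate": True, _do_rebuild_instance() raises
    # InstanceRecreateNotSupported for an evacuation.
    capabilities = {
        "supports_recreate": True,
    }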
posted @ 2021-06-27 18:46  一切都是当下