Studying the code execution path of the RDMSR and CPUID instructions in a guest

Kernel version: 5.3.0

QEMU version: 4.2.0

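The RDMSR instruction

Purpose

Reads an MSR. The MSR is selected by the contents of ECX (RCX), and the 64-bit value that is read is returned in EDX:EAX (the low 32 bits of RDX and RAX).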
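As a small illustration of the EDX:EAX convention, the sketch below combines and splits the two 32-bit halves. The helper names msr_value_from_regs() and msr_value_to_regs() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the two 32-bit halves reported by RDMSR into one 64-bit value.
 * EDX carries bits 63:32 and EAX carries bits 31:0. */
static uint64_t msr_value_from_regs(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64-bit MSR value into the EDX/EAX halves that RDMSR would report. */
static void msr_value_to_regs(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    assert(edx != NULL && eax != NULL);
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```
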

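VMX behavior

If the guest executes rdmsr and any one of the following conditions holds, a VM exit is triggered.

  1. The "use MSR bitmaps" control is 0.
  2. RCX is neither in the range 00000000H-00001FFFH nor in the range C0000000H-C0001FFFH.
  3. RCX is in the range 00000000H-00001FFFH, and bit RCX of the read bitmap for low MSRs is 1.
  4. RCX is in the range C0000000H-C0001FFFH, and bit n of the read bitmap for high MSRs is 1, where n = RCX & 00001FFFH.

The MSR bitmap address points to the MSR bitmaps (4 KB in total). The page is split into four 1 KB bitmaps: read bitmap for low MSRs, read bitmap for high MSRs, write bitmap for low MSRs, and write bitmap for high MSRs. The MSR bitmap address itself is a field of the VMCS, while the bitmap page it points to is ordinary memory and can be read and written with normal memory accesses.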
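The four conditions can be sketched as a small user-space model. This is only an illustration, not KVM or hardware code: the function name rdmsr_causes_vmexit() and the offset macros are assumptions made for this example, and the layout assumed here places the read bitmap for low MSRs at offset 0x000 and the read bitmap for high MSRs at offset 0x400 of the 4 KB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_BITMAP_SIZE  4096u
#define READ_LOW_OFFSET  0x000u  /* read bitmap for low MSRs  (assumed layout) */
#define READ_HIGH_OFFSET 0x400u  /* read bitmap for high MSRs (assumed layout) */

/* Return bit n of the bitmap that starts at base. */
static bool test_bit_in(const uint8_t *base, uint32_t n)
{
    return (base[n / 8] >> (n % 8)) & 1u;
}

/* Model of the decision "does this rdmsr cause a VM exit?". */
static bool rdmsr_causes_vmexit(bool use_msr_bitmaps,
                                const uint8_t *bitmap, uint32_t msr)
{
    if (!use_msr_bitmaps)
        return true;                                      /* condition 1 */
    assert(bitmap != NULL);
    if (msr <= 0x00001FFFu)                               /* condition 3 */
        return test_bit_in(bitmap + READ_LOW_OFFSET, msr);
    if (msr >= 0xC0000000u && msr <= 0xC0001FFFu)         /* condition 4 */
        return test_bit_in(bitmap + READ_HIGH_OFFSET, msr & 0x1FFFu);
    return true;                                          /* condition 2 */
}
```

With an all-ones bitmap every rdmsr exits; clearing a single bit in the matching read bitmap lets that one MSR be read without a VM exit.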

Code analysis (MSR bitmap)

KVM code

  • Initializing the MSR bitmap in the VMCS

qemu=> kvm_vm_ioctl(KVM_CREATE_VCPU) => kvm_vm_ioctl_create_vcpu() => kvm_arch_vcpu_create() => vmx_create_vcpu() => vmx_vcpu_setup()

static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
{
    // If the CPU supports "use MSR bitmaps", write the physical address of the allocated MSR bitmap into the MSR bitmap address field of the VMCS
    if (cpu_has_vmx_msr_bitmap())
		vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap));
}

When QEMU requests creation of a vCPU, the address of the MSR bitmap is written into the VMCS.

  • Allocating space for the MSR bitmap

qemu=> kvm_vm_ioctl(KVM_CREATE_VCPU) => kvm_vm_ioctl_create_vcpu() => kvm_arch_vcpu_create() => vmx_create_vcpu() => alloc_loaded_vmcs()

static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
{
    	err = alloc_loaded_vmcs(&vmx->vmcs01);
}

// Allocate one page (4 KB) for the MSR bitmap and initialize its contents to all ones
int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
{
    	if (cpu_has_vmx_msr_bitmap()) {
		loaded_vmcs->msr_bitmap = (unsigned long *)
				__get_free_page(GFP_KERNEL_ACCOUNT);
		if (!loaded_vmcs->msr_bitmap)
			goto out_vmcs;
		memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE);
	}
	...
}

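When QEMU requests creation of a vCPU, 4 KB is allocated for the MSR bitmap and initialized to all ones, so every MSR access is intercepted by default.

  • Initializing specific bits (corresponding to specific MSRs) in the MSR bitmap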

qemu=> kvm_vm_ioctl(KVM_CREATE_VCPU) => kvm_vm_ioctl_create_vcpu() => kvm_arch_vcpu_create() => vmx_create_vcpu()

static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
{
    msr_bitmap = vmx->vmcs01.msr_bitmap;
    // Clear specific bits in the MSR bitmap; afterwards, accesses to these MSRs no longer cause a VM exit
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_TSC, MSR_TYPE_R);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
	if (kvm_cstate_in_guest(kvm)) {
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
	}
	vmx->msr_bitmap_mode = 0;

	vmx->loaded_vmcs = &vmx->vmcs01;
}

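Later, at run time, the bitmap settings for some APIC-related (interrupt) MSRs are also updated. For every other MSR that has not been specially configured, an access still causes a VM exit.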
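The effect of starting from an all-ones page and then clearing selected bits can be sketched in user space as follows. This is a simplified model, not the kernel implementation: msr_bitmap_init() and disable_intercept_for_msr() are hypothetical names, the MSR_TYPE_* values are defined locally for the example, and the four 1 KB bitmaps are assumed to sit at offsets 0x000 (read low), 0x400 (read high), 0x800 (write low) and 0xc00 (write high).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE 4096u
#define MSR_TYPE_R  1   /* local definitions for this sketch */
#define MSR_TYPE_W  2
#define MSR_TYPE_RW 3

/* Clear bit n of the bitmap that starts at base. */
static void clear_bit_in(uint8_t *base, uint32_t n)
{
    base[n / 8] &= (uint8_t)~(1u << (n % 8));
}

/* All ones: every rdmsr/wrmsr covered by the bitmaps is intercepted. */
static void msr_bitmap_init(uint8_t *bitmap)
{
    assert(bitmap != NULL);
    memset(bitmap, 0xff, MSR_BITMAP_SIZE);
}

/* Stop intercepting reads and/or writes of one MSR.
 * MSRs outside both ranges cannot be described by the bitmaps and are left alone. */
static void disable_intercept_for_msr(uint8_t *bitmap, uint32_t msr, int type)
{
    assert(bitmap != NULL);
    if (msr <= 0x1fffu) {
        if (type & MSR_TYPE_R)
            clear_bit_in(bitmap + 0x000, msr);   /* read bitmap, low MSRs   */
        if (type & MSR_TYPE_W)
            clear_bit_in(bitmap + 0x800, msr);   /* write bitmap, low MSRs  */
    } else if (msr >= 0xc0000000u && msr <= 0xc0001fffu) {
        msr &= 0x1fffu;
        if (type & MSR_TYPE_R)
            clear_bit_in(bitmap + 0x400, msr);   /* read bitmap, high MSRs  */
        if (type & MSR_TYPE_W)
            clear_bit_in(bitmap + 0xc00, msr);   /* write bitmap, high MSRs */
    }
}
```

For example, passing 0x10 (the index of MSR_IA32_TSC) with MSR_TYPE_R clears only the read bit, so a guest write to that MSR would still be intercepted in this model.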

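QEMU code

The QEMU source tree contains the following code for reading VMX-related MSR information, and that information could also be modified. Note that the names are EDK2-style and that the MSR read here is IA32_VMX_BASIC, a VMX capability MSR, rather than the bitmap itself. In practice QEMU does not modify the MSR bitmap while it is running.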

MSR_IA32_VMX_BASIC_REGISTER  Msr;

Msr.Uint64 = AsmReadMsr64 (MSR_IA32_VMX_BASIC);

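Code analysis (read_msr)

In the guest, reading an MSR whose bit in the MSR bitmap is 1 causes a VM exit.

A read_msr in the guest produces the following call chain:

guest reads MSR => handle_rdmsr() => vmx_get_msr() => kvm_get_msr_common()

vmx_get_msr() handles reads of some special MSRs, and kvm_get_msr_common() handles reads of ordinary MSRs.

Take MSR_IA32_ARCH_CAPABILITIES as an example:

Because MSR_IA32_ARCH_CAPABILITIES is an ordinary MSR, it is handed to kvm_get_msr_common().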

int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
    case MSR_IA32_ARCH_CAPABILITIES:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
			return 1;
		msr_info->data = vcpu->arch.arch_capabilities;
		break;
}

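msr_info->host_initiated distinguishes whether this MSR read was initiated by QEMU or by the guest itself. If QEMU initiated it, msr_info->host_initiated is true; if the guest initiated it, msr_info->host_initiated is false. Clearly, when the guest reads MSR_IA32_ARCH_CAPABILITIES, msr_info->host_initiated is false.

guest_cpuid_has() checks whether the guest has the CPUID feature X86_FEATURE_ARCH_CAPABILITIES. CPUID is analyzed in a later section; for now it is enough to know that guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES) returns true if the guest has this CPUID feature and false otherwise.

If the guest has the feature, the contents of vcpu->arch.arch_capabilities are copied into msr_info->data, which completes the MSR read.

If the guest does not have the feature, 1 is returned, meaning that the MSR read failed; handle_rdmsr() then injects a #GP into the guest. (A guest usually initializes the result to 0 before reading an MSR. If the read fails, the result is still 0, which lets the program keep running after a failed rdmsr.)

Assume the guest has the feature (otherwise the analysis would end here). The value read is then the content of arch_capabilities.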
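The decision made in the MSR_IA32_ARCH_CAPABILITIES case can be modelled in a few lines of user-space C. The struct and function names below (model_vcpu, model_msr_data, model_get_arch_capabilities) are hypothetical stand-ins, and the boolean guest_has_arch_caps replaces the real guest_cpuid_has() lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of struct kvm_vcpu used here. */
struct model_vcpu {
    bool     guest_has_arch_caps;  /* replaces guest_cpuid_has(...) */
    uint64_t arch_capabilities;
};

/* Hypothetical stand-in for struct msr_data. */
struct model_msr_data {
    bool     host_initiated;       /* true: QEMU asked, false: guest rdmsr */
    uint64_t data;
};

/* Returns 0 on success and 1 on failure, like the case shown above. */
static int model_get_arch_capabilities(const struct model_vcpu *vcpu,
                                       struct model_msr_data *msr)
{
    assert(vcpu != NULL && msr != NULL);
    /* A guest-initiated read fails when the guest lacks the CPUID feature. */
    if (!msr->host_initiated && !vcpu->guest_has_arch_caps)
        return 1;
    msr->data = vcpu->arch_capabilities;
    return 0;
}
```

In this model a host-initiated read always succeeds, while a guest read succeeds only when the feature is exposed to the guest.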

vcpu->arch.arch_capabilities is assigned in two places in the kernel:

  1. In kvm_arch_vcpu_setup(), i.e. when the vCPU is initialized:

qemu=> kvm_vm_ioctl(KVM_CREATE_VCPU) => kvm_vm_ioctl_create_vcpu() => kvm_arch_vcpu_setup()

int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
    ...
        vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
    ...
}

// arch/x86/kvm/x86.c
static u64 kvm_get_arch_capabilities(void)
{
    if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
    ...
    return data;
}

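Assume again that the boot CPU has the X86_FEATURE_ARCH_CAPABILITIES feature (in fact many current CPUs do), i.e. this is a physical CPU with that flag. The data of MSR_IA32_ARCH_CAPABILITIES is then again read into data with rdmsr. This time, however, the rdmsr does not cause a VM exit: it runs in host kernel space and reads the MSR_IA32_ARCH_CAPABILITIES value of the pCPU.

In other words, when QEMU issues the request to create a vCPU, vcpu->arch.arch_capabilities is set from the value read from the corresponding MSR of the pCPU (physical CPU).

  2. In kvm_set_msr_common(), where arch_capabilities is assigned when QEMU sets its value through a vCPU ioctl.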

[kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE), kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE), kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE)] => kvm_arch_put_registers() => kvm_put_msrs()

// QEMU code: set up the MSR entries and write these MSR values into the guest
static int kvm_put_msrs(X86CPU *cpu, int level)
{
    /* At the start of kvm_put_msrs(), entries are added for a large number of MSRs and stored in cpu->kvm_msr_buf */
    ...
     /* If host supports feature MSR, write down. */
    if (has_msr_arch_capabs) {
        kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
                          env->features[FEAT_ARCH_CAPABILITIES]);
    }
    ...
    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf); // write the MSR values into the guest
}

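While the vCPU is running, when it is reset, and when it is initialized, kvm_put_msrs() is called to set the MSRs supported by the vCPU and their contents. They are finally written into the guest through kvm_vcpu_ioctl(KVM_SET_MSRS).

The has_msr_arch_capabs flag is set when QEMU creates the vCPU in its thread: QEMU obtains the MSR list with kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, &msr_list) and then checks whether the arch_capabilities MSR is present in it. On receiving a KVM_GET_MSR_INDEX_LIST request, KVM returns the list of MSRs that the guest supports and that KVM can emulate.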

// KVM code: on receiving the KVM_SET_MSRS ioctl request, call do_set_msr
long kvm_arch_vcpu_ioctl(struct file *filp,
			 unsigned int ioctl, unsigned long arg)
{
    case KVM_SET_MSRS: {
		int idx = srcu_read_lock(&vcpu->kvm->srcu);
		r = msr_io(vcpu, argp, do_set_msr, 0);
		srcu_read_unlock(&vcpu->kvm->srcu, idx);
		break;
	}
}

// Final implementation of KVM_SET_MSRS, taking MSR_IA32_ARCH_CAPABILITIES as an example
kvm_set_msr_common()
{
    case MSR_IA32_ARCH_CAPABILITIES:
		if (!msr_info->host_initiated) // if the guest itself tries to write this MSR, return 1 to indicate that setting the MSR failed
			return 1;
		vcpu->arch.arch_capabilities = data;
		break;
}

After receiving the KVM_SET_MSRS ioctl request, KVM calls do_set_msr:

do_set_msr() => kvm_set_msr => kvm_x86_ops->set_msr => vmx_set_msr => kvm_set_msr_common

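In the end kvm_set_msr_common() performs the assignment to arch_capabilities. The data here is first obtained by QEMU from KVM and then written back to KVM by QEMU, so ultimately it still comes from KVM, i.e. from the host.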
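The write side can be modelled in the same spirit. The sketch below uses hypothetical names (model_vcpu, model_set_arch_capabilities) and only mirrors the rule visible in the case above: a write that is not host-initiated is rejected, and a host-initiated write replaces the stored value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of struct kvm_vcpu used here. */
struct model_vcpu {
    uint64_t arch_capabilities;
};

/* Returns 0 on success and 1 on failure. */
static int model_set_arch_capabilities(struct model_vcpu *vcpu,
                                       bool host_initiated, uint64_t data)
{
    assert(vcpu != NULL);
    if (!host_initiated)
        return 1;                      /* guest writes are refused */
    vcpu->arch_capabilities = data;    /* value supplied by user space */
    return 0;
}
```
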

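The CPUID instruction

Write the leaf to query into EAX (and, where the leaf has subleaves, the subleaf into ECX), execute CPUID, and the result appears in EAX, EBX, ECX and EDX.

Code analysis

Executing CPUID in a guest always causes a VM exit. KVM then handles the CPUID.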

handle_cpuid() => kvm_emulate_cpuid()

int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
{
	u32 eax, ebx, ecx, edx;

	if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
		return 1;

	eax = kvm_rax_read(vcpu); // read the vCPU's rax
	ecx = kvm_rcx_read(vcpu); // read the vCPU's rcx
	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true);
	kvm_rax_write(vcpu, eax);
	kvm_rbx_write(vcpu, ebx);
	kvm_rcx_write(vcpu, ecx);
	kvm_rdx_write(vcpu, edx);
	return kvm_skip_emulated_instruction(vcpu);
}


bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
	       u32 *ecx, u32 *edx, bool check_limit)
{
	u32 function = *eax, index = *ecx;
	struct kvm_cpuid_entry2 *best;
	bool entry_found = true;

	best = kvm_find_cpuid_entry(vcpu, function, index);

	if (!best) {
		entry_found = false;
		if (!check_limit)
			goto out;

		best = check_cpuid_limit(vcpu, function, index);
	}

out:
	if (best) {
		*eax = best->eax;
		*ebx = best->ebx;
		*ecx = best->ecx;
		*edx = best->edx;
	} else
		*eax = *ebx = *ecx = *edx = 0;
	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
	return entry_found;
}

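The key function is kvm_find_cpuid_entry(), which looks up the CPUID entry that QEMU wrote into KVM. If the entry exists, its contents are returned as the CPUID result. If it does not exist and check_limit is true, check_cpuid_limit() determines whether the value passed in EAX exceeds the largest leaf this vCPU accepts; if it does, the CPUID values of the largest leaf supported by the vCPU are returned.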
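The lookup with its out-of-range fallback can be sketched as a user-space model. All names below (model_cpuid_entry, model_cpuid_table, model_find_entry, model_check_limit, model_cpuid) are hypothetical. The model is also simplified on purpose: it always requires an exact match on both leaf and subleaf, and it assumes that EAX of leaf 0 (or of leaf 0x80000000 for the extended range) holds the largest supported leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_cpuid_entry {
    uint32_t function, index;        /* leaf and subleaf */
    uint32_t eax, ebx, ecx, edx;     /* values to return */
};

struct model_cpuid_table {
    const struct model_cpuid_entry *entries;
    size_t count;
};

/* Exact-match lookup; returns NULL when no entry matches. */
static const struct model_cpuid_entry *
model_find_entry(const struct model_cpuid_table *t, uint32_t function, uint32_t index)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].function == function && t->entries[i].index == index)
            return &t->entries[i];
    return NULL;
}

/* If function is beyond the largest supported leaf, fall back to the
 * largest basic leaf; otherwise report nothing. */
static const struct model_cpuid_entry *
model_check_limit(const struct model_cpuid_table *t, uint32_t function, uint32_t index)
{
    const struct model_cpuid_entry *max =
        model_find_entry(t, function & 0x80000000u, 0);

    if (!max || max->eax >= function)
        return NULL;
    if (function & 0x80000000u) {
        max = model_find_entry(t, 0, 0);
        if (!max)
            return NULL;
    }
    return model_find_entry(t, max->eax, index);
}

/* Returns true only when an exact entry was found. */
static bool model_cpuid(const struct model_cpuid_table *t, uint32_t *eax,
                        uint32_t *ebx, uint32_t *ecx, uint32_t *edx, bool check_limit)
{
    uint32_t function = *eax, index = *ecx;
    const struct model_cpuid_entry *best = model_find_entry(t, function, index);
    bool found = best != NULL;

    if (!best && check_limit)
        best = model_check_limit(t, function, index);
    if (best) {
        *eax = best->eax; *ebx = best->ebx; *ecx = best->ecx; *edx = best->edx;
    } else {
        *eax = *ebx = *ecx = *edx = 0;
    }
    return found;
}
```

Querying a leaf above the maximum therefore still fills the registers (with the values of the largest leaf) while reporting that no exact entry was found.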

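So what matters is this "entry", which is written by QEMU.

The rough process is:

  1. QEMU reads the list of CPUID values supported by the host through ioctl(KVM_GET_SUPPORTED_CPUID).
  2. QEMU uses an AND operation to mask out the CPUID features that QEMU does not support.
  3. Finally, QEMU writes the CPUID data into KVM through ioctl(KVM_SET_CPUID2) for the guest to use.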
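Step 2 can be illustrated with a short sketch. It is not QEMU's actual code: filter_feature_words() is a hypothetical helper that treats each CPUID output register as a word of feature bits and keeps only the bits present in both the host-supported word and the word user space would like to expose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AND every requested feature word with the matching supported word.
 * Returns how many words lost at least one bit. */
static size_t filter_feature_words(const uint32_t *supported,
                                   uint32_t *requested, size_t n)
{
    size_t changed = 0;

    assert(supported != NULL && requested != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t kept = requested[i] & supported[i];
        if (kept != requested[i])
            changed++;
        requested[i] = kept;
    }
    return changed;
}
```

The filtered words are what would then be handed to KVM in step 3.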
posted @ 2021-02-24 13:42  EwanHai