4.1 MMU setup (continued)
The previous section stopped at the call to __armv4_mmu_cache_on, which in turn calls __setup_mmu. That function is what we analyze here.
4.1.1 __setup_mmu
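As analyzed earlier, there is 16 KB of space just below r4, the kernel's final run address (0x00004000~0x00008000 in my environment). This space holds the page table. The page table built here is only temporary: it is torn down once the kernel proper starts. The mapping it establishes is 1:1 (virtual address == physical address).
The first instructions reserve space for the page table. r4 holds ZRELADDR, and the start address of the page table is saved in r3. The pointer is then aligned: after the two bic instructions the low 14 bits of r3 are all 0, so it sits on a 16 KB boundary.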
__setup_mmu:	sub	r3, r4, #16384		@ Page directory size
		bic	r3, r3, #0xff		@ Align the pointer
		bic	r3, r3, #0x3f00
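Where the constant #16384 comes from:
On a 32-bit ARM system the address space is 4 GB. Each page table entry here covers 1 MB, so 4096 entries are needed. Each entry is 4 bytes, so the table needs 4096 * 4 bytes = 16384 bytes = 16 KB.
Each entry maps 1 MB of memory and uses the section descriptor format, as shown in the figure below: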
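The arithmetic above, together with the sub/bic/bic alignment, can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and 0x00008000 is the ZRELADDR of my environment.

```python
SECTION_SIZE = 1 << 20     # each first-level entry maps 1 MB
ENTRY_SIZE = 4             # bytes per entry
ADDRESS_SPACE = 1 << 32    # 32-bit address space (4 GB)
MASK32 = 0xFFFFFFFF


def page_table_size() -> int:
    """4096 entries * 4 bytes = 16384 bytes."""
    return (ADDRESS_SPACE // SECTION_SIZE) * ENTRY_SIZE


def page_table_base(zreladdr: int) -> int:
    """Mirror the three instructions at __setup_mmu (hypothetical helper)."""
    base = (zreladdr - page_table_size()) & MASK32   # sub r3, r4, #16384
    base &= ~0xFF & MASK32                           # bic r3, r3, #0xff
    base &= ~0x3F00 & MASK32                         # bic r3, r3, #0x3f00
    return base


print(page_table_size())                  # 16384
print(hex(page_table_base(0x00008000)))   # 0x4000
```

The two bic masks together clear bits 0 to 13, which is why the result is 16 KB aligned.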

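In the figure, bit 4 (XN) is the execute-never bit, bit 3 (C) is cacheable, and bit 2 (B) is bufferable.
Note: a first-level (L1) page table entry has four formats: Fault, Section, Page Table, and Supersection. See the ARM Architecture Reference Manual for details.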
/*
 * Initialise the page tables, turning on the cacheable and bufferable
 * bits for the RAM area only.
 */
		mov	r0, r3
		mov	r9, r0, lsr #18
		mov	r9, r9, lsl #18		@ start of RAM
		add	r10, r9, #0x10000000	@ a reasonable RAM size
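The comment says it plainly: initialize the page table and turn on the cacheable and bufferable bits, for the RAM area only.
r0 = r3. r9 keeps the high 14 bits of r3 with the low 18 bits cleared, i.e. r3 rounded down to a 256 KB boundary, which is taken as the start of RAM. r10 = r9 + 256 MB, a reasonable upper bound for the RAM size.
Next, r1 is initialized to 0x12 | (3 << 10) = 0xC12: a section descriptor with XN|U set and AP = 0b11. These are the low 12 status bits of each entry. r2 = r3 + 16384 marks the end of the page table. The loop then compares r1 with r9 and, if r1 >= r9, goes on to compare r10 with r1.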
		mov	r1, #0x12		@ XN|U + section mapping
		orr	r1, r1, #3 << 10	@ AP=11
		add	r2, r3, #16384

1:		cmp	r1, r9			@ if virt > start of RAM
		cmphs	r10, r1			@   && end of RAM > virt
		bic	r1, r1, #0x1c		@ clear XN|U + C + B
		orrlo	r1, r1, #0x10		@ Set XN|U for non-RAM
		orrhs	r1, r1, r6		@ set RAM section settings
		str	r1, [r0], #4		@ 1:1 mapping
		add	r1, r1, #1048576
		teq	r0, r2
		bne	1b
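(1) r9 <= r1 <= r10 (r1 lies within the physical RAM range):
The RAM section settings held in r6 are ORed in, which sets the C and B bits to enable the cache and write buffer; XN stays cleared, so code in this region is executable.
(2) r1 < r9 || r1 > r10 (r1 lies outside the physical RAM range):
C and B stay cleared and XN|U is set, so the region is uncached, unbuffered, and not executable.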
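To make the two cases concrete, here is a minimal Python model of the loop. It is a sketch rather than the kernel's code: build_identity_table is a name invented for this post, and the ram_flags value 0x0E used in the example (C | B | section type) is an assumed stand-in for whatever the caller passes in r6.

```python
MB = 1 << 20
MASK32 = 0xFFFFFFFF


def build_identity_table(table_base: int, ram_flags: int) -> list:
    """Return the 4096 first-level entries the loop would store."""
    ram_start = (table_base >> 18) << 18    # r9: table base rounded down to 256 KB
    ram_end = ram_start + 0x10000000        # r10: start of RAM + 256 MB
    entry = 0x12 | (3 << 10)                # r1 = 0xC12
    table = []
    for _ in range(4096):
        # cmp r1, r9 / cmphs r10, r1: the flags are set before bic runs
        in_ram = ram_start <= entry <= ram_end
        entry &= ~0x1C                      # clear XN|U + C + B
        if in_ram:
            entry |= ram_flags              # orrhs r1, r1, r6
        else:
            entry |= 0x10                   # orrlo r1, r1, #0x10
        table.append(entry)                 # str r1, [r0], #4
        entry = (entry + MB) & MASK32       # next 1 MB section
    return table


table = build_identity_table(0x00004000, ram_flags=0x0E)
print(hex(table[0]), hex(table[255]), hex(table[256]))
```

Because the section base address sits in bits 31..20 and every entry advances by 1 MB, entry N always describes the megabyte starting at N << 20, which is exactly the 1:1 mapping.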
/*
 * If ever we are running from Flash, then we surely want the cache
 * to be enabled also for our execution instance...  We map 2MB of it
 * so there is no map overlap problem for up to 1 MB compressed kernel.
 * If the execution is in RAM then we would only be duplicating the above.
 */
		orr	r1, r6, #0x04		@ ensure B is set for this
		orr	r1, r1, #3 << 10
		mov	r2, pc
		mov	r2, r2, lsr #20
		orr	r1, r1, r2, lsl #20
		add	r0, r3, r2, lsl #2
		str	r1, [r0], #4
		add	r1, r1, #1048576
		str	r1, [r0]
		mov	pc, lr
ENDPROC(__setup_mmu)
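The tail of __setup_mmu can be modeled the same way. The sketch below computes which two table slots get rewritten for a given pc; self_map_entries and the flag value 0x0E are illustrative assumptions, not names or values from the kernel source.

```python
MB = 1 << 20


def self_map_entries(pc: int, ram_flags: int) -> dict:
    """Return {table index: descriptor} for the 2 MB window starting at pc's section."""
    index = pc >> 20                        # mov r2, r2, lsr #20
    entry = ram_flags | 0x04 | (3 << 10)    # B set, AP = 0b11
    entry |= index << 20                    # section base = pc rounded down to 1 MB
    # str r1, [r0], #4 ; add r1, r1, #1048576 ; str r1, [r0]
    return {index: entry, index + 1: entry + MB}


for idx, desc in self_map_entries(0x00123456, ram_flags=0x0E).items():
    print(idx, hex(desc))
```

The byte offset into the table in the assembly is index << 2, because each entry is 4 bytes wide.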
That concludes the analysis of cache_on. Execution returns to the main path, still inside not_angel, and continues with the code at the restart label.
4.2 restart
restart:	adr	r0, LC0
		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}
		ldr	sp, [r0, #28]
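(1) r0: run-time address of the LC0 label
(2) r1: link-time address of the LC0 label
(3) r2: link-time address of __bss_start
(4) r3: link-time address of _end (the end of the program)
(5) r6: link-time address of _edata (the end of the data segment)
(6) r10: location of the word that stores the size of the decompressed (inflated) kernel
(7) r11: link-time start address of the GOT
(8) r12: link-time end address of the GOT
(9) sp: end address of the stack space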
		/*
		 * We might be running at a different address.  We need
		 * to fix up various pointers.
		 */
		sub	r0, r0, r1		@ calculate the delta offset
		add	r6, r6, r0		@ _edata
		add	r10, r10, r0		@ inflated kernel size location

		/*
		 * The kernel build system appends the size of the
		 * decompressed kernel at the end of the compressed data
		 * in little-endian form.
		 */
		ldrb	r9, [r10, #0]
		ldrb	lr, [r10, #1]
		orr	r9, r9, lr, lsl #8
		ldrb	lr, [r10, #2]
		ldrb	r10, [r10, #3]
		orr	r9, r9, lr, lsl #16
		orr	r9, r9, r10, lsl #24

#ifndef CONFIG_ZBOOT_ROM
		/* malloc space is above the relocated stack (64k max) */
		add	sp, sp, r0
		add	r10, sp, #0x10000
#else
		/*
		 * With ZBOOT_ROM the bss/stack is non relocatable,
		 * but someone could still run this code from RAM,
		 * in which case our reference is _edata.
		 */
		mov	r10, r6
#endif
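The four ldrb/orr pairs assemble a 32-bit little-endian value one byte at a time, which works regardless of alignment or CPU endianness. A small Python equivalent, with read_le32 as an illustrative name and a made-up payload:

```python
def read_le32(data: bytes, offset: int) -> int:
    """Assemble a 32-bit little-endian value byte by byte, like the ldrb/orr sequence."""
    b0 = data[offset]          # ldrb r9, [r10, #0]
    b1 = data[offset + 1]      # ldrb lr, [r10, #1]
    b2 = data[offset + 2]      # ldrb lr, [r10, #2]
    b3 = data[offset + 3]      # ldrb r10, [r10, #3]
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


# Made-up "compressed data" followed by the appended size word.
payload = b"zzzz" + (0x00A1B2C3).to_bytes(4, "little")
print(hex(read_le32(payload, len(payload) - 4)))   # 0xa1b2c3
```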
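If the kernel is configured with an appended device tree (DTB), some extra work is done at this point. It is not enabled in my configuration (#ifdef CONFIG_ARM_APPENDED_DTB), so we skip it for now.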
/*
 * Check to see if we will overwrite ourselves.
 *   r4  = final kernel address (possibly with LSB set)
 *   r9  = size of decompressed image
 *   r10 = end of this image, including  bss/stack/malloc space if non XIP
 * We basically want:
 *   r4 - 16k page directory >= r10 -> OK
 *   r4 + image length <= address of wont_overwrite -> OK
 * Note: the possible LSB in r4 is harmless here.
 */
		add	r10, r10, #16384
		cmp	r4, r10
		bhs	wont_overwrite
		add	r10, r4, r9
		adr	r9, wont_overwrite
		cmp	r10, r9
		bls	wont_overwrite
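This code checks whether the currently running image overlaps the region the kernel will be decompressed into; if it does, the code has to relocate itself. It first compares the kernel decompression address minus 16 KB (the 16 KB holds the kernel page table; here that gives 0x00004000) with r10. If r4 - 16 KB >= r10, no relocation is needed. Otherwise it checks whether the end of the decompressed kernel lies at or below the current run address of wont_overwrite; if so, no relocation is needed either. If neither test passes, the code must be moved.
(1) kernel start address - 16 KB >= end address of the current image: no relocation
(2) kernel end address <= run address of wont_overwrite: no relocation
(3) kernel start address - 16 KB < end address of the current image && kernel end address > run address of wont_overwrite: relocation required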
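The three cases reduce to a short predicate. The sketch below mirrors the two compare-and-branch pairs; needs_relocation and its parameter names are invented for illustration, and the addresses in the example are arbitrary.

```python
PAGE_DIR_SIZE = 16384   # the 16 KB page directory below the kernel


def needs_relocation(kernel_start: int, kernel_size: int,
                     image_end: int, wont_overwrite_addr: int) -> bool:
    """True when the decompressor must move itself out of the way."""
    # cmp r4, r10 / bhs wont_overwrite  (r10 already has 16 KB added)
    if kernel_start >= image_end + PAGE_DIR_SIZE:
        return False
    # cmp r10, r9 / bls wont_overwrite  (r10 = r4 + r9, r9 = &wont_overwrite)
    if kernel_start + kernel_size <= wont_overwrite_addr:
        return False
    return True


# Arbitrary example: a 9 MB kernel at 0x8000 would run into code at 0x800100.
print(needs_relocation(0x8000, 0x900000, 0x01000000, 0x00800100))   # True
```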
/*
 * Relocate ourselves past the end of the decompressed kernel.
 *   r6  = _edata
 *   r10 = end of the decompressed kernel
 * Because we always copy ahead, we need to do it from the end and go
 * backward in case the source and destination overlap.
 */
		/*
		 * Bump to the next 256-byte boundary with the size of
		 * the relocation code added. This avoids overwriting
		 * ourself when the offset is small.
		 */
		add	r10, r10, #((reloc_code_end - restart + 256) & ~255)
		bic	r10, r10, #255

		/* Get start of code we want to copy and align it down. */
		adr	r5, restart
		bic	r5, r5, #31

		sub	r9, r6, r5		@ size to copy
		add	r9, r9, #31		@ rounded up to a multiple
		bic	r9, r9, #31		@ ... of 32 bytes
		add	r6, r9, r5
		add	r9, r9, r10

1:		ldmdb	r6!, {r0 - r3, r10 - r12, lr}
		cmp	r6, r5
		stmdb	r9!, {r0 - r3, r10 - r12, lr}
		bhi	1b

		/* Preserve offset to relocated code. */
		sub	r6, r9, r6

#ifndef CONFIG_ZBOOT_ROM
		/* cache_clean_flush may use the stack, so relocate it */
		add	sp, sp, r6
#endif

		bl	cache_clean_flush

		badr	r0, restart
		add	r0, r0, r6
		mov	pc, r0
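The relocation itself comes next. The copy runs from the end toward the start, 32 bytes per iteration, until r6 == r5. r6 then holds the offset between the old and new locations, and the stack pointer is relocated by that offset (cache_clean_flush may use the stack). Finally the code jumps to restart in the relocated copy.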
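Why the copy has to run backward is easiest to see on a toy buffer where the destination overlaps the source. The following Python model copies 32-byte chunks from the end, as the ldmdb/stmdb loop does; relocate_backward is a hypothetical helper, and it assumes src_start is already 32-byte aligned (the assembly guarantees this with bic r5, r5, #31).

```python
def relocate_backward(mem: bytearray, src_start: int, src_end: int,
                      dst_start: int, chunk: int = 32) -> int:
    """Copy [src_start, src_end) to dst_start, last chunk first; return the offset."""
    size = (src_end - src_start + chunk - 1) & ~(chunk - 1)   # round up to 32 bytes
    src = src_start + size        # r6: one past the last source chunk
    dst = dst_start + size        # r9: one past the last destination chunk
    while src > src_start:        # bhi 1b
        src -= chunk              # ldmdb r6!, {...}
        dst -= chunk              # stmdb r9!, {...}
        mem[dst:dst + chunk] = mem[src:src + chunk]
    return dst - src              # sub r6, r9, r6


mem = bytearray(256)
mem[0:64] = bytes(range(64))
offset = relocate_backward(mem, src_start=0, src_end=60, dst_start=32)
print(offset, mem[32:96] == bytes(range(64)))
```

In this example the destination starts 32 bytes into the source. A front-to-back copy would overwrite bytes 32..63 before reading them; starting from the end avoids that.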
wont_overwrite:
/*
 * If delta is zero, we are running at the address we were linked at.
 *   r0  = delta
 *   r2  = BSS start
 *   r3  = BSS end
 *   r4  = kernel execution address (possibly with LSB set)
 *   r5  = appended dtb size (0 if not present)
 *   r7  = architecture ID
 *   r8  = atags pointer
 *   r11 = GOT start
 *   r12 = GOT end
 *   sp  = stack pointer
 */
		orrs	r1, r0, r5
		beq	not_relocated

		add	r11, r11, r0
		add	r12, r12, r0

#ifndef CONFIG_ZBOOT_ROM
		/*
		 * If we're running fully PIC === CONFIG_ZBOOT_ROM = n,
		 * we need to fix up pointers into the BSS region.
		 * Note that the stack pointer has already been fixed up.
		 */
		add	r2, r2, r0
		add	r3, r3, r0

		/*
		 * Relocate all entries in the GOT table.
		 * Bump bss entries to _edata + dtb size
		 */
1:		ldr	r1, [r11, #0]		@ relocate entries in the GOT
		add	r1, r1, r0		@ This fixes up C references
		cmp	r1, r2			@ if entry >= bss_start &&
		cmphs	r3, r1			@       bss_end > entry
		addhi	r1, r1, r5		@    entry += dtb size
		str	r1, [r11], #4		@ next entry
		cmp	r11, r12
		blo	1b

		/* bump our bss pointers too */
		add	r2, r2, r5
		add	r3, r3, r5
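r1 loads one GOT entry, and the entry is fixed up by adding the delta. If the fixed-up address falls inside the BSS (bss_start <= entry < bss_end), the DTB size is added as well (r5 is 0 when no DTB is appended). The value is then written back to the GOT. The loop repeats until every GOT entry has been processed.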
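The fix-up loop can be expressed compactly in Python. This is a sketch with invented names (fixup_got and its parameters) and made-up addresses; it takes link-time values and applies the same delta and DTB bump that the assembly applies.

```python
def fixup_got(got: list, delta: int, bss_start: int, bss_end: int,
              dtb_size: int) -> list:
    """Relocate link-time GOT entries; entries inside the BSS also move by dtb_size."""
    bss_start += delta            # add r2, r2, r0
    bss_end += delta              # add r3, r3, r0
    fixed = []
    for entry in got:
        entry += delta                        # add r1, r1, r0
        if bss_start <= entry < bss_end:      # cmp r1, r2 / cmphs r3, r1
            entry += dtb_size                 # addhi r1, r1, r5
        fixed.append(entry)                   # str r1, [r11], #4
    return fixed


got = [0x1000, 0x2000, 0x2800, 0x3000]
print([hex(e) for e in fixup_got(got, delta=0x100, bss_start=0x2000,
                                 bss_end=0x3000, dtb_size=0x400)])
```

An entry equal to bss_end is left alone, matching the strict "bss_end > entry" test in the assembly.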
not_relocated:	mov	r0, #0
1:		str	r0, [r2], #4		@ clear bss
		str	r0, [r2], #4
		str	r0, [r2], #4
		str	r0, [r2], #4
		cmp	r2, r3
		blo	1b

		/*
		 * Did we skip the cache setup earlier?
		 * That is indicated by the LSB in r4.
		 * Do it now if so.
		 */
		tst	r4, #1
		bic	r4, r4, #1
		blne	cache_on

/*
 * The C runtime environment should now be setup sufficiently.
 * Set up some pointers, and start decompressing.
 *   r4  = kernel execution address
 *   r7  = architecture ID
 *   r8  = atags pointer
 */
		mov	r0, r4
		mov	r1, sp			@ malloc space above stack
		add	r2, sp, #0x10000	@ 64k max
		mov	r3, r7
		bl	decompress_kernel
		bl	cache_clean_flush
		bl	cache_off
		mov	r1, r7			@ restore architecture number
		mov	r2, r8			@ restore atags pointer
The bl decompress_kernel call transfers control to the decompressor's C code.