Unidbg-Linker部分源码分析(上)

Unidbg-Linker部分源码分析(上)

概要

我们有了Android Linker源码部分的解析,还学习了Unicorn的详细使用。如果没有看过之前文章的同学可以看下之前的文章哦。今天我们就来看下Unidbg是如何将一个So加载且跑起来的

在Unidbg中,我们想要加载一个So一般都是通过

DalvikModule dalvikModule = vm.loadLibrary(new File("so_path"), true);

来加载一个So文件,通过这个方法,我们最终会发现Unidbg调用了AndroidElfLoader类的loadInternal方法。AndroidElfLoader类也就充当了Linker的角色,所以我们今天就来分析一下这个类

init

我们先来分析一下AndroidElfLoader的构造方法

 public AndroidElfLoader(Emulator<AndroidFileIO> emulator, UnixSyscallHandler<AndroidFileIO> syscallHandler) {
     // 调用父类构造方法,初始化emulator和syscallHandler字段
    super(emulator, syscallHandler);

    // 初始化SP
    stackSize = STACK_SIZE_OF_PAGE * emulator.getPageAlign();
    
    // 将栈空间mem_map映射,因为在Backend(Unicorn)中,所有需要用的内存都需要先进行映射才能够进行使用,大小就是STACK_SIZE_OF_PAGE * emulator.getPageAlign(),可读可写权限
    backend.mem_map(STACK_BASE - stackSize, stackSize, UnicornConst.UC_PROT_READ | UnicornConst.UC_PROT_WRITE);

    // 设置SP寄存器
    setStackPoint(STACK_BASE);
    // 初始化TLS(线程局部存储相关),在libc一些系统库中是有线程局部变量的,如errno等。这里就做了相关的协处理器的初始化操作
    this.environ = initializeTLS(new String[] {
            "ANDROID_DATA=/data",
            "ANDROID_ROOT=/system"
    });
    this.setErrno(0);
}

上面的注释也写的很清楚了,再总结一下AndroidElfLoader初始化做了什么

  • 在类中保存了emulator和syscallHandler字段,其中emulator就是一个模拟器对象。syscallHandler是linux系统调用相关的处理对象
  • 初始化了栈空间和设置SP寄存器
  • 初始化TLS操作

对TLS感兴趣可以阅读以下源码

http://androidxref.com/4.4.4_r1/xref/bionic/libc/bionic/libc_init_common.cpp#111
http://androidxref.com/4.4.4_r1/xref/bionic/libc/bionic/pthread_internals.cpp#66
http://androidxref.com/4.4.4_r1/xref/bionic/libc/private/bionic_tls.h#90

load

那么当我们调用了loadLibrary方法后,在Android中最终调用了AndroidElfLoader类下的loadInternal方法,我们来分析这个方法,该类的初始化我们已经分析过了

protected final LinuxModule loadInternal(LibraryFile libraryFile, boolean forceCallInit) {
    // File对象被封装为了LibraryFile对象
    try {
        // 接着调用了loadInternal(重载)方法,继续加载流程
        LinuxModule module = loadInternal(libraryFile);
        // 处理符号(关于重定位)
        resolveSymbols(!forceCallInit);
        // callInitFunction默认true
        if (callInitFunction || forceCallInit) {
            // 调用初始化函数
            for (LinuxModule m : modules.values().toArray(new LinuxModule[0])) {
                boolean forceCall = (forceCallInit && m == module) || m.isForceCallInit();
                if (callInitFunction) {
                    m.callInitFunction(emulator, forceCall);
                } else if (forceCall) {
                    m.callInitFunction(emulator, true);
                }
                m.initFunctionList.clear();
            }
        }
        // 添加引用计数
        module.addReferenceCount();
        return module;
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

上面的函数接着调用了重载loadInternal方法继续加载,加载完成会返回一个LinuxModule对象,该对象就保存了该So文件加载后的信息,接着处理了符号和调用初始化函数

我们接着来看重载loadInternal方法,代码比较长

 private LinuxModule loadInternal(LibraryFile libraryFile) throws IOException {
    // 将我们的So文件让ElfFile类去解析,这个ElfFile是jelf库经过凯神改装过的,可以帮助解析Elf文件
    final ElfFile elfFile = ElfFile.fromBytes(libraryFile.mapBuffer());

    // ... 判断文件是否合法,32位模拟器是否加载了64位So等待

    long start = System.currentTimeMillis();

    // 获取当前So的最大虚拟地址和页对齐align参数
    long bound_high = 0;
    long align = 0;
    for (int i = 0; i < elfFile.num_ph; i++) {
        ElfSegment ph = elfFile.getProgramHeader(i);
        if (ph.type == ElfSegment.PT_LOAD && ph.mem_size > 0) {
            // 遍历所有mem_size>0的PT_LOAD段
            long high = ph.virtual_address + ph.mem_size;
            // 寻找bound_high最大值
            if (bound_high < high) {
                bound_high = high;
            }
            // 寻找alignment最大值
            if (ph.alignment > align) {
                align = ph.alignment;
            }
        }
    }

    ElfDynamicStructure dynamicStructure = null;

    // 从获取到的So指定的alignment和默认的PageAlign取一个最大值,一般拿到的就是4K大小
    final long baseAlign = Math.max(emulator.getPageAlign(), align);

    // 根据baseAlign来计算该So的加载地址。初始地址0x40000000L
    final long load_base = ((mmapBaseAddress - 1) / baseAlign + 1) * baseAlign;

    // 这个就相当于Linker在计算load_size,但Unidbg中将所有So的最小虚拟地址默认为0
    // 这里有改进空间对吧,因为Linker中为了防止内存浪费,出现了一个load_bias_字段
    // 但是出于目的不用,Unidbg的目的是让二进制文件跑起来
    long size = ARM.align(0, bound_high, baseAlign).size;

    // 设置加载下个So的mmapBaseAddress
    setMMapBaseAddress(load_base + size);

    // MemRegion存储了哪块内存对应了哪个So文件
    final List<MemRegion> regions = new ArrayList<>(5);
    MemoizedObject<ArmExIdx> armExIdx = null;
    MemoizedObject<GnuEhFrameHeader> ehFrameHeader = null;
    Alignment lastAlignment = null;

    // 再次遍历所有段
    for (int i = 0; i < elfFile.num_ph; i++) {
        ElfSegment ph = elfFile.getProgramHeader(i);
        switch (ph.type) {
            case ElfSegment.PT_LOAD:
                // PT_LOAD段
                // 获取该段在内存中对应的操作权限,如该段未指定,设置满权限(一般不会出现这种情况)
                int prot = get_segment_protection(ph.flags);
                if (prot == UnicornConst.UC_PROT_NONE) {
                    prot = UnicornConst.UC_PROT_ALL;
                }
                // 该段在内存中的起始地址
                final long begin = load_base + ph.virtual_address;

                // 计算该段在内存中的位置和大小
                Alignment check = ARM.align(begin, ph.mem_size, Math.max(emulator.getPageAlign(), ph.alignment));

                // 获取上一个内存快
                final int regionSize = regions.size();
                MemRegion last = regionSize <= 0 ? null : regions.get(regionSize - 1);

                // 处理两个段之间重叠部分?未找到案例
                MemRegion overall = null;
                if (last != null && check.address >= last.begin && check.address < last.end) {
                    overall = last;
                }

                if (overall != null) {
                    // 处理重叠段,应该为特殊情况,正常都会走下面else分支
                    long overallSize = overall.end - check.address;
                    backend.mem_protect(check.address, overallSize, overall.perms | prot);
                    if (ph.mem_size > overallSize) {
                        Alignment alignment = this.mem_map(begin + overallSize, ph.mem_size - overallSize, prot, libraryFile.getName(), Math.max(emulator.getPageAlign(), ph.alignment));
                        regions.add(new MemRegion(alignment.address, alignment.address + alignment.size, prot, libraryFile, ph.virtual_address));
                        if (lastAlignment != null) {
                            throw new UnsupportedOperationException();
                        }
                        lastAlignment = alignment;
                    }
                } else {
                    // 将该PT_LOAD段指示的内存大小进行映射
                    Alignment alignment = this.mem_map(begin, ph.mem_size, prot, libraryFile.getName(), Math.max(emulator.getPageAlign(), ph.alignment));

                    // 添加一块MemRegion
                    regions.add(new MemRegion(alignment.address, alignment.address + alignment.size, prot, libraryFile, ph.virtual_address));
                    if (lastAlignment != null) {
                        // 处理该段与上一个段之间的空隙,置0
                        long base = lastAlignment.address + lastAlignment.size;
                        long off = alignment.address - base;
                        if (off < 0) {
                            throw new IllegalStateException();
                        }
                        if (off > 0) {
                            backend.mem_map(base, off, UnicornConst.UC_PROT_NONE);
                            if (memoryMap.put(base, new MemoryMap(base, (int) off, UnicornConst.UC_PROT_NONE)) != null) {
                                log.warn("mem_map replace exists memory map base=" + Long.toHexString(base));
                            }
                        }
                    }
                    lastAlignment = alignment;
                }

                // 将该段对应的数据写入进已经映射好的内存
                ph.getPtLoadData().writeTo(pointer(begin));
                break;
            case ElfSegment.PT_DYNAMIC:
                // DYNAMIC段
                dynamicStructure = ph.getDynamicStructure();
                break;
            case ElfSegment.PT_INTERP:
                // INTERP段指定了解释器位置,在So中没用
                if (log.isDebugEnabled()) {
                    log.debug("[" + libraryFile.getName() + "]interp=" + ph.getInterpreter());
                }
                break;
            case ElfSegment.PT_GNU_EH_FRAME:
                // 没分析过,未知TODO
                ehFrameHeader = ph.getEhFrameHeader();
                break;
            case ElfSegment.PT_ARM_EXIDX:
                // 异常相关的段
                armExIdx = ph.getARMExIdxData();
                break;
            default:
                if (log.isDebugEnabled()) {
                    log.debug("[" + libraryFile.getName() + "]segment type=0x" + Integer.toHexString(ph.type) + ", offset=0x" + Long.toHexString(ph.offset));
                }
                break;
        }
    }
    

    // 此时,该So中的有用的段信息已经处理完毕
    // 该加载到内存的已经加载到内存
    // 该置空的内存也已置空
    // 动态段、异常段、PT_GNU_EH_FRAME段的信息已经保存下来,继续看接下来的处理

    // 动态段是必须有的
    if (dynamicStructure == null) {
        throw new IllegalStateException("dynamicStructure is empty.");
    }
    // 此SoName是动态段中的tag为SO_NAME指定的内容,而且Unidbg中的Log也是基于这个SoName打印的
    // 如果该内容为空,才会使用文件名。这也就是有的同学会问为什么我加载的是libxxxxx.so,而日志输出libyyyyyy.so呢
    final String soName = dynamicStructure.getSOName(libraryFile.getName());

    // 下面处理So中的依赖库
    Map<String, Module> neededLibraries = new HashMap<>();

    // dynamicStructure.getNeededLibraries(),这个方法是Unidbg改写jelf库加上的方法,会获取到所有的依赖库的名字
    for (String neededLibrary : dynamicStructure.getNeededLibraries()) {
        if (log.isDebugEnabled()) {
            log.debug(soName + " need dependency " + neededLibrary);
        }

        // modules字段保存了所有已经加载过的库,这里就是在寻找是否该So已经被加载过
        LinuxModule loaded = modules.get(neededLibrary);
        if (loaded != null) {
            // 如果加载过了,添加引用计数、放到neededLibraries变量
            loaded.addReferenceCount();
            neededLibraries.put(FilenameUtils.getBaseName(loaded.name), loaded);
            continue;
        }
        // 如果依赖还没有被加载过,就开始寻找这个依赖文件在哪,先在当前So的路径下找
        LibraryFile neededLibraryFile = libraryFile.resolveLibrary(emulator, neededLibrary);

        // 如果当前路径下没有找到,就去找library解析器去找
        if (libraryResolver != null && neededLibraryFile == null) {
            neededLibraryFile = libraryResolver.resolveLibrary(emulator, neededLibrary);
        }

        if (neededLibraryFile != null) {
            // 大吉大利,So找到啦,就会在这里加载
            LinuxModule needed = loadInternal(neededLibraryFile);
            needed.addReferenceCount();
            neededLibraries.put(FilenameUtils.getBaseName(needed.name), needed);
        } else {
            log.info(soName + " load dependency " + neededLibrary + " failed");
        }
    }

    // 到这里,该So所依赖的So也被加载进来了

    // 下面这个循环会处理未解决(符号为0特殊情况)的重定位,进行二次重定位,极少数能成功,如果确定没用可以注释掉
    for (LinuxModule module : modules.values()) {
        for (Iterator<ModuleSymbol> iterator = module.getUnresolvedSymbol().iterator(); iterator.hasNext(); ) {
            ModuleSymbol moduleSymbol = iterator.next();
            ModuleSymbol resolved = moduleSymbol.resolve(module.getNeededLibraries(), false, hookListeners, emulator.getSvcMemory());
            if (resolved != null) {
                if (log.isDebugEnabled()) {
                    log.debug("[" + moduleSymbol.soName + "]" + moduleSymbol.symbol.getName() + " symbol resolved to " + resolved.toSoName);
                }
                resolved.relocation(emulator);
                iterator.remove();
            }
        }
    }


    // 下面开始处理重定位
    List<ModuleSymbol> list = new ArrayList<>();
    for (MemoizedObject<ElfRelocation> object : dynamicStructure.getRelocations()) {
        // 遍历So中所有的重定位信息
        ElfRelocation relocation = object.getValue();

        // 拿到重定位类型
        final int type = relocation.type();
        if (type == 0) {
            log.warn("Unhandled relocation type " + type);
            continue;
        }

        // 拿到重定位项指定的符号信息
        ElfSymbol symbol = relocation.sym() == 0 ? null : relocation.symbol();
        long sym_value = symbol != null ? symbol.value : 0;

        // 计算需要重定位的位置
        Pointer relocationAddr = UnidbgPointer.pointer(emulator, load_base + relocation.offset());
        assert relocationAddr != null;

        Log log = LogFactory.getLog("com.github.unidbg.linux." + soName);
        if (log.isDebugEnabled()) {
            log.debug("symbol=" + symbol + ", type=" + type + ", relocationAddr=" + relocationAddr + ", offset=0x" + Long.toHexString(relocation.offset()) + ", addend=" + relocation.addend() + ", sym=" + relocation.sym() + ", android=" + relocation.isAndroid());
        }

        ModuleSymbol moduleSymbol;
        // 根据重定位类型进行不同的处理,下面包含了32位/64位下的重定位处理
        switch (type) {
            case ARMEmulator.R_ARM_ABS32: {
                int offset = relocationAddr.getInt(0);
                moduleSymbol = resolveSymbol(load_base, symbol, relocationAddr, soName, neededLibraries.values(), offset);
                if (moduleSymbol == null) {
                    // 不能当即处理的,添加到list,后面再处理
                    list.add(new ModuleSymbol(soName, load_base, symbol, relocationAddr, null, offset));
                } else {
                    moduleSymbol.relocation(emulator);
                }
                break;
            }
            case ARMEmulator.R_AARCH64_ABS64: {
                long offset = relocationAddr.getLong(0) + relocation.addend();
                moduleSymbol = resolveSymbol(load_base, symbol, relocationAddr, soName, neededLibraries.values(), offset);
                if (moduleSymbol == null) {
                    list.add(new ModuleSymbol(soName, load_base, symbol, relocationAddr, null, offset));
                } else {
                    moduleSymbol.relocation(emulator);
                }
                break;
            }
            case ARMEmulator.R_ARM_RELATIVE: {
                int offset = relocationAddr.getInt(0);
                if (sym_value == 0) {
                    relocationAddr.setInt(0, (int) load_base + offset);
                } else {
                    throw new IllegalStateException("sym_value=0x" + Long.toHexString(sym_value));
                }
                break;
            }
            case ARMEmulator.R_AARCH64_RELATIVE:
                if (sym_value == 0) {
                    relocationAddr.setLong(0, load_base + relocation.addend());
                } else {
                    throw new IllegalStateException("sym_value=0x" + Long.toHexString(sym_value));
                }
                break;
            case ARMEmulator.R_ARM_GLOB_DAT:
            case ARMEmulator.R_ARM_JUMP_SLOT:
                moduleSymbol = resolveSymbol(load_base, symbol, relocationAddr, soName, neededLibraries.values(), 0);
                if (moduleSymbol == null) {
                    list.add(new ModuleSymbol(soName, load_base, symbol, relocationAddr, null, 0));
                } else {
                    moduleSymbol.relocation(emulator);
                }
                break;
            case ARMEmulator.R_AARCH64_GLOB_DAT:
            case ARMEmulator.R_AARCH64_JUMP_SLOT:
                moduleSymbol = resolveSymbol(load_base, symbol, relocationAddr, soName, neededLibraries.values(), relocation.addend());
                if (moduleSymbol == null) {
                    list.add(new ModuleSymbol(soName, load_base, symbol, relocationAddr, null, relocation.addend()));
                } else {
                    moduleSymbol.relocation(emulator);
                }
                break;
            case ARMEmulator.R_ARM_COPY:
                throw new IllegalStateException("R_ARM_COPY relocations are not supported");
            case ARMEmulator.R_AARCH64_COPY:
                throw new IllegalStateException("R_AARCH64_COPY relocations are not supported");
            case ARMEmulator.R_AARCH64_ABS32:
            case ARMEmulator.R_AARCH64_ABS16:
            case ARMEmulator.R_AARCH64_PREL64:
            case ARMEmulator.R_AARCH64_PREL32:
            case ARMEmulator.R_AARCH64_PREL16:
            case ARMEmulator.R_AARCH64_IRELATIVE:
            case ARMEmulator.R_AARCH64_TLS_TPREL64:
            case ARMEmulator.R_AARCH64_TLS_DTPREL32:
            case ARMEmulator.R_ARM_IRELATIVE:
            case ARMEmulator.R_ARM_REL32:
            default:
                log.warn("[" + soName + "]Unhandled relocation type " + type + ", symbol=" + symbol + ", relocationAddr=" + relocationAddr + ", offset=0x" + Long.toHexString(relocation.offset()) + ", addend=" + relocation.addend() + ", android=" + relocation.isAndroid());
                break;
        }
    }

    // 重定位完成后,开始执行初始化函数
    List<InitFunction> initFunctionList = new ArrayList<>();
    if (elfFile.file_type == ElfFile.FT_EXEC) {
        // 处理可执行文件相关,我们分析So的,忽略就可以
        int preInitArraySize = dynamicStructure.getPreInitArraySize();
        int count = preInitArraySize / emulator.getPointerSize();
        if (count > 0) {
            Pointer pointer = UnidbgPointer.pointer(emulator, load_base + dynamicStructure.getPreInitArrayOffset());
            if (pointer == null) {
                throw new IllegalStateException("DT_PREINIT_ARRAY is null");
            }
            for (int i = 0; i < count; i++) {
                Pointer func = pointer.getPointer((long) i * emulator.getPointerSize());
                if (func != null) {
                    initFunctionList.add(new AbsoluteInitFunction(load_base, soName, ((UnidbgPointer) func).peer));
                }
            }
        }
    }
    if (elfFile.file_type == ElfFile.FT_DYN) {
        // 处理So的初始化函数
        //下面的处理内容在新版有修复,我们之前Linker的文章也讲过,他们的顺序不应该是平级的,需要Init函数先执行
        int init = dynamicStructure.getInit();
        if (init != 0) {
            initFunctionList.add(new LinuxInitFunction(load_base, soName, init));
        }

        // 处理 init.array
        int initArraySize = dynamicStructure.getInitArraySize();
        int count = initArraySize / emulator.getPointerSize();
        if (count > 0) {
            Pointer pointer = UnidbgPointer.pointer(emulator, load_base + dynamicStructure.getInitArrayOffset());
            if (pointer == null) {
                throw new IllegalStateException("DT_INIT_ARRAY is null");
            }
            for (int i = 0; i < count; i++) {
                // 当作数组来处理每一个init函数
                Pointer func = pointer.getPointer((long) i * emulator.getPointerSize());
                if (func != null) {
                    // 将他们添加到initFunction列表中
                    initFunctionList.add(new AbsoluteInitFunction(load_base, soName, ((UnidbgPointer) func).peer));
                }
            }
        }
    }

    // 至此,依赖So加载了,重定位可以处理的也处理了(不能处理的还会有二次处理)
    // 初始化函数也被添加到列表中了,但是还没有调用(注意)

    SymbolLocator dynsym = dynamicStructure.getSymbolStructure();
    if (dynsym == null) {
        throw new IllegalStateException("dynsym is null");
    }
    ElfSection symbolTableSection = null;
    try {
        symbolTableSection = elfFile.getSymbolTableSection();
    } catch(Throwable ignored) {}

    // 将加载好的So封装位LinuxModule对象
    LinuxModule module = new LinuxModule(load_base, size, soName, dynsym, list, initFunctionList, neededLibraries, regions,
            armExIdx, ehFrameHeader, symbolTableSection, elfFile, dynamicStructure);
    // ...

    // 放入已加载的So列表中
    modules.put(soName, module);
    if (maxSoName == null || soName.length() > maxSoName.length()) {
        maxSoName = soName;
    }
    if (bound_high > maxSizeOfSo) {
        maxSizeOfSo = bound_high;
    }
    // 设置可执行Elf的入口点
    module.setEntryPoint(elfFile.entry_point);
    log.debug("Load library " + soName + " offset=" + (System.currentTimeMillis() - start) + "ms" + ", entry_point=0x" + Long.toHexString(elfFile.entry_point));
    // 通知监听器,So已加载完毕
    notifyModuleLoaded(module);
    return module;
}

总结

我们分为上下两篇文章。我们看到上面这个loadInternal方法就是加载一个So主要的内容,大致内容看到这里其实已经够用了,如果想了解更多的细节就看下篇。其中的很多方法我们没有展开来讲解,接下来我们就来处理剩余的细枝末节。如果文章有错误的地方还请指正,可以加个VX一起交流:roy5ue或直接在文章下方进行评论哦

posted on 2021-10-29 18:45  r0ysue  阅读(582)  评论(0编辑  收藏  举报