[Repost] Inside Swift
Original article: http://www.eswick.com/2014/06/inside-swift/
Swift is Apple’s new programming language, said by many to ‘replace’ Objective-C. This is not the case. I’ve spent some time reverse engineering Swift binaries and the runtime, and I’ve found out quite a bit about it. So far, the verdict is this: Swift is Objective-C without messages.
Objects
Believe it or not, Swift objects are actually Objective-C objects. In a Mach-O binary, the __objc_classlist section contains data for each class in the binary. The structure is like so:
struct objc_class {
    uint64_t isa;
    uint64_t superclass;
    uint64_t cache;
    uint64_t vtable;
    uint64_t data;
};
(note: all structures are from 64-bit builds)
Note the data entry. It points to a structure listing the methods, ivars, protocols, etc. of the class. Normally, data is 8-byte-aligned. However, for Swift classes, the least significant bit of data will be 1.
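The check described above can be sketched in a few lines of C. This is only an illustration: the mask name and both helper functions are made up for this example, and it assumes the lowest bit is the only flag stored in data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Class layout as given above (64-bit builds). */
struct objc_class {
    uint64_t isa;
    uint64_t superclass;
    uint64_t cache;
    uint64_t vtable;
    uint64_t data;
};

/* Hypothetical name for the flag bit; the text only says the last bit is 1. */
#define SWIFT_CLASS_FLAG UINT64_C(1)

/* True when the lowest bit of `data` is set, which marks a Swift class. */
static bool is_swift_class(const struct objc_class *cls) {
    return (cls->data & SWIFT_CLASS_FLAG) != 0;
}

/* Clears the flag bit to recover the aligned address of the class data.
 * Assumption: no other low bits are used as flags. */
static uint64_t class_data_address(const struct objc_class *cls) {
    return cls->data & ~SWIFT_CLASS_FLAG;
}
```

When walking __objc_classlist, a tool would presumably call is_swift_class on each entry before following the data pointer.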
Classes
The actual structure for Swift classes is a bit odd. Swift classes have no Objective-C methods. We’ll get to that later. Variables for Swift classes are stored as ivars. The Swift getter and setter methods actually modify the ivar values. Oddly, ivars for Swift classes have no type encoding. The pointer that is normally supposed to point to the type encoding is NULL. This is presumably due to the fact that the Objective-C runtime is not supposed to deal with Swift variables itself.
Inheritance
Inheritance in Swift is as you would expect. In Swift, a Square that is a subclass of Shape will also be a subclass of Shape in the Objective-C class hierarchy. However, what if a class in Swift doesn’t have a superclass?
e.g.
class Shape { }
In this case, the Shape class would be a subclass of SwiftObject. SwiftObject is a root Objective-C class, similar to NSObject. It has no superclass; as with any root class, it is the isa of its metaclass that points back to itself. Its purpose is to use Swift runtime methods for things like allocation and deallocation, instead of the standard Objective-C runtime. For example, - (id)retain does not call objc_retain, but instead calls swift_retain.
Class Methods
As I mentioned earlier, classes for Swift objects have no methods. Instead, they have been replaced with C++-like functions, mangling and all. This is likely why Swift has been said to be much faster than Objective-C; there is no more need for objc_msgSend to find and call method implementations.
In Objective-C, method implementations are like so:
type method(id self, SEL _cmd, id arg1, id arg2, ...)
Swift methods are very similar, but with a slightly different argument layout. self is passed as the last argument, and there is no selector.
type method(id arg1, id arg2, ..., id self)
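To make the difference concrete, here is a small plain-C model of the two argument layouts. It is a sketch only: id and SEL are replaced by stand-in typedefs so the code compiles without the Objective-C runtime, and the shape struct and both functions are hypothetical.

```c
/* Stand-ins for the Objective-C types so this compiles as plain C. */
typedef void *id;
typedef const char *SEL;

/* Hypothetical object with a single stored value. */
struct shape {
    long sides;
};

/* Objective-C layout: self first, then the selector, then the arguments. */
static long objc_style_scaled_sides(id self, SEL _cmd, long factor) {
    (void)_cmd; /* the selector is passed but not needed by the body */
    return ((struct shape *)self)->sides * factor;
}

/* Swift layout as described above: arguments first, self last, no selector. */
static long swift_style_scaled_sides(long factor, id self) {
    return ((struct shape *)self)->sides * factor;
}
```

A hook written for a Swift method therefore has to declare self as its final parameter, which is the shape the hooks later in this post use.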
vtable
Just like in C++, Swift classes have a vtable which lists the methods in the class. It is located directly after the class data in the binary, and looks something like this:
struct swift_vtable_header {
    uint32_t vtable_size;
    uint32_t unknown_000;
    uint32_t unknown_001;
    uint32_t unknown_002;
    void* nominalTypeDescriptor;
    // vtable pointers
};
From what I can tell, the vtable for a Swift class is only used when the class is visible at compile time. Otherwise, the call is resolved through the method’s mangled symbol.
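One way to model "header followed by vtable pointers" in C is a flexible array member. This is a sketch under assumptions: the entries field and the vtable_entry helper are invented for illustration, and because the meaning of vtable_size (bytes or entry count) is not established here, the helper does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Header layout as given above, with the trailing pointers modelled
 * as a flexible array member (the name `entries` is hypothetical). */
struct swift_vtable_header {
    uint32_t vtable_size;
    uint32_t unknown_000;
    uint32_t unknown_001;
    uint32_t unknown_002;
    void *nominalTypeDescriptor;
    void *entries[]; /* vtable pointers follow the header */
};

/* Returns the index-th function pointer stored after the header.
 * No bounds check: the unit of vtable_size is unknown. */
static void *vtable_entry(const struct swift_vtable_header *hdr, size_t index) {
    return hdr->entries[index];
}
```

On a 64-bit build the four 32-bit fields plus the descriptor pointer put the first entry 24 bytes past the start of the header.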
Name Mangling
Swift keeps metadata about functions (and more) in their respective symbols, which is called name mangling. This metadata includes the function’s name (obviously), attributes, module name, argument types, return type, and more. Take this for example:
class Shape {
    func simpleDescription() -> Int {
        return 5
    }
}
The mangled name for the simpleDescription method is _TFC9swifttest5Shape17simpleDescriptionfS0_FT_Si. Here’s the breakdown:
_T – The prefix for all Swift symbols. Everything will start with this.
F – Function.
C – Function of a class. (method)
9swifttest – The module name, with a prefixed length.
5Shape – The class name the function belongs to, again, with a prefixed length.
17simpleDescription – The function name.
f – The function attribute. In this case it’s ‘f’, which is just a normal function. We’ll get to that in a minute.
S0_FT – I’m not exactly sure what this means, but it appears to mark the start of the arguments and return type.
‘_’ – This underscore separates the argument types from the return type. Since the function takes no arguments, it comes directly after S0_FT.
S – This is the beginning of the return type. The ‘S’ stands for Swift; the return type is a Swift builtin type. The next character determines the type.
i – This is the Swift builtin type. A lowercase ‘i’, which stands for Int.
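The breakdown above is regular enough to pull apart mechanically. The following C sketch handles only this exact layout (prefix _TFC, three length-prefixed identifiers, one attribute character, then the signature); the struct and function names are made up for this example, and it is nowhere near a full demangler.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Pieces of a mangled class-method symbol (hypothetical struct). */
struct swift_method_name {
    char module[64];
    char class_name[64];
    char function[64];
    char attribute;        /* e.g. 'f' for a normal function */
    const char *signature; /* rest of the symbol, e.g. "S0_FT_Si" */
};

/* Reads one length-prefixed identifier such as "5Shape" and advances
 * *cursor past it. Returns 1 on success, 0 on malformed input. */
static int read_identifier(const char **cursor, char *out, size_t out_size) {
    const char *p = *cursor;
    size_t len = 0;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) {
        len = len * 10 + (size_t)(*p - '0');
        if (len >= out_size) return 0; /* would not fit in the buffer */
        p++;
    }
    if (len == 0 || strlen(p) < len) return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    *cursor = p + len;
    return 1;
}

/* Splits a symbol of the form _TFC<module><class><function><attr><signature>.
 * Returns 1 on success, 0 if the symbol does not match that layout. */
static int parse_swift_method(const char *symbol, struct swift_method_name *out) {
    const char *p = symbol;
    if (strncmp(p, "_TFC", 4) != 0) return 0;
    p += 4;
    if (!read_identifier(&p, out->module, sizeof out->module)) return 0;
    if (!read_identifier(&p, out->class_name, sizeof out->class_name)) return 0;
    if (!read_identifier(&p, out->function, sizeof out->function)) return 0;
    if (*p == '\0') return 0;
    out->attribute = *p++;
    out->signature = p;
    return 1;
}
```

Fed the symbol from the example, this yields the module swifttest, the class Shape, the function simpleDescription, the attribute f, and leaves S0_FT_Si as the undecoded signature.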
Function Attributes
Character | Type |
---|---|
f | Normal Function |
s | Setter |
g | Getter |
d | Destructor |
D | Deallocator |
c | Constructor |
C | Allocator |
Swift Builtins
Character | Type |
---|---|
a | Array |
b | Bool |
c | UnicodeScalar |
d | Double |
f | Float |
i | Int |
u | UInt |
Q | ImplicitlyUnwrappedOptional |
S | String |
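The table above translates directly into a lookup. A minimal C sketch, with hypothetical function names, that decodes a two-character code such as Si:

```c
#include <stddef.h>

/* Maps the character after 'S' to the type name from the table above.
 * Returns NULL for characters the table does not list. */
static const char *swift_builtin_name(char code) {
    switch (code) {
    case 'a': return "Array";
    case 'b': return "Bool";
    case 'c': return "UnicodeScalar";
    case 'd': return "Double";
    case 'f': return "Float";
    case 'i': return "Int";
    case 'u': return "UInt";
    case 'Q': return "ImplicitlyUnwrappedOptional";
    case 'S': return "String";
    default:  return NULL;
    }
}

/* Decodes a mangled builtin such as "Si" or "SS". Returns NULL when the
 * input does not start with 'S' followed by a known character. */
static const char *decode_builtin(const char *mangled_type) {
    if (mangled_type[0] != 'S' || mangled_type[1] == '\0') return NULL;
    return swift_builtin_name(mangled_type[1]);
}
```

Only the entries listed in the table are covered; anything else comes back as NULL rather than a guess.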
There’s a lot more to name mangling than just functions, but I’ve just given a brief overview.
Function Hooking
Enough with semantics, let’s get to the fun part! Let’s say we have a class like so:
class Shape {
    var numberOfSides: Int

    init() {
        numberOfSides = 5
    }
}
Let’s say we want to change the numberOfSides to 4. There are multiple ways to do this. We could use MobileSubstrate to hook into the getter method, and change the return value, like so:
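#include <dlfcn.h>
#include <substrate.h>

// Pointer to the Swift getter; note that self is the only (and last) argument.
int (*numberOfSides)(id self);

// Replacement getter: always report 4 sides.
MSHook(int, numberOfSides, id self) {
    return 4;
}

%ctor {
    // Look up the getter by its mangled name, then install the hook.
    numberOfSides = (int (*)(id self))dlsym(RTLD_DEFAULT, "_TFC9swifttest5Shapeg13numberOfSidesSi");
    MSHookFunction(numberOfSides, MSHake(numberOfSides));
}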
If we create an instance of Shape and print out the value of numberOfSides, we see 4! That wasn’t so bad, was it? Now, I know what you’re thinking: “aren’t you supposed to return an object instead of a 4 literal?”
Well, in Swift, a lot of the builtin types are plain values rather than objects. An Int, for example, is a plain machine integer like an int in C, although it is word-sized, so on a 64-bit build it is as wide as a long. A little note: the String type is a little bit odd; it’s a little-endian UTF-16 string, so no C string literals can be used.
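To see why a C string literal does not line up with little-endian UTF-16, here is a small sketch that encodes an ASCII string that way. The helper name is hypothetical, it covers ASCII input only, and it says nothing about how Swift lays out the rest of a String value.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes an ASCII string as little-endian UTF-16 code units, without a
 * terminator. Returns the number of bytes written, or 0 if the input is
 * empty, contains a non-ASCII byte, or does not fit in `out`. */
static size_t ascii_to_utf16le(const char *ascii, uint8_t *out, size_t out_size) {
    size_t n = 0;
    for (; *ascii != '\0'; ascii++) {
        unsigned char c = (unsigned char)*ascii;
        if (c > 0x7F || n + 2 > out_size) return 0;
        out[n++] = c;    /* low byte comes first in little-endian order */
        out[n++] = 0x00; /* high byte is zero for every ASCII character */
    }
    return n;
}
```

Each character takes two bytes with a zero in between, so code that treats the buffer as a C string stops after the first character.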
Let’s do the same thing, but this time, we’ll hook the setter instead of the getter.
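// Pointer to the Swift setter; the new value comes first and self comes last.
void (*setNumberOfSides)(int newNumber, id self);

// Replacement setter: ignore the requested value and store 4 via the original.
MSHook(void, setNumberOfSides, int newNumber, id self) {
    _setNumberOfSides(4, self);
}

%ctor {
    // Look up the setter by its mangled name, then install the hook.
    setNumberOfSides = (void (*)(int newNumber, id self))dlsym(RTLD_DEFAULT, "_TFC9swifttest5Shapes13numberOfSidesSi");
    MSHookFunction(setNumberOfSides, MSHake(setNumberOfSides));
}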
Try it again and... it’s still 5. What is happening, you ask? Well, in certain places in Swift, functions are inlined. The class constructor is one of these places. It directly sets the numberOfSides ivar. So, the setter will only be called if the number is set again from the top-level code. Call it from there and, what do you know, we get 4.
Finally, let’s change numberOfSides by directly setting the ivar.
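// Pointer to the Swift setter; the new value comes first and self comes last.
void (*setNumberOfSides)(int newNumber, id self);

// Replacement setter: bypass the original and write the ivar directly.
MSHook(void, setNumberOfSides, int newNumber, id self) {
    MSHookIvar<int>(self, "numberOfSides") = 4;
}

%ctor {
    // Look up the setter by its mangled name, then install the hook.
    setNumberOfSides = (void (*)(int newNumber, id self))dlsym(RTLD_DEFAULT, "_TFC9swifttest5Shapes13numberOfSidesSi");
    MSHookFunction(setNumberOfSides, MSHake(setNumberOfSides));
}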
This works. It’s not recommended, but it works.
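All three hooks look up their target through a hand-written mangled symbol, and the getter and setter names follow the same pattern. As a rough sketch, the symbol could be assembled instead of typed out. The function name is hypothetical, and it only covers the one pattern seen above: an Int property accessor on a class.

```c
#include <stdio.h>
#include <string.h>

/* Builds "_TFC<len><module><len><class><attr><len><property>Si", the
 * pattern used by the accessor symbols above. `attr` is 'g' for the
 * getter or 's' for the setter. Returns 1 on success, 0 if `out` is
 * too small to hold the whole symbol. */
static int mangle_int_accessor(char *out, size_t out_size,
                               const char *module, const char *class_name,
                               char attr, const char *property) {
    int n = snprintf(out, out_size, "_TFC%zu%s%zu%s%c%zu%sSi",
                     strlen(module), module,
                     strlen(class_name), class_name,
                     attr,
                     strlen(property), property);
    return n >= 0 && (size_t)n < out_size;
}
```

With swifttest, Shape, 'g' and numberOfSides this produces the same string that is passed to dlsym in the getter hook.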
That’s all I have to write about for now. There are quite a few other things that I’m looking at, including witness tables, but I don’t know enough about them to write. A lot of things in this post are subject to change. They’re just what I’ve reverse engineered so far by looking at the runtime and binaries compiled with Swift.
What I’ve found here is very good. It means that MobileSubstrate will not die along with Objective-C, and tweaks can still be made! I wonder what the future has in store for the jailbreaking scene… maybe Logos could be updated to automatically mangle names? Or even a library that deals with common Swift types…
If you find out more about how Swift works, don’t hesitate to let me know!