[Repost] A few things iOS developers ought to know about the ARM architecture
When I wrote my Introduction to NEON on iPhone, I assumed the reader already had some knowledge about the processors that power iOS devices. However, from some discussions online I have realized this knowledge is not as universal as I thought; my bad. Furthermore, these are things I think are useful for iPhone programming in general (not just if you’re interested in NEON), even if you program in high-level Objective-C. You could live without them, but knowing them will make you a better iPhone programmer.
The Basics
All iOS devices released so far are powered by processors based on the ARM architecture; as you’ll see, this architecture is a bit unlike what you may be used to on the desktop with x86 or even PowerPC. However, it is not a “special” or “niche” architecture: nearly all mobile phones (not just smartphones) are based on ARM; practically all iPods were based on ARM, as well as nearly all mp3 players; PDAs and Pocket PCs were generally ARM-based; Nintendo portable consoles have been based on ARM since the GBA; it is now invading graphing calculators, with some TI and HP models using it; and if you want pedigree, know that the Newton was based on ARM as well (in fact, Apple was an early investor in ARM). And that’s only mentioning gadgets; countless ARM processors have shipped in unassuming embedded roles.
ARM processors are renowned for their small size on the silicon die, low power usage, and of course their performance (within their power class). The ARM architecture (at least as used in the iOS platform) is little-endian, just like x86, and it is a 32-bit RISC architecture, like MIPS, PowerPC, etc. Note that the simulator does not execute ARM code: when building for the simulator, your app is compiled for x86 and executes natively, so none of the following applies when running in the simulator; you need to be running on the target device.
ARMv7, ARM11, Cortex A8 and A4, oh my!
The ARM architecture comes in a few different versions developed over time; each one added some new instructions and other improvements, while being backwards compatible with the previous versions. The first iPhone had a processor that implements ARMv6 (short for ARM version 6), while the latest devices have processors that can support ARMv7. So when you compile code, you specify the architecture version you’re targeting, and the compiler will restrict the instructions it generates to those available in that architecture version; the same goes for the assembler, which will check that the instructions used in the code are present in the specified architecture version. In the end, you have object code that targets a specific architecture variant, ARMv6 or ARMv7 (or ARMv5 or ARMv4, but given that ARMv6 is the baseline in iOS development, you’re very unlikely to target these); the object and executable files are in fact marked with the architecture they target: run otool -vh foo.o on one of your object or executable files sometime.
However, it does not make sense to say that the original iPhone had “the ARMv6 processor”: ARMv6 does not designate a particular processor, but the set of instructions a processor can run, which does not imply any particular implementation. The processor core implementation used in the original iPhone was the ARM11 (it was the ARM1176JZF-S, to be really accurate, but it matters very little, just remember it was a member of the ARM11 family); as mentioned earlier, this processor implements ARMv6. Subsequent devices used ARM11 as well, up until the iPhone 3GS which started using the Cortex A8 processor core, used in all iOS devices released since then at the time of this writing (this is not yet certain, but strongly suspected, in the case of the iPhone 4). This core implements the ARMv7 instruction set, or in short, supports ARMv7.
Now having said that, DO NOT go around and write code that detects which device your code is executing on and tries to figure out which architecture it supports using the known information on currently released devices. Besides being the most unreliable code you could write, this kind of code will break when run on a device released after your application. So please don’t do it, otherwise I swear I will come to your house and maim you. This information is just so that you have a rough idea of the installed base of devices that can support ARMv7 and the ones that can only run ARMv6; I’ll get to detection in a minute.
But you may be wondering: “I thought the iPad and iPhone 4 had an A4 processor, not a Cortex A8?!” The A4 is in fact the whole applications System on a Chip, which includes not only a Cortex A8 core, but also graphics hardware, as well as video and audio codec accelerators and other digital blocks. The SoC and the processor core on it are very different things; the processor core does not even take the majority of the space on the silicon die.
ARMv7 support on the latest devices would be pretty useless if you couldn’t take advantage of it, and indeed you can; but code that unconditionally requires ARMv7 will not run on earlier devices, which may not be what you want. So how do you detect which architecture version a device supports, so that you can take advantage of ARMv7 features if and only if they are present? The thing is, you don’t. Instead, your code is compiled twice, once targeting ARMv6 and once targeting ARMv7; the two executables are then put together in a fat binary, and at runtime the device will itself choose the best half it can support. Yes, Mach-O fat binaries are not just for grouping completely different CPU architectures (e.g. PowerPC and Intel, making a Universal Binary), or 32- and 64-bit versions of an architecture, but also two variants (cpu subtypes, in Mach-O parlance) of the same architecture. The outcome is that, from the viewpoint of the programmer, everything gets decided at compile time: the code compiled targeting ARMv6 will only ever run on ARMv6 devices, and the code compiled targeting ARMv7 will only ever run on ARMv7 (or better).
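To illustrate how per-slice code paths typically look inside a single source file, here is a minimal sketch, assuming your target’s Architectures build setting lists both armv6 and armv7 and that NEON is enabled for the ARMv7 slice (the default on iOS); the __ARM_NEON__ macro is predefined by the compiler only when building the ARMv7/NEON slice, so each half of the fat binary gets the appropriate path selected at compile time. The function name is made up for the example.

#include <stdio.h>

void ReportSlice(void)
{
#if defined(__ARM_NEON__)
    /* Compiled only into the ARMv7 slice of the fat binary:
       NEON, Thumb-2, VFPv3, etc. can be used freely here. */
    printf("Running the ARMv7/NEON slice\n");
#else
    /* Compiled only into the ARMv6 slice: stick to the baseline
       instruction set (plus VFPv2 for floating point). */
    printf("Running the ARMv6 slice\n");
#endif
}

The point is simply that the decision is made by the preprocessor for each compiled slice, not by any runtime check.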
If you’ve read my NEON post, you may remember that in that post I also suggested a way to do the detection and selection at runtime. If you check now, you’ll notice I have actually removed that part, and I now recommend you do not use that method any more.1 This is because, while it does work, it was impossible (or at the very least too tricky to implement without any error) to ensure that the code would keep working the day it is run on a future ARMv8 processor. The fact that the documentation status of that API is unclear doesn’t help, either (its man page isn’t in the iOS man pages). You should exclusively use compile-time decisions and fat executables if you want to run on ARMv6 and still take advantage of ARMv7.
One last note on the subject: in the context of iOS devices, the ARM architecture versions do not strictly mean what they mean for ARM processors in general. For instance, iOS code that requires ARMv6 actually requires support for floating-point instructions as well (VFPv2, to be accurate), which is an optional part of ARMv6, but has been present since the original iPhone, so when ARMv6 is mentioned in iOS development (e.g. as a compiler -arch setting or as the cpu subtype of an executable) hardware floating-point support is implied. The same goes for ARMv7 and NEON: NEON is actually an optional part of the ARMv7-A profile, but NEON has been present in all iOS devices supporting ARMv7, so when developing for iOS NEON is considered part of ARMv7.
To summarize:
- the first devices had an ARM11 processor, which implements ARMv6;
- devices from the iPhone 3GS onwards have a Cortex A8 processor, which implements ARMv7;
- some of these feature the so-called A4 “processor”, which is in fact a SoC;
- the iPad 2 does not have a Cortex A8, but likely two Cortex A9 cores, which also implement ARMv7.
— September 25, 2011
Conditional Execution
A nifty feature of the ARM architecture is that most instructions can be conditionally executed: if a condition is false, the instruction has no effect. This allows short if blocks to be implemented more efficiently: instead of the usual method of branching past the block when the condition is false, the instructions inside the block are conditionalized, saving a branch, as in the sketch below.
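For a concrete (and hypothetical; the function name and the assembly listing are made up for illustration) example, a small max-style function can be compiled by an optimizing ARM compiler into straight-line code with no branch at all, using a conditionally executed move:

/* A short if block that an ARM compiler can turn into branch-free code. */
int MaxOf(int a, int b)
{
    if (a < b)
        a = b;
    return a;
}

/* A plausible compilation to ARM assembly (a sketch, not actual compiler
   output):
       cmp   r0, r1      @ compare a and b
       movlt r0, r1      @ a = b, executed only when the "less than"
                         @ condition holds; no branch needed
       bx    lr          @ return a (in r0)
*/

When stepping through code like this, the program counter passes over the movlt whether or not it takes effect, which explains the debugging behavior described next.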
Now I wouldn’t mention this if it was just a feature the compiler uses to make the code more efficient; while it is that, I’m mentioning it because it can be surprising when debugging. Indeed, you can sometimes see the debugger go inside if blocks whose conditions you know are false (e.g. early error returns), or go into both sides of an if-else! This is because the processor actually goes through that code, but some parts of it aren’t actually executed, because they are conditionalized. Moreover, if you put a breakpoint inside such an if block, it may be hit even if the condition is false!
That being said, it seems (in my limited testing) that the compiler avoids generating conditionally executed instructions in the debug configuration, so it should only occur when debugging optimized code; unfortunately sometimes you have no choice but to do so.
Thumb
The Thumb instruction set is a subset of the ARM instruction set, compressed so that instructions take only 16 bits (all ARM instructions are 32 bits in size; Thumb is still a 32-bit architecture, the instructions just take less space). It is not a different architecture; rather, it should be seen as a shorthand for the most common ARM instructions and functionality. The advantage, of course, is that it allows an important reduction in code size, saving memory, cache, and code bandwidth; while this is more useful in microcontroller-type applications where it allows hardware savings in the memory used, it is still useful on iOS devices, and as such it is enabled by default in Xcode iOS projects. The code size reduction is nice, but never actually reaches 50%, as sometimes two Thumb instructions are required for one equivalent ARM instruction. ARM and Thumb instructions cannot be freely intermixed: the processor needs to switch mode when going from one to the other, and this can only occur when calling or returning from a function. So a function has to be either Thumb or ARM as a whole; in practice you do not control whether code is compiled for Thumb or ARM at the function granularity, but rather at the source file granularity.
When targeting ARMv6, compiling for Thumb is a big tradeoff. ARMv6 Thumb code has access to fewer registers, does not have conditional instructions, and in particular cannot use the floating-point hardware. This means that for every single floating-point addition, subtraction, multiplication, etc., floating-point Thumb code must call a system function to do it. Yes, this is as slow as it sounds. For this reason, I recommend disabling Thumb mode when targeting ARMv6. If you do leave it on, make sure you profile your code, and if some parts are slow you should first try disabling Thumb at least for that part (easy with file-specific compiler flags in Xcode; use -mno-thumb). Remember that floating-point calculations are pretty common on iOS, since Quartz and Core Animation use a floating-point coordinate system.
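To make the tradeoff concrete, here is a sketch (the function name is made up) of the kind of code that suffers: compiled as ARMv6 Thumb, every floating-point operation below becomes a call into software support routines, while compiled as ARM each one maps to a hardware VFP instruction.

/* Under ARMv6 Thumb, each multiply and accumulate here is a library call;
   compiled as ARM, they are single VFP instructions. */
float WeightedSum(const float *values, const float *weights, int count)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < count; i++)
        sum += values[i] * weights[i];
    return sum;
}

If profiling shows a file full of code like this near the top, disabling Thumb for that file (or for the whole ARMv6 build) is the first thing to try.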
When targeting ARMv7, however, all these drawbacks disappear: ARMv7 contains Thumb-2, an extension of the Thumb instruction set which adds support for conditional execution and 32-bit Thumb instructions that allow access to all ARM registers as well as hardware floating-point and NEON. It’s pretty much a free reduction in code size, so it should be left on (or reenabled if you disabled it); use conditional build settings in Xcode to have it enabled for ARMv7 but disabled for ARMv6.
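As an illustration of such a conditional setting, here is a minimal sketch in .xcconfig form, assuming the GCC_THUMB_SUPPORT setting (the name behind the “Compile for Thumb” checkbox of the time); treat it as a sketch to adapt rather than a drop-in configuration:

// Thumb-2 for the ARMv7 slice, plain ARM for the ARMv6 slice.
GCC_THUMB_SUPPORT = YES
GCC_THUMB_SUPPORT[arch=armv6] = NO

The same effect can be obtained in the Xcode build settings editor by adding an architecture-specific condition to the “Compile for Thumb” setting.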
In ARM documentation and discussions on the Internet you may find mentions that code needs to be “interworking” to use Thumb; unless you write your own assembly code, you don’t have to worry about it, as all code is interworking on the iOS platform (interworking refers to a set of rules that allow functions compiled for ARM to directly call functions compiled for Thumb, and the converse, without any problem and transparently as far as the C programmer is concerned). When displaying assembly, Shark or the Time Profiler instrument may have trouble figuring out whether a function is ARM or Thumb; if you see invalid or nonsensical instructions, you may need to switch from one to the other.
Alignment
On the iOS platform unaligned accesses are supported; however, they are slower than aligned accesses, so try and avoid them. In some particular cases (those involving load/store multiple instructions, if you’re interested), unaligned accesses can be a hundred times slower than aligned accesses, because the processor cannot handle these and has to ask the OS for assistance (read this article; it’s the same phenomenon that on PowerPC causes unaligned doubles to be so much slower). So be careful, alignment still matters.
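As a sketch of how unaligned accesses typically sneak in (the function names are made up): they most often come from casting a byte pointer into the middle of a buffer, and copying through memcpy (or assembling the value byte by byte) sidesteps the problem while letting the compiler choose an access pattern that is safe for the actual alignment.

#include <stdint.h>
#include <string.h>

/* Risky: if buffer + offset is not 4-byte aligned, this is an unaligned
   load, which is slow at best. */
uint32_t ReadWordByCast(const uint8_t *buffer, size_t offset)
{
    return *(const uint32_t *)(buffer + offset);
}

/* Safer: copying into a properly aligned local variable avoids relying
   on the alignment of the source buffer. */
uint32_t ReadWordByCopy(const uint8_t *buffer, size_t offset)
{
    uint32_t value;
    memcpy(&value, buffer + offset, sizeof value);
    return value;
}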
Division
This one always surprises everyone. Open the ARM architecture manual (if you don’t already have it, see Introduction to NEON on iPhone, section Architecture Overview) and try and find an integer division instruction. Go ahead, I’ll wait. Can’t find it? It’s normal. There is none. Yes, the ARM architecture has no hardware support for integer division, it must be performed in software. If you compile the following code:
int ThousandDividedBy(int divisor)
{
    return 1000/divisor;
}
to assembly, you’ll see the compiler has inserted a call to a function called “___divsi3”; it’s a system function that implements the division in software (notice the divisor should be non-constant, as otherwise the division is likely to be turned into a multiplication). This means that on iOS, an integer division is actually a function call into a system routine!
“But!” you may say, having finally returned victorious from the ARM manual, “You’re wrong! There is an ARM division instruction, even two! Here, sdiv and udiv!” Sorry to rain on your parade, but these instructions are only available in the ARMv7-R and ARMv7-M profiles (real-time and embedded, respectively – think motor microcontrollers and wristwatches), not in ARMv7-A, which is the profile implemented by the ARMv7 iOS devices. Sorry!
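Coming back to the parenthetical above about constant divisors: division by a compile-time constant usually avoids the library call entirely, because the compiler can strength-reduce it. A sketch (the function names are made up):

/* Likely compiled to a multiply-and-shift sequence, with no call to
   ___divsi3. */
int DivideByTen(int value)
{
    return value / 10;
}

/* A power-of-two divisor is cheaper still: typically an arithmetic shift
   plus a small fix-up for negative values. */
int DivideBySixteen(int value)
{
    return value / 16;
}

So when a division really is hot, a constant or power-of-two divisor (or avoiding the division altogether) is worth considering.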
GCC
It’s no secret that the ARM code generated by GCC is considered to be crap. On other ARM-based platforms professional developers use the toolchain provided by ARM itself, RVDS, but this is not an option on the iOS platform, as RVDS doesn’t support the Mach-O runtime used by OS X, only the ELF runtime. But there is at least an alternative to GCC, as LLVM can now be used for iOS development. While I’ve not tested it much, I have at least seen nice improvements on 64-bit integer code (a particularly weak point of GCC on ARM) when using LLVM. Hopefully, LLVM will prove to be an improvement over GCC in all domains.
There, you’re now a better iOS developer!