What is byte alignment, and why is it needed?
Computer Systems: A Programmer's Perspective:
3.9.3 Data Alignment
Many computer systems place restrictions on the allowable addresses for the primitive data types, requiring that the address for some type of object must be a multiple of some value K (typically 2, 4, or 8). Such alignment restrictions simplify the design of the hardware forming the interface between the processor and the memory system. For example, suppose a processor always fetches 8 bytes from memory with an address that must be a multiple of 8. If we can guarantee that any double will be aligned to have its address be a multiple of 8, then the value can be read or written with a single memory operation. Otherwise, we may need to perform two memory accesses, since the object might be split across two 8-byte memory blocks.
The IA32 hardware will work correctly regardless of the alignment of data. However, Intel recommends that data be aligned to improve memory system performance. Linux follows an alignment policy where 2-byte data types (e.g., short) must have an address that is a multiple of 2, while any larger data types (e.g., int, int *, float, and double) must have an address that is a multiple of 4. Note that this requirement means that the least significant bit of the address of an object of type short must equal zero. Similarly, any object of type int, or any pointer, must be at an address having the low-order 2 bits equal to zero.
Aside: A case of mandatory alignment
For most IA32 instructions, keeping data aligned improves efficiency, but it does not affect program behavior. On the other hand, some of the SSE instructions for implementing multimedia operations will not work correctly with unaligned data. These instructions operate on 16-byte blocks of data, and the instructions that transfer data between the SSE unit and memory require the memory addresses to be multiples of 16. Any attempt to access memory with an address that does not satisfy this alignment will lead to an exception, with the default behavior for the program to terminate.
This is the motivation behind the IA32 convention of making sure that every stack frame is a multiple of 16 bytes long (see the aside of page 226). The compiler can allocate storage within a stack frame in such a way that a block can be stored with a 16-byte alignment.
Aside: Alignment with Microsoft Windows
Microsoft Windows imposes a stronger alignment requirement: any primitive object of K bytes, for K = 2, 4, or 8, must have an address that is a multiple of K. In particular, it requires that the address of a double or a long long be a multiple of 8. This requirement enhances the memory performance at the expense of some wasted space. The Linux convention, where 8-byte values are aligned on 4-byte boundaries, was probably good for the i386, back when memory was scarce and memory interfaces were only 4 bytes wide. With modern processors, Microsoft's alignment is a better design decision. Data type long double, for which gcc generates IA32 code allocating 12 bytes (even though the actual data type requires only 10 bytes), has a 4-byte alignment requirement with both Windows and Linux.
Alignment is enforced by making sure that every data type is organized and allocated in such a way that every object within the type satisfies its alignment restrictions. The compiler places directives in the assembly code indicating the desired alignment for global data. For example, the assembly-code declaration of the jump table beginning on page 217 contains the following directive on line 2:
.align 4
This ensures that the data following it (in this case the start of the jump table) will start with an address that is a multiple of 4. Since each table entry is 4 bytes long, the successive elements will obey the 4-byte alignment restriction.
Library routines that allocate memory, such as malloc, must be designed so that they return a pointer that satisfies the worst-case alignment restriction for the machine it is running on, typically 4 or 8.
For code involving structures, the compiler may need to insert gaps in the field allocation to ensure that each structure element satisfies its alignment requirement. The structure then has some required alignment for its starting address.
For example, consider the following structure declaration:
struct S1 { int i; char c; int j; };
Suppose the compiler used the minimal 9-byte allocation, diagrammed as follows:
| i (offsets 0-3) | c (offset 4) | j (offsets 5-8) |
Then it would be impossible to satisfy the 4-byte alignment requirement for both fields i (offset 0) and j (offset 5). Instead, the compiler inserts a 3-byte gap between fields c and j:
| i (offsets 0-3) | c (offset 4) | gap (offsets 5-7) | j (offsets 8-11) |
As a result, j has offset 8, and the overall structure size is 12 bytes. Furthermore, the compiler must ensure that any pointer p of type struct S1* satisfies a 4-byte alignment. Using our earlier notation, let pointer p have value xp. Then xp must be a multiple of 4. This guarantees that both p->i (address xp) and p->j (address xp + 8) will satisfy their 4-byte alignment requirements.
In addition, the compiler may need to add padding to the end of the structure so that each element in an array of structures will satisfy its alignment requirement.
For example, consider the following structure declaration:
struct S2 { int i; int j; char c; };
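If we pack this structure into 9 bytes, we can still satisfy the alignment requirements for fields i and j by making sure that the starting address of the structure satisfies a 4-byte alignment requirement. Consider, however, the following declaration: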
struct S2 d[4];
With the 9-byte allocation, it is not possible to satisfy the alignment requirement for each element of d, because these elements will have addresses xd, xd + 9, xd + 18, and xd + 27. Instead, the compiler allocates 12 bytes for structure S2, with the final 3 bytes being wasted space:
| i (offsets 0-3) | j (offsets 4-7) | c (offset 8) | padding (offsets 9-11) |
That way the elements of d will have addresses xd, xd + 12, xd + 24, and xd + 36. As long as xd is a multiple of 4, all of the alignment restrictions will be satisfied.
Example test
#include <stdio.h>

/* Reference variables for comparing addresses */
char r1;
short r2;
int refer;

struct p
{
    int a;
    char b;
    short c;
} __attribute__((aligned(4))) pp;
/* 4-byte alignment: a is 4 bytes and occupies 4; b is 1 byte and occupies 2
   (1 byte of padding so that c is 2-byte aligned); c is 2 bytes and occupies 2. */

struct m
{
    char a;
    int b;
    short c;
} __attribute__((aligned(4))) mm;
/* 4-byte alignment: a is 1 byte and occupies 4 (padded to 4); b is 4 bytes and
   occupies 4; c is 2 bytes and occupies 4 (padded to 4). */

struct o
{
    int a;
    char b;
    short c;
    char d;
} oo;
/* Default alignment, which here is the same as 4-byte alignment: a is 4 bytes and
   occupies 4; b is 1 byte and occupies 2; c is 2 bytes and occupies 2 (b and c
   together fill 4 bytes, because b starts at a multiple of 4); d is 1 byte and
   occupies 4 (trailing padding). */

struct x
{
    int a;
    char b;
    struct p px;
    short c;
} __attribute__((aligned(8))) xx;
/* 8-byte alignment: a is 4 bytes and occupies 4; b is 1 byte and occupies 4
   (px needs 4-byte alignment, so it starts at offset 8); px is 8 bytes and
   occupies 8; c is 2 bytes and occupies 8 (c starts at offset 16, and the
   structure size is padded up to a multiple of 8). */

int main(void)
{
    /* Sizes of the basic data types (%zu is the correct format for size_t) */
    printf("sizeof(int)=%zu, sizeof(short)=%zu, sizeof(char)=%zu\n", sizeof(int), sizeof(short), sizeof(char));
    /* Addresses of the individual variables (%p requires a void * argument) */
    printf("sizeof(refer)=%zu, &refer=%p, &r1=%p, &r2=%p\n", sizeof(refer), (void *)&refer, (void *)&r1, (void *)&r2);
    /* Address layout of the 4-byte-aligned struct p */
    printf("pp=%zu, &pp=%p, &pp.a=%p, &pp.b=%p, &pp.c=%p\n", sizeof(pp), (void *)&pp, (void *)&pp.a, (void *)&pp.b, (void *)&pp.c);
    /* Address layout of the 4-byte-aligned struct m */
    printf("mm=%zu, &mm=%p, &mm.a=%p, &mm.b=%p, &mm.c=%p\n", sizeof(mm), (void *)&mm, (void *)&mm.a, (void *)&mm.b, (void *)&mm.c);
    /* Address layout of the default-aligned struct o */
    printf("oo=%zu, &oo=%p, &oo.a=%p, &oo.b=%p, &oo.c=%p, &oo.d=%p\n", sizeof(oo), (void *)&oo, (void *)&oo.a, (void *)&oo.b, (void *)&oo.c, (void *)&oo.d);
    /* Address layout of the 8-byte-aligned struct x */
    printf("xx=%zu, &xx=%p, &xx.a=%p, &xx.b=%p, &xx.px=%p, &xx.c=%p\n", sizeof(xx), (void *)&xx, (void *)&xx.a, (void *)&xx.b, (void *)&xx.px, (void *)&xx.c);

    return 0;
}
Output