C++ Struct Data Alignment

What is Data Alignment?

In a programming language, a data object (variable) has two properties: its value and its storage location (address). Data alignment means that the address of a data object is evenly divisible by 1, 2, 4, 8, or another power of 2. In other words, a data object can have 1-byte, 2-byte, 4-byte, or 8-byte alignment, and so on. For instance, if the address of a data object is 12FEECh (1244908 in decimal), it is 4-byte aligned because the address is evenly divisible by 4. (It is also divisible by 2 and 1, but 4 is the largest power of 2 that divides it evenly.)
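The divisibility test described above can be written directly in code. The following is a minimal sketch; the helper names is_aligned and largest_alignment are made up for this illustration and do not come from any library.

```cpp
#include <cstdint>

// True if addr is evenly divisible by n (n is assumed to be a power of 2).
constexpr bool is_aligned(std::uintptr_t addr, std::uintptr_t n) {
    return (addr & (n - 1)) == 0;
}

// Largest power of 2 that divides addr (addr is assumed to be non-zero).
// The expression isolates the lowest set bit of the address.
constexpr std::uintptr_t largest_alignment(std::uintptr_t addr) {
    return addr & (~addr + 1);
}

// The address from the text: 12FEECh is 4-byte aligned but not 8-byte aligned.
static_assert(is_aligned(0x12FEEC, 4), "12FEECh is divisible by 4");
static_assert(!is_aligned(0x12FEEC, 8), "12FEECh is not divisible by 8");
static_assert(largest_alignment(0x12FEEC) == 4, "highest alignment is 4");
```

Because the checks are static_assert statements, a compiler that accepts the file has already verified them.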

The CPU does not read from or write to memory one byte at a time. Instead, it accesses memory in chunks of 2, 4, 8, 16, or 32 bytes. The reason is performance: accessing an address on a 4-byte or 16-byte boundary is a lot faster than accessing an address on a 1-byte boundary.

The following diagram illustrates how the CPU accesses a 4-byte chunk of data with 4-byte memory access granularity.

Memory mapping from memory to CPU cache

If the data is not aligned on a 4-byte boundary, the CPU has to do extra work to access it: load 2 chunks of data, shift out the unwanted bytes, then combine the remaining bytes. This extra work slows down the access and wastes CPU cycles just to get the right data from memory.

Misaligned data slows down data access performance
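The load, shift, and combine steps can be modelled in software. The sketch below is only an illustration of the idea, not how any particular CPU is implemented: it assumes memory is an array of aligned 32-bit words holding little-endian data, and the names load_u32 and kMemory are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a 32-bit little-endian value that starts at byte `offset` in a
// memory modelled as aligned 32-bit words. An aligned read needs one word.
// A misaligned read needs two words, two shifts, and one OR.
constexpr std::uint32_t load_u32(const std::uint32_t* words, std::size_t offset) {
    return (offset % 4 == 0)
        ? words[offset / 4]
        : (words[offset / 4] >> (8 * (offset % 4))) |
          (words[offset / 4 + 1] << (32 - 8 * (offset % 4)));
}

// Sample memory (an assumption for the demo); bytes in address order are
// 11 22 33 44 55 66 77 88 99 AA BB CC.
constexpr std::uint32_t kMemory[] = {0x44332211u, 0x88776655u, 0xCCBBAA99u};

static_assert(load_u32(kMemory, 0) == 0x44332211u, "aligned: one word");
static_assert(load_u32(kMemory, 1) == 0x55443322u, "misaligned: two words");
```

The misaligned case touches twice as many words as the aligned case, which is the cost the paragraph above describes.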

Structure Member Alignment

On 32-bit x86 systems, the alignment of a data type is usually the same as its size. The compiler aligns variables on their natural boundaries. The CPU still handles misaligned data correctly, so you do not need to align addresses explicitly.

Data alignment for each type
Data Type	Alignment (bytes)
char	1
short	2
int	4
float	4
double	4 or 8
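The values in the table depend on the compiler and target, so it can help to query them with the standard alignof operator. The following sketch assumes a C++11 compiler; print_alignments is a hypothetical helper name.

```cpp
#include <cstdio>

// Prints the size and alignment of the types from the table for whatever
// compiler and target this file is built with.
inline void print_alignments() {
    std::printf("char   size=%zu align=%zu\n", sizeof(char), alignof(char));
    std::printf("short  size=%zu align=%zu\n", sizeof(short), alignof(short));
    std::printf("int    size=%zu align=%zu\n", sizeof(int), alignof(int));
    std::printf("float  size=%zu align=%zu\n", sizeof(float), alignof(float));
    std::printf("double size=%zu align=%zu\n", sizeof(double), alignof(double));
}

// char never needs padding in front of it.
static_assert(alignof(char) == 1, "char is 1-byte aligned");
// Assumption: an x86 or x86-64 target, where double is 4- or 8-byte aligned.
static_assert(alignof(double) == 4 || alignof(double) == 8,
              "double is 4- or 8-byte aligned on x86 targets");
```

Calling print_alignments() from a small test program shows which of the "4 or 8" cases applies to double on the current target.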

However, the story is a little different for the data members of struct, union, or class objects. Each member is placed on its own natural boundary, and the object as a whole is aligned to the largest alignment of any of its members, to prevent performance penalties. For example, if a struct has one char member (1 byte) followed by one int member (4 bytes), the compiler inserts 3 bytes of padding between the two members. The total size of the struct is therefore 8 bytes instead of 5. With this layout, the int member sits at an address evenly divisible by 4. This is called structure member alignment. The cost is that the struct grows in size.

    // size = 2 bytes, alignment = 1-byte, address can be divisible by 1
    struct S1 {
        char m1;    // 1-byte
        char m2;    // 1-byte
    };

    // size = 4 bytes, alignment = 2-byte, address can be divisible by 2
    struct S2 {
        char m1;    // 1-byte
                    // padding 1-byte space here
        short m2;   // 2-byte
    };

    // size = 8 bytes, alignment = 4-byte, address can be divisible by 4
    struct S3 {
        char m1;    // 1-byte
                    // padding 3-byte space here
        int m2;     // 4-byte
    };

    // size = 16 bytes, alignment = 8-byte, address can be divisible by 8
    struct S4 {
        char m1;    // 1-byte
                    // padding 7-byte space here
        double m2;  // 8-byte
    };

    // size = 16 bytes, alignment = 8-byte, address can be divisible by 8
    struct S5 {
        char m1;    // 1-byte
                    // padding 3-byte space here
        int m2;     // 4-byte
        double m3;  // 8-byte
    };
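Padding like this can be checked at compile time with offsetof, sizeof, and alignof. The sketch below uses hypothetical struct names (CharThenInt, CharIntDouble) and expresses the checks relative to the member types, so it does not hard-code a particular target. The standard leaves padding to the implementation, so treat these as properties that hold on mainstream compilers rather than guarantees.

```cpp
#include <cstddef>  // offsetof

struct CharThenInt {
    char m1;
    int  m2;
};

// m2 starts at the first offset after m1 that satisfies int's alignment.
static_assert(offsetof(CharThenInt, m2) == alignof(int), "padding before m2");
// The struct takes the largest alignment of its members.
static_assert(alignof(CharThenInt) == alignof(int), "struct alignment");
// Padded m1 plus m2.
static_assert(sizeof(CharThenInt) == alignof(int) + sizeof(int), "struct size");

struct CharIntDouble {
    char   m1;
    int    m2;
    double m3;
};

static_assert(offsetof(CharIntDouble, m2) == alignof(int), "padding before m2");
// The size is always a multiple of the alignment, so arrays stay aligned.
static_assert(sizeof(CharIntDouble) % alignof(CharIntDouble) == 0, "array-safe");
```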

You can use the #pragma pack directive to specify a different packing alignment for struct, union, or class members.

    // 1-byte struct member alignment
    // size = 9, alignment = 1-byte, no padding for these struct members
    #pragma pack(push, 1)
    struct S6 {
        char m1;    // 1-byte
        double m2;  // 8-byte
    };
    #pragma pack(pop)

Be careful with custom struct member alignment. It can cause serious compatibility issues, for example when linking against an external library that was built with a different packing alignment. It is better to use the default alignment whenever possible.
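To see why mixed packing is risky, compare the same member list under the default layout and under 1-byte packing. This is a sketch with hypothetical names (DefaultLayout, PackedLayout); it assumes a compiler that supports #pragma pack, such as MSVC, GCC, or Clang.

```cpp
#include <cstddef>  // offsetof

// Default layout: the compiler pads after m1 so that m2 is aligned.
struct DefaultLayout {
    char   m1;
    double m2;
};

// Same members, packed on 1-byte boundaries: no padding at all.
#pragma pack(push, 1)
struct PackedLayout {
    char   m1;
    double m2;
};
#pragma pack(pop)

static_assert(offsetof(DefaultLayout, m2) == alignof(double), "padded offset");
static_assert(offsetof(PackedLayout, m2) == 1, "m2 follows m1 immediately");
static_assert(sizeof(PackedLayout) == 1 + sizeof(double), "no padding");
static_assert(alignof(PackedLayout) == 1, "packed struct is 1-byte aligned");
// Code compiled with one layout would read m2 from the wrong offset
// in an object created with the other layout.
static_assert(sizeof(DefaultLayout) != sizeof(PackedLayout), "layouts differ");
```

If a library and its caller disagree on which of these two layouts a struct has, every access to m2 lands on the wrong bytes.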

Data Alignment for SSE

SSE (Streaming SIMD Extensions) defines 128-bit (16-byte) packed data types (four 32-bit floats), and data access is faster when the address of the data is 16-byte aligned, that is, evenly divisible by 16.

In MSVC, you can declare a 16-byte aligned variable with the __declspec(align(16)) keyword:

    __declspec(align(16)) float array[SIZE];
    ...

    struct __declspec(align(16)) S1 {
        float v[4];
    };
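The __declspec form is specific to MSVC. Since C++11 the standard alignas specifier expresses the same request portably. The sketch below is an alternative to the MSVC keyword and is not taken from the original article; Vec4, g_buffer, and is_16_byte_aligned are hypothetical names.

```cpp
#include <cstdint>

// A 16-byte aligned struct holding four floats, suitable for SSE loads.
struct alignas(16) Vec4 {
    float v[4];
};

// A 16-byte aligned global array.
alignas(16) float g_buffer[8];

static_assert(alignof(Vec4) == 16, "Vec4 is 16-byte aligned");
static_assert(sizeof(Vec4) % 16 == 0, "arrays of Vec4 stay aligned");

// Runtime check for any pointer, for example &g_buffer[0].
inline bool is_16_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```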

A dynamic array can be allocated with the _aligned_malloc() function and deallocated with _aligned_free().

    // allocate 16-byte aligned data
    float* array = (float*)_aligned_malloc(SIZE*sizeof(float), 16);
    ...

    // deallocate memory
    _aligned_free(array);
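_aligned_malloc() and _aligned_free() are MSVC runtime functions. A thin wrapper can fall back to POSIX posix_memalign() on other platforms. This is a sketch under that assumption; the wrapper names aligned_alloc_bytes, aligned_free_bytes, and is_aligned_ptr are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

// Allocates `size` bytes aligned to `alignment`, or returns nullptr.
// `alignment` must be a power of 2 (and, for posix_memalign, a multiple
// of sizeof(void*)); 16 satisfies both.
inline void* aligned_alloc_bytes(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

// Releases memory obtained from aligned_alloc_bytes().
inline void aligned_free_bytes(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

// True if the pointer value is evenly divisible by `alignment`.
inline bool is_aligned_ptr(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A caller would request, for example, aligned_alloc_bytes(SIZE * sizeof(float), 16) and later pass the same pointer to aligned_free_bytes().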

Alternatively, you can align the address manually:

    // allocate an array 15 bytes larger than needed,
    // because in the worst case the data can be misaligned by up to 15 bytes.
    float* array = (float*)malloc(SIZE*sizeof(float)+15);

    // find the aligned position
    // and use this pointer to read or write data into the array
    // (uintptr_t from <stdint.h> is wide enough to hold a pointer;
    //  unsigned long is only 32 bits on 64-bit Windows)
    float* alignedArray = (float*)(((uintptr_t)array + 15) & ~(uintptr_t)0x0F);
    ...

    // deallocate the original "array", NOT alignedArray
    free(array);
    array = alignedArray = 0;

Because a 16-byte aligned address must be divisible by 16, its least significant hex digit is always 0. That is why a bitwise AND is used to clear the least significant hex digit of the address.
Bitwise AND Operator

The address returned by the allocator can be misaligned by anything from 0 to 15 bytes. In the worst case, you have to move the address 15 bytes forward before the bitwise AND operation, so you need to allocate 15 extra bytes. For example, the 16-byte aligned addresses from 1000h are 1000h, 1010h, 1020h, 1030h, and so on. If malloc() or the C++ new operator returns memory at 1011h, then we need to move 15 bytes forward to 1020h, which is the next 16-byte aligned address.
Aligned and Misaligned
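The add-15-then-mask arithmetic can be checked against the addresses in the example above. This is a minimal sketch; align_up_16 is a hypothetical helper name.

```cpp
#include <cstdint>

// Rounds addr up to the next multiple of 16 (unchanged if already aligned):
// add 15, then clear the least significant hex digit with a bitwise AND.
constexpr std::uintptr_t align_up_16(std::uintptr_t addr) {
    return (addr + 15) & ~static_cast<std::uintptr_t>(0x0F);
}

static_assert(align_up_16(0x1000) == 0x1000, "already aligned: no movement");
static_assert(align_up_16(0x1001) == 0x1010, "moves 15 bytes forward");
static_assert(align_up_16(0x1011) == 0x1020, "worst case from the text");
static_assert(align_up_16(0x101F) == 0x1020, "moves 1 byte forward");
```

The result never moves more than 15 bytes past the input, which is why 15 spare bytes are enough.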

Source: http://www.songho.ca/misc/alignment/dataalign.html

http://www.cppblog.com/deercoder/archive/2011/03/13/141747.html

Original link: https://www.cnblogs.com/starsky/archive/2013/03/21/2972492.html
