Little-Endian and Big-Endian Byte Order, Advanced Version V3.0
For multi-byte data types such as `uint16_t` and `uint32_t`, the order in which the bytes are stored in memory determines how the data is interpreted. There are two such orders: big-endian and little-endian.
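Big-endian:
In big-endian order, the most significant byte (MSB) of the value is stored at the lowest memory address, and the least significant byte (LSB) is stored at the highest address. For example, consider the 16-bit unsigned integer `0x1234`:
- In memory it is stored as `0x12` (low address), `0x34` (high address).
- In a binary stream, the high byte comes first and the low byte second (`12 34`).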
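Little-endian:
In little-endian order, the least significant byte (LSB) is stored at the lowest memory address, and the most significant byte (MSB) is stored at the highest address. Using the same value `0x1234`:
- In memory it is stored as `0x34` (low address), `0x12` (high address).
- In a binary stream, the low byte comes first and the high byte second (`34 12`).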
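Most modern computers (for example x86/x86-64, and ARM as it is usually configured) store data in little-endian order. For multi-byte types such as integers and floating-point numbers, the least significant byte (LSB) is at the lowest address and the most significant byte (MSB) is at the highest address.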
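`std::memcpy` simply copies bytes from the source address to the destination address, in order. It knows nothing about byte order. Whether the source data is little-endian or big-endian, `std::memcpy` copies the bytes in the same sequence. So when you use `std::memcpy` to copy from one variable to another, the byte order is preserved.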
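To go further, suppose we define `uint16_t testa = 0x1234;`. On a little-endian machine (and most common machines are little-endian), it is stored in memory as `34 12`.
Suppose 2 bytes of a little-endian binary stream are copied with `memcpy` into a `uint16_t testb`. No high/low byte swap is needed, because the order in the stream is already the order the machine uses in memory. This holds only on a little-endian host; a big-endian host would have to swap the bytes.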
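Big-endian data is different. Again, `uint16_t testa = 0x1234` is stored in memory as `34 12` on a little-endian machine.
Now suppose we have a big-endian binary stream that contains `testa`. In the stream it is stored as `12 34`, but in the computer's memory it must be `34 12` (little-endian layout). A plain `memcpy` of those two bytes would produce `0x3412` instead.
Therefore, big-endian data must be read from back to front: the last byte of the field in the stream (`34`) goes to the lowest address of the variable. Alternatively, the value can be built with shifts, as `parseBigEndian` does in the code below. That approach works regardless of the host's byte order.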
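Finally, byte strings (arrays of `char`) have no notion of byte order, because every character occupies a single byte. They can simply be copied with `memcpy`. Encodings with multi-byte code units, such as UTF-16, are a different case.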
The code below serializes numbers and strings into little-endian and big-endian binary streams, then parses them back out of those streams.
```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <vector>

// Returns true if the host stores the least significant byte at the lowest address.
bool isLittleEndian() {
    uint32_t num = 0x01020304;
    uint8_t* ptr = reinterpret_cast<uint8_t*>(&num);
    return (*ptr == 0x04);  // If the byte at the lowest address is the LSB (0x04), it's little-endian
}

//====================== Serialization (append to a binary stream)
// Both append functions work on the host's raw bytes, so they produce
// true little-/big-endian output only when the host is little-endian.

// Copies the bytes in host order (little-endian on a little-endian host).
void appendLittleEndian(std::vector<uint8_t>& block, const void* data, size_t size) {
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    for (size_t i = 0; i < size; ++i) {
        block.push_back(bytes[i]);
    }
}

// Copies the bytes in reverse host order (big-endian on a little-endian host).
void appendBigEndian(std::vector<uint8_t>& block, const void* data, size_t size) {
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    for (size_t i = size; i > 0; --i) {
        block.push_back(bytes[i - 1]);
    }
}

//======================= Deserialization (parse from a binary stream)
// Copies sizeof(T) bytes unchanged: correct for little-endian data on a little-endian host.
template<typename T>
void parseLittleEndian(const std::vector<uint8_t>& block, size_t& offset, T& value) {
    if (offset + sizeof(T) > block.size()) {
        throw std::out_of_range("parseLittleEndian: not enough data");
    }
    std::memcpy(&value, block.data() + offset, sizeof(T));
    offset += sizeof(T);
}

// Rebuilds an unsigned integer MSB first with shifts: correct on any host.
template<typename T>
void parseBigEndian(const std::vector<uint8_t>& block, size_t& offset, T& value) {
    if (offset + sizeof(T) > block.size()) {
        throw std::out_of_range("parseBigEndian: not enough data");
    }
    value = 0;
    for (size_t i = 0; i < sizeof(T); ++i) {
        value <<= 8;
        value |= block[offset + i];
    }
    offset += sizeof(T);
}

//=================== String handling
// Appends at most `length` characters of str, then pads with '\0' up to paddedLength bytes.
void appendPaddedString(std::vector<uint8_t>& block, const char* str, size_t length, size_t paddedLength) {
    size_t len = std::strlen(str);
    if (len > length) {
        len = length;
    }
    // insert() with a pointer range knows the byte count in advance,
    // so it can grow the vector once instead of byte by byte.
    block.insert(block.end(), str, str + len);
    for (size_t i = len; i < paddedLength; ++i) {
        block.push_back(0);  // Padding with null bytes
    }
}

// Reads a fixed-width string field of `length` bytes (including padding)
// and null-terminates it. str must have room for at least length + 1 bytes.
void parsePaddedString(const std::vector<uint8_t>& block, size_t& offset, char* str, size_t length) {
    if (offset + length > block.size()) {
        throw std::out_of_range("parsePaddedString: not enough data");
    }
    std::memcpy(str, block.data() + offset, length);
    offset += length;    // Skip the whole field
    str[length] = '\0';  // Ensure the string is null-terminated
}

//=================== String handling, version 2 (std::copy + std::back_inserter)
// Produces the same bytes as appendPaddedString. back_inserter calls push_back
// once per character, so this version is not inherently faster than insert().
void appendPaddedString2(std::vector<uint8_t>& block, const char* str, size_t length, size_t paddedLength) {
    size_t len = std::strlen(str);
    if (len > length) {
        len = length;
    }
    std::copy(str, str + len, std::back_inserter(block));  // The only difference from version 1
    for (size_t i = len; i < paddedLength; ++i) {
        block.push_back(0);  // Padding with null bytes
    }
}

// Prints a byte stream as two-digit hex values separated by spaces.
void printHex(const std::vector<uint8_t>& block) {
    for (uint8_t byte : block) {
        std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(byte) << " ";
    }
    std::cout << std::dec << std::endl;
}

int main() {
    if (isLittleEndian()) {
        std::cout << "system is little endian" << std::endl;
    } else {
        std::cout << "system is big endian" << std::endl;
    }

    std::vector<uint8_t> littleEndianBlock;
    std::vector<uint8_t> bigEndianBlock;
    std::vector<uint8_t> strBlock1;
    std::vector<uint8_t> strBlock2;

    uint16_t num16 = 0x1234;      // 4660
    uint32_t num32 = 0x56000078;  // 1442840696

    // Append uint16_t (2 bytes) and uint32_t (4 bytes) in little-endian format
    appendLittleEndian(littleEndianBlock, &num16, sizeof(num16));
    appendLittleEndian(littleEndianBlock, &num32, sizeof(num32));
    // Append the same values in big-endian format
    appendBigEndian(bigEndianBlock, &num16, sizeof(num16));
    appendBigEndian(bigEndianBlock, &num32, sizeof(num32));

    // Output the binary streams
    std::cout << "Binary Stream in little-endian format:\n";
    printHex(littleEndianBlock);
    std::cout << "Binary Stream in big-endian format:\n";
    printHex(bigEndianBlock);

    char szMsg[30] = {0};
    std::strncpy(szMsg, "Hello World", sizeof(szMsg) - 1);
    // Field width: sizeof(szMsg) rounded up to a multiple of 4 (30 -> 32)
    size_t paddedLength = (sizeof(szMsg) + 3) / 4 * 4;

    appendPaddedString(strBlock1, szMsg, std::strlen(szMsg), paddedLength);
    std::cout << "Binary Stream in str1 format. block1 size:" << std::dec << strBlock1.size() << std::endl;
    printHex(strBlock1);

    appendPaddedString2(strBlock2, szMsg, std::strlen(szMsg), paddedLength);
    std::cout << "Binary Stream in str2 format. block2 size:" << std::dec << strBlock2.size() << std::endl;
    printHex(strBlock2);

    //============================================ Parse the binary streams ============================================
    size_t offset = 0;
    uint16_t little_a = 0;
    uint32_t little_b = 0;
    parseLittleEndian(littleEndianBlock, offset, little_a);
    parseLittleEndian(littleEndianBlock, offset, little_b);
    std::cout << std::dec << "offset:" << offset << ",block.size:" << littleEndianBlock.size()
              << ",little_a:" << little_a << ",little_b:" << little_b << std::endl;

    // parseBigEndian
    size_t offset2 = 0;
    uint16_t big_a = 0;
    uint32_t big_b = 0;
    parseBigEndian(bigEndianBlock, offset2, big_a);
    parseBigEndian(bigEndianBlock, offset2, big_b);
    std::cout << std::dec << "offset2:" << offset2 << ",block.size:" << bigEndianBlock.size()
              << ",big_a:" << big_a << ",big_b:" << big_b << std::endl;

    // Parse the strings: each field is paddedLength bytes long, and the
    // 100-byte buffers leave room for the terminating '\0'.
    char szBuff1[100] = {0};
    char szBuff2[100] = {0};
    size_t offset3 = 0;
    size_t offset4 = 0;
    parsePaddedString(strBlock1, offset3, szBuff1, paddedLength);
    parsePaddedString(strBlock2, offset4, szBuff2, paddedLength);
    std::cout << std::dec << "strBlock1.size:" << strBlock1.size() << ",offset3:" << offset3
              << ",strlen(szBuff1):" << std::strlen(szBuff1) << ",szBuff1:" << szBuff1 << std::endl;
    std::cout << std::dec << "strBlock2.size:" << strBlock2.size() << ",offset4:" << offset4
              << ",strlen(szBuff2):" << std::strlen(szBuff2) << ",szBuff2:" << szBuff2 << std::endl;
    return 0;
}

/************************************************
Input values:
uint16_t num16 = 0x1234;      // 4660
uint32_t num32 = 0x56000078;  // 1442840696

Console output (little-endian host):
system is little endian
Binary Stream in little-endian format:
34 12 78 00 00 56
Binary Stream in big-endian format:
12 34 56 00 00 78
Binary Stream in str1 format. block1 size:32
48 65 6c 6c 6f 20 57 6f 72 6c 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Binary Stream in str2 format. block2 size:32
48 65 6c 6c 6f 20 57 6f 72 6c 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
offset:6,block.size:6,little_a:4660,little_b:1442840696
offset2:6,block.size:6,big_a:4660,big_b:1442840696
strBlock1.size:32,offset3:32,strlen(szBuff1):11,szBuff1:Hello World
strBlock2.size:32,offset4:32,strlen(szBuff2):11,szBuff2:Hello World
************************************************/
```