Write a method in JS that calculates the byte length of a string

function getByteLength(str) {
  let byteLength = 0;
  for (let i = 0; i < str.length; i++) {
    // codePointAt joins a surrogate pair into one code point;
    // charCodeAt would return each half separately (never above 0xffff).
    const codePoint = str.codePointAt(i);
    if (codePoint <= 0x007f) {
      byteLength += 1;
    } else if (codePoint <= 0x07ff) {
      byteLength += 2;
    } else if (codePoint <= 0xffff) {
      byteLength += 3;
    } else {
      byteLength += 4; // For characters beyond BMP (Basic Multilingual Plane)
      i++; // Skip the low surrogate that belongs to this code point
    }
  }
  return byteLength;
}


// Example usage:
const str1 = "hello";
const str2 = "你好世界";
const str3 = "emoji: 😂";
const str4 = "mixed: 你好😂world";


console.log(`"${str1}" byte length:`, getByteLength(str1)); // Output: 5
console.log(`"${str2}" byte length:`, getByteLength(str2)); // Output: 12
console.log(`"${str3}" byte length:`, getByteLength(str3)); // Output: 11
console.log(`"${str4}" byte length:`, getByteLength(str4)); // Output: 22


// ---  Blob method (no manual loop; needs an environment that provides Blob) ---
function getByteLengthBlob(str) {
    const blob = new Blob([str]);
    return blob.size;
}

console.log(`"${str1}" byte length (Blob):`, getByteLengthBlob(str1)); // Output: 5
console.log(`"${str2}" byte length (Blob):`, getByteLengthBlob(str2)); // Output: 12
console.log(`"${str3}" byte length (Blob):`, getByteLengthBlob(str3)); // Output: 11
console.log(`"${str4}" byte length (Blob):`, getByteLengthBlob(str4)); // Output: 22



// ---  TextEncoder API (most modern and generally recommended) ---

function getByteLengthEncoder(str){
  const encoder = new TextEncoder();
  const encoded = encoder.encode(str);
  return encoded.length;
}

console.log(`"${str1}" byte length (TextEncoder):`, getByteLengthEncoder(str1)); // Output: 5
console.log(`"${str2}" byte length (TextEncoder):`, getByteLengthEncoder(str2)); // Output: 12
console.log(`"${str3}" byte length (TextEncoder):`, getByteLengthEncoder(str3)); // Output: 11
console.log(`"${str4}" byte length (TextEncoder):`, getByteLengthEncoder(str4)); // Output: 22

Explanation and Improvements:

  • UTF-8 Handling: A character takes 1 to 4 bytes in UTF-8, and the count follows from its code point: up to U+007F is 1 byte, up to U+07FF is 2, up to U+FFFF is 3, and anything above is 4. A character above U+FFFF is stored in a JavaScript string as a surrogate pair (two UTF-16 code units), so it must be counted once as 4 bytes, not twice as 3.
  • Blob Method: new Blob([str]).size reports the UTF-8 encoded size without a hand-written loop. This is getByteLengthBlob(). Whether it is faster than iterating depends on the engine and the string size, so measure before relying on it.
  • TextEncoder API: TextEncoder always encodes to UTF-8 and is designed specifically for turning text into bytes. This is getByteLengthEncoder(). It is usually the best option unless you need to support very old browsers.
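
The difference between code units, code points, and bytes is easy to check directly. The snippet below is a minimal sketch; the `sample` string and the variable names are made up for illustration and do not come from the code above.

```javascript
// Illustrative only: `sample` is a made-up string with one ASCII letter,
// one CJK character, and one emoji outside the BMP.
const sample = "a你😂";

// UTF-16 code units: the emoji is a surrogate pair, so it counts as 2.
const codeUnits = sample.length;

// Unicode code points: spreading a string iterates by code point.
const codePoints = [...sample].length;

// UTF-8 bytes: 1 (a) + 3 (你) + 4 (😂).
const utf8Bytes = new TextEncoder().encode(sample).length;

console.log(codeUnits, codePoints, utf8Bytes); // 4 3 8
```

A loop that adds 3 bytes per code unit would report 10 here instead of 8, because it would count each half of the surrogate pair separately.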

Which method to use:

  • Where neither TextEncoder nor Blob is available, a manual loop such as getByteLength() works, provided it counts each surrogate pair as a single 4-byte character.
  • getByteLengthBlob() is a short alternative wherever Blob exists (browsers and recent Node.js versions). Treat any performance benefit as something to measure, not assume.
  • Otherwise, prefer getByteLengthEncoder(). It is concise and well supported in modern browsers and Node.js.
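
One way to combine these recommendations is a wrapper that uses TextEncoder when it exists and falls back to a manual count otherwise. This is a sketch under assumptions, not a definitive implementation: the names `utf8ByteLength` and `manualUtf8ByteLength` are hypothetical, and the fallback deliberately uses only `var` and `charCodeAt` so that it avoids ES2015-only features.

```javascript
// Hypothetical helper names; not part of the original post.

// Manual count based on charCodeAt, pairing surrogates by hand.
function manualUtf8ByteLength(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code <= 0x7f) {
      bytes += 1;
    } else if (code <= 0x7ff) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: check whether a low surrogate follows.
      var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid pair = one code point above U+FFFF
        i++;        // skip the low surrogate
      } else {
        bytes += 3; // lone surrogate: same size as U+FFFD, which TextEncoder substitutes
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Prefer TextEncoder; fall back to the manual loop when it is missing.
function utf8ByteLength(str) {
  if (typeof TextEncoder !== "undefined") {
    return new TextEncoder().encode(str).length;
  }
  return manualUtf8ByteLength(str);
}

console.log(utf8ByteLength("mixed: 你好😂world")); // 22
```

Keeping the fallback as its own function makes it possible to test it against the TextEncoder result in an environment that has both.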


Posted by 王铁柱6