C++ Character Encoding Conversion


Overview

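Character encoding has long been a troublesome part of software development. Most projects today use UTF-8 as their character set, but Windows defaults to GBK for narrow strings (on Simplified Chinese systems), while Linux defaults to UTF-8. Software that has to run correctly on Windows therefore needs to handle the encoding explicitly.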

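C++11 added a number of localization features, including character encoding conversion, which is mainly done by combining wstring_convert with a codecvt facet. Note that wstring_convert and the <codecvt> header have been deprecated since C++17, although mainstream compilers still ship them. The concrete approach is described below for you to study (copy and paste 😉).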

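Windows: narrow strings default to GBK (code page 936 on Simplified Chinese systems); wchar_t is 16 bits, so std::wstring holds UTF-16 just as std::u16string does (wchar_t and char16_t are nonetheless distinct types).

Linux: narrow strings default to UTF-8; wchar_t is 32 bits, so std::wstring holds UTF-32 just as std::u32string does (again, wchar_t and char32_t are distinct types).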
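Because the width of wchar_t differs between the two platforms, it can help to check it in code rather than assume it. The following is a minimal sketch; the helper names wchar_bits and wide_encoding_name are hypothetical and not part of the code in this article, and the mapping from width to encoding is the usual convention rather than a guarantee of the standard.

```cpp
#include <climits>
#include <string>

// Hypothetical helper: width of wchar_t in bits on the current platform
// (16 on Windows, 32 on typical Linux toolchains).
constexpr int wchar_bits() {
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// Hypothetical helper: the Unicode encoding form a std::wstring is
// conventionally assumed to hold for that width.
inline std::string wide_encoding_name() {
    return wchar_bits() == 16 ? "UTF-16" : "UTF-32";
}
```

Printing wide_encoding_name() at program start is a cheap way to confirm which case a given build falls into.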

Encoding conversion

  • Required headers:

    #include <codecvt>
    #include <locale>
    
  • Conversion functions (the header below uses inline variables, so it must be compiled as C++17 or later):

    coding.h

    #ifndef TE_TEST_CODING_H
    #define TE_TEST_CODING_H
    
    #include <string>
    
    
    namespace coding {
    
    #ifdef _WIN32
        //GBK locale name in windows
        inline constexpr const char * GBK_LOCALE_NAME = ".936";
    #else
        inline constexpr const char * GBK_LOCALE_NAME = "zh_CN.GBK";
    #endif
    
        /**
         * UTF-8 --> wide string
         * @param _utf8 a std::string that must be UTF-8 encoded
         * @return the wide string
         */
        std::wstring utf8_to_wstr(const std::string& _utf8);
    
        /**
         * wide string --> UTF-8
         * @param _wstr the wide string
         * @return the string converted to UTF-8
         */
        std::string wstr_to_utf8(const std::wstring& _wstr);
    
        /**
         * UTF-8 --> GBK
         * @param _utf8 a UTF-8 encoded string
         * @return the GBK encoded string
         */
        std::string utf8_to_gbk(const std::string& _utf8);
    
        /**
         * GBK --> UTF-8
         * @param _gbk a GBK encoded string
         * @return the UTF-8 encoded string
         */
        std::string gbk_to_utf8(const std::string& _gbk);
    
        /**
         * GBK --> std::wstring
         * @param _gbk a GBK encoded string
         * @return the wide string
         */
        std::wstring gbk_to_wstr(const std::string& _gbk);
    
        /**
         * std::wstring --> GBK
         * @param _wstr the wide string
         * @return the GBK encoded string
         */
        std::string wstr_to_gbk(const std::wstring& _wstr);
    }
    
    
    #endif //TE_TEST_CODING_H
    

    coding.cpp

    #include "coding.h"
    
    #include <codecvt>
    #include <locale>
    #include <utility>  // std::forward
    
    
    // Wrapper that lets a locale-bound facet be used with wstring_convert / wbuffer_convert
    template<class Facet>
    struct deletable_facet : Facet
    {
        template<class ...Args>
        explicit deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
        ~deletable_facet() override = default;
    };
    
    
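    #ifdef _WIN32
    // wchar_t is 16 bits on Windows, so convert UTF-8 <-> UTF-16;
    // codecvt_utf8<wchar_t> would be limited to UCS-2 and reject characters above U+FFFF.
    using utf8_wide_codecvt = std::codecvt_utf8_utf16<wchar_t>;
    #else
    // Elsewhere wchar_t is typically 32 bits, so convert UTF-8 <-> UTF-32.
    using utf8_wide_codecvt = std::codecvt_utf8<wchar_t>;
    #endif
    
    std::wstring coding::utf8_to_wstr(const std::string &_utf8) {
        std::wstring_convert<utf8_wide_codecvt> converter;
        return converter.from_bytes(_utf8);
    }
    
    std::string coding::wstr_to_utf8(const std::wstring &_wstr) {
        std::wstring_convert<utf8_wide_codecvt> converter;
        return converter.to_bytes(_wstr);
    }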
    
    std::string coding::utf8_to_gbk(const std::string &_utf8) {
        std::wstring tmp_wstr = utf8_to_wstr(_utf8);
        return wstr_to_gbk(tmp_wstr);
    }
    
    std::string coding::gbk_to_utf8(const std::string &_gbk) {
        std::wstring tmp_wstr = gbk_to_wstr(_gbk);
        return wstr_to_utf8(tmp_wstr);
    }
    
    std::wstring coding::gbk_to_wstr(const std::string &_gbk) {
        using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
        return convert.from_bytes(_gbk);
    }
    
    std::string coding::wstr_to_gbk(const std::wstring& _wstr) {
        using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
        return convert.to_bytes(_wstr);
    }
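
The GBK functions above depend on the named locale actually being present: many Linux systems do not ship zh_CN.GBK by default, and constructing a facet or locale with an unknown name fails at run time (std::locale is specified to throw std::runtime_error in that case). A minimal sketch of a probe that could be run before calling them, assuming a hypothetical helper named locale_available that is not part of coding.h:

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns true if the named locale can be constructed
// on this system, false if the standard library rejects the name.
inline bool locale_available(const char* locale_name) {
    try {
        std::locale probe(locale_name);  // throws std::runtime_error on an unknown name
        (void)probe;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A caller might check locale_available(coding::GBK_LOCALE_NAME) once at startup and report a clear error instead of letting the exception escape from the first conversion.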
    

Additional notes

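The deletable_facet struct exists to make the destructor of the codecvt_byname class template public; by default that destructor is protected. Some implementations allow destroying an object whose destructor is protected, but others (such as GNU libstdc++) require a user-defined class that derives from the Facet and has a public destructor. Otherwise compilation fails with an error like the following: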

In file included from /usr/include/c++/6.2.1/bits/locale_conv.h:41:0,
                 from /usr/include/c++/6.2.1/locale:43,
                 from main.cpp:3:
/usr/include/c++/6.2.1/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>]’:
/usr/include/c++/6.2.1/bits/unique_ptr.h:236:17:   required from ‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>; _Dp = std::default_delete<std::codecvt<wchar_t, char, __mbstate_t> >]’
/usr/include/c++/6.2.1/bits/locale_conv.h:218:7:   required from here
/usr/include/c++/6.2.1/bits/unique_ptr.h:76:2: error: ‘virtual std::codecvt<wchar_t, char, __mbstate_t>::~codecvt()’ is protected within this context
delete __ptr;
^~~~~~
In file included from /usr/include/c++/6.2.1/codecvt:41:0,
                 from main.cpp:1:
/usr/include/c++/6.2.1/bits/codecvt.h:426:7: note: declared protected here
       ~codecvt();
       ^
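
To see the wrapper in isolation, here is a minimal sketch that applies the same deletable_facet pattern to the plain std::codecvt<wchar_t, char, std::mbstate_t> facet, whose destructor is protected in the same way. The alias plain_codecvt and the function widen_ascii are hypothetical names used only for illustration, and the sketch assumes ASCII input, since the default-constructed facet follows the classic "C" locale.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <utility>

// Same wrapper as in coding.cpp: derives from the facet and exposes a public destructor.
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    explicit deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

// Hypothetical alias: the standard facet, wrapped so wstring_convert can delete it.
using plain_codecvt = deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>;

// Hypothetical helper: widens an ASCII string through the wrapped facet.
inline std::wstring widen_ascii(const std::string& s) {
    std::wstring_convert<plain_codecvt> conv;  // owns and later deletes the facet
    return conv.from_bytes(s);
}
```

Replacing plain_codecvt with the unwrapped std::codecvt<wchar_t, char, std::mbstate_t> in this sketch should reproduce the kind of "is protected within this context" error shown above on libstdc++.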

See the official documentation for details.

This article draws on an earlier blog post, extends it, and fixes several of its problems.

posted @ 2022-03-15 23:25  _哲思