C++ character encoding conversion
Overview
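Character encoding has always been a troublesome problem in software development. Most projects today use UTF-8 as their character set. Windows (in its Simplified Chinese edition) defaults to GBK, while Linux defaults to UTF-8. So if you want your software to run correctly on Windows, you have to deal with the character set explicitly.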
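C++11 added many localization features, including character encoding conversion. This is done mainly by combining std::wstring_convert with codecvt facets. The concrete methods are presented below for you to study (and copy-paste 😉).

Note that std::wstring_convert and the <codecvt> header were deprecated in C++17 without a standard replacement. Major standard library implementations still ship them.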
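The pairing of std::wstring_convert with a codecvt facet is easiest to see in a small round trip between UTF-8 and UTF-32. This is a minimal sketch, assuming a compiler that still provides the deprecated <codecvt> facets. The names utf8_to_u32 and u32_to_utf8 are illustrative only and are not used elsewhere in this article. char32_t is used so that the result does not depend on the width of wchar_t.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Illustrative helper: UTF-8 bytes -> one char32_t per code point.
// from_bytes throws std::range_error if the input is not valid UTF-8.
inline std::u32string utf8_to_u32(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> convert;
    return convert.from_bytes(utf8);
}

// Illustrative helper: UTF-32 code points -> UTF-8 bytes.
inline std::string u32_to_utf8(const std::u32string& u32) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> convert;
    return convert.to_bytes(u32);
}
```

For example, "中文" takes six bytes in UTF-8 ("\xE4\xB8\xAD\xE6\x96\x87"). from_bytes turns it into a two-element std::u32string holding U+4E2D and U+6587.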
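Windows: the narrow (ANSI) encoding is GBK. wchar_t is 16 bits wide, so std::wstring holds UTF-16 code units, much like std::u16string.
Linux: the narrow encoding is UTF-8. wchar_t is 32 bits wide, so std::wstring holds UTF-32 code points, much like std::u32string.
In both cases the types remain distinct: wchar_t is not the same type as char16_t or char32_t.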
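A 16-bit code unit can hold a whole character only inside the Basic Multilingual Plane. Anything above U+FFFF, such as the 😉 emoji (U+1F609), needs a surrogate pair. std::codecvt_utf8_utf16 converts UTF-8 to UTF-16, surrogates included. By contrast, std::codecvt_utf8 with a 16-bit element type converts to UCS-2 and cannot represent such characters.

This is a minimal sketch that uses char16_t, so it behaves the same on every platform. The function names are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 -> UTF-16; code points above U+FFFF become surrogate pairs
inline std::u16string utf8_to_u16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.from_bytes(utf8);  // throws std::range_error on invalid UTF-8
}

// Hypothetical helper: UTF-16 (including surrogate pairs) -> UTF-8
inline std::string u16_to_utf8(const std::u16string& u16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.to_bytes(u16);
}
```

The four UTF-8 bytes F0 9F 98 89 of 😉 become the two UTF-16 units 0xD83D 0xDE09.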
Encoding conversion
- Required headers:

```cpp
#include <codecvt>
#include <locale>
```
- Conversion functions:
coding.h
```cpp
#ifndef TE_TEST_CODING_H
#define TE_TEST_CODING_H

#include <string>

namespace coding {

// inline variables require C++17
#ifdef _WIN32
// GBK locale name on Windows (code page 936)
inline constexpr const char *GBK_LOCALE_NAME = ".936";
#else
// GBK locale name on Linux; the locale must be installed on the system
inline constexpr const char *GBK_LOCALE_NAME = "zh_CN.GBK";
#endif

/**
 * UTF-8 --> wide string
 * @param _utf8 the std::string must be UTF-8 encoded
 * @return wide string
 */
std::wstring utf8_to_wstr(const std::string& _utf8);

/**
 * wide string --> UTF-8
 * @param _wstr wide string
 * @return the string converted to UTF-8
 */
std::string wstr_to_utf8(const std::wstring& _wstr);

/**
 * UTF-8 --> GBK
 * @param _utf8 UTF-8 string
 * @return GBK string
 */
std::string utf8_to_gbk(const std::string& _utf8);

/**
 * GBK --> UTF-8
 * @param _gbk GBK string
 * @return UTF-8 string
 */
std::string gbk_to_utf8(const std::string& _gbk);

/**
 * GBK --> std::wstring
 * @param _gbk GBK string
 * @return wide string
 */
std::wstring gbk_to_wstr(const std::string& _gbk);

/**
 * std::wstring --> GBK
 * @param _wstr wide string
 * @return GBK string
 */
std::string wstr_to_gbk(const std::wstring& _wstr);

} // namespace coding

#endif // TE_TEST_CODING_H
```
coding.cpp
```cpp
#include "coding.h"

#include <codecvt>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <utility>  // std::forward

// Helper for std::wstring_convert / std::wbuffer_convert: wraps a locale-bound
// facet so that it has a public destructor (see the notes below)
template<class Facet>
struct deletable_facet : Facet {
    template<class ...Args>
    explicit deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

namespace {
#ifdef _WIN32
// 16-bit wchar_t holds UTF-16: codecvt_utf8_utf16 handles surrogate pairs,
// whereas codecvt_utf8<wchar_t> would only cover UCS-2
using utf8_facet = std::codecvt_utf8_utf16<wchar_t>;
#else
// 32-bit wchar_t holds UTF-32
using utf8_facet = std::codecvt_utf8<wchar_t>;
#endif
// GBK conversion through the named locale coding::GBK_LOCALE_NAME
using gbk_facet = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
} // namespace

std::wstring coding::utf8_to_wstr(const std::string &_utf8) {
    std::wstring_convert<utf8_facet> converter;
    return converter.from_bytes(_utf8);  // throws std::range_error on invalid input
}

std::string coding::wstr_to_utf8(const std::wstring &_wstr) {
    std::wstring_convert<utf8_facet> converter;
    return converter.to_bytes(_wstr);
}

std::string coding::utf8_to_gbk(const std::string &_utf8) {
    // UTF-8 -> wide string -> GBK
    return wstr_to_gbk(utf8_to_wstr(_utf8));
}

std::string coding::gbk_to_utf8(const std::string &_gbk) {
    // GBK -> wide string -> UTF-8
    return wstr_to_utf8(gbk_to_wstr(_gbk));
}

std::wstring coding::gbk_to_wstr(const std::string &_gbk) {
    // the facet constructor throws std::runtime_error if the locale is unavailable
    std::wstring_convert<gbk_facet> converter(new gbk_facet(GBK_LOCALE_NAME));
    return converter.from_bytes(_gbk);
}

std::string coding::wstr_to_gbk(const std::wstring &_wstr) {
    std::wstring_convert<gbk_facet> converter(new gbk_facet(GBK_LOCALE_NAME));
    return converter.to_bytes(_wstr);
}
```
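The GBK conversions depend on a named locale being present at run time, and they can fail in two ways:

- On Linux, the std::codecvt_byname constructor throws std::runtime_error if the zh_CN.GBK locale is not installed.
- to_bytes throws std::range_error (a subclass of std::runtime_error) for characters that GBK cannot encode.

This is a minimal sketch of a non-throwing variant, assuming C++17 for std::optional. try_wstr_to_gbk is a hypothetical name and is not part of the original code. It repeats the deletable_facet wrapper so that it compiles on its own.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Same wrapper as above: gives the facet a public destructor
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    explicit deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

// Hypothetical helper: wide string -> GBK, or std::nullopt when the locale
// cannot be loaded or the text is not representable in GBK
inline std::optional<std::string> try_wstr_to_gbk(const std::wstring& wstr,
                                                  const char* locale_name) {
    using gbk_facet = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
    try {
        // codecvt_byname throws std::runtime_error for an unknown locale name
        std::wstring_convert<gbk_facet> convert(new gbk_facet(locale_name));
        // to_bytes throws std::range_error, which derives from std::runtime_error
        return convert.to_bytes(wstr);
    } catch (const std::runtime_error&) {
        return std::nullopt;
    }
}
```

When the locale is available, try_wstr_to_gbk(L"中文", "zh_CN.GBK") should yield the GBK bytes D6 D0 CE C4. On a machine without that locale it returns std::nullopt instead of terminating the program.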
Additional notes
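The struct deletable_facet exists to make the destructor of the codecvt_byname class template public; by default that destructor is protected. std::wstring_convert deletes the facet it owns, so the facet must have an accessible destructor. Some implementations tolerate a protected destructor here. Others, such as GNU libstdc++, require a custom class that inherits from Facet and has a public destructor; otherwise compilation fails with an error like the following: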
```
In file included from /usr/include/c++/6.2.1/bits/locale_conv.h:41:0,
                 from /usr/include/c++/6.2.1/locale:43,
                 from main.cpp:3:
/usr/include/c++/6.2.1/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>]’:
/usr/include/c++/6.2.1/bits/unique_ptr.h:236:17:   required from ‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>; _Dp = std::default_delete<std::codecvt<wchar_t, char, __mbstate_t> >]’
/usr/include/c++/6.2.1/bits/locale_conv.h:218:7:   required from here
/usr/include/c++/6.2.1/bits/unique_ptr.h:76:2: error: ‘virtual std::codecvt<wchar_t, char, __mbstate_t>::~codecvt()’ is protected within this context
  delete __ptr;
  ^~~~~~
In file included from /usr/include/c++/6.2.1/codecvt:41:0,
                 from main.cpp:1:
/usr/include/c++/6.2.1/bits/codecvt.h:426:7: note: declared protected here
       ~codecvt();
       ^
```
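The accessibility problem can be checked at compile time with std::is_destructible. The standard declares ~codecvt() protected, so code outside the class hierarchy cannot destroy the facet; the wrapper re-declares the destructor as public. This is a minimal sketch that assumes a conforming standard library. plain_facet, wrapped_facet and wrapped_facet_deletes_cleanly are illustrative names.

```cpp
#include <cwchar>
#include <locale>
#include <memory>
#include <type_traits>
#include <utility>

// Same wrapper as in coding.cpp
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    explicit deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

using plain_facet = std::codecvt<wchar_t, char, std::mbstate_t>;
using wrapped_facet = deletable_facet<plain_facet>;

// Protected destructor: deleting a plain_facet from outside is ill-formed
static_assert(!std::is_destructible<plain_facet>::value,
              "codecvt's destructor is expected to be protected");
// Public destructor in the wrapper: deletion is allowed
static_assert(std::is_destructible<wrapped_facet>::value,
              "deletable_facet should be destructible");

// Deleting through std::unique_ptr, as wstring_convert does internally, now compiles
inline bool wrapped_facet_deletes_cleanly() {
    std::unique_ptr<wrapped_facet> p(new wrapped_facet());
    return p != nullptr;
}
```

Replacing wrapped_facet with plain_facet inside the unique_ptr reproduces the "is protected within this context" error shown above.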
This article draws on an earlier blog post, expanding on it and fixing some of its issues.
This article is from cnblogs (博客园), author: _哲思. When reposting, please cite the original link: https://www.cnblogs.com/zhe-si/p/16011000.html