Matching Chinese Characters with Boost -- Wide Characters

Original article: http://blog.csdn.net/sptoor/article/details/4930069

Approach: to match Chinese characters, first convert the strings to wide characters, then run the match.

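The following wide-character classes are needed:

1. wstring:
The STL counterpart of string for wide-character strings. It has the same member functions as string; the difference is that its value_type is wchar_t. String literals assigned to or concatenated with a wstring object must carry the L prefix, which marks them as wide-character literals.
2. wregex:
The counterpart of regex for wide-character regular expressions. Functions such as regex_match() and regex_replace() work with it in the same way. The result of regex_match() on a wstring must be stored in a wsmatch object.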
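As a minimal sketch of how these classes fit together, the block below matches a wide string and reads a capture group from a wsmatch. It is not from the original article: it assumes the C++11 standard <regex> header rather than Boost, and FirstGroup and SampleText are hypothetical helper names chosen for illustration. The pattern uses an explicit character range instead of \w so that the result does not depend on the locale.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the article): returns capture group 1 of a
// full match, or an empty wstring when the pattern does not match.
std::wstring FirstGroup(const std::wstring& wsText, const std::wregex& wrg)
{
    std::wsmatch wsm;                       // match results for wstring input
    if (std::regex_match(wsText, wsm, wrg))
        return wsm[1].str();
    return std::wstring();
}

// Wide literals carry the L prefix. The escapes U+8282 U+70B9 spell "节点"
// and keep the source file independent of its encoding.
std::wstring SampleText()
{
    std::wstring ws = L"\u8282\u70b9";
    ws += L"123";                           // concatenation also takes a wide literal
    return ws;
}
```

Calling FirstGroup(SampleText(), std::wregex(L"([\u4e00-\u9fff]+)[0-9]+")) returns the two Chinese characters, because each one is a single wchar_t that falls inside the range.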

Converting between narrow and wide strings:

1. Using the C runtime library (RTL)

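     // Convert a narrow (multibyte) string to a wide string
     setlocale( LC_CTYPE, "" ); // Essential: without this call, the conversion fails.
     int iWLen= (int)mbstowcs( NULL, sToMatch.c_str(), 0 ); // Length of the converted wide string (terminator not counted); -1 on error.
     if( iWLen < 0 ) iWLen= 0; // Invalid multibyte sequence: produce an empty string.
     wchar_t *lpwsz= new wchar_t[iWLen+1](); // Zero-initialized, so the result is always terminated.
     mbstowcs( lpwsz, sToMatch.c_str(), iWLen ); // Convert at most iWLen wide characters.
     wstring wsToMatch(lpwsz);
     delete []lpwsz;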

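     // Convert a wide string back to a narrow string, for output
     int iLen= (int)wcstombs( NULL, wsm[1].str().c_str(), 0 ); // Length of the converted string in bytes (terminator not counted); -1 on error.
     if( iLen < 0 ) iLen= 0; // Unconvertible character: produce an empty string.
     char *lpsz= new char[iLen+1];
     wcstombs( lpsz, wsm[1].str().c_str(), iLen ); // Convert; with a limit of iLen bytes no terminator is written.
     lpsz[iLen] = '\0'; // Terminate manually.
     string sToMatch(lpsz);
     delete []lpsz;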
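The two conversions above can be wrapped in small functions so that buffer handling and error checks live in one place. The following is a sketch under stated assumptions, not the article's code: ToWide and ToNarrow are hypothetical names, std::vector replaces the manual new/delete, and the caller is assumed to have called setlocale( LC_CTYPE, "" ) already. Strings with embedded null characters are not handled.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: multibyte string -> wide string via mbstowcs.
// Returns an empty wstring if the input is not valid in the current locale.
std::wstring ToWide(const std::string& s)
{
    std::size_t n = std::mbstowcs(NULL, s.c_str(), 0);   // length without terminator
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();
    std::vector<wchar_t> buf(n + 1, L'\0');               // +1 leaves room for the terminator
    std::mbstowcs(&buf[0], s.c_str(), n + 1);
    return std::wstring(&buf[0], n);
}

// Hypothetical helper: wide string -> multibyte string via wcstombs.
// Returns an empty string if a character cannot be represented.
std::string ToNarrow(const std::wstring& ws)
{
    std::size_t n = std::wcstombs(NULL, ws.c_str(), 0);  // bytes without terminator
    if (n == static_cast<std::size_t>(-1))
        return std::string();
    std::vector<char> buf(n + 1, '\0');
    std::wcstombs(&buf[0], ws.c_str(), n + 1);
    return std::string(&buf[0], n);
}
```

With these in place, the conversion in a caller reduces to wstring wsToMatch = ToWide( sToMatch ); and the vector releases its buffer even if wstring construction throws.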

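2. Using the Win32 SDK

     // Convert a narrow string to a wide string
     int iWLen= MultiByteToWideChar( CP_ACP, 0, sToMatch.c_str(), (int)sToMatch.size(), NULL, 0 ); // Length of the converted wide string (terminator not counted, because an explicit source length is passed).
     wchar_t *lpwsz= new wchar_t [iWLen+1];
     MultiByteToWideChar( CP_ACP, 0, sToMatch.c_str(), (int)sToMatch.size(), lpwsz, iWLen ); // Perform the conversion.
     lpwsz[iWLen] = L'\0'; // Terminate manually.
     wstring wsToMatch(lpwsz);
     delete []lpwsz;
     // Convert a wide string back to a narrow string, for output
     int iLen= WideCharToMultiByte( CP_ACP, 0, wsResult.c_str(), -1, NULL, 0, NULL, NULL ); // Length of the converted string in bytes (terminator included, because the source length is -1).
     char *lpsz= new char[iLen];
     WideCharToMultiByte( CP_ACP, 0, wsResult.c_str(), -1, lpsz, iLen, NULL, NULL ); // Perform the conversion with the same code page that was used to measure it.
     Result.assign( lpsz, iLen-1 ); // Assign to the string object, dropping the terminator.
     delete []lpsz;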

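Example:

The following program shows that when \w is matched against a narrow string, certain characters cause the match to fail. Converting the string to a wide string is one way to work around this problem.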

#include <iostream>
using std::cout;
using std::endl;
#include <string>
using std::string;
using std::wstring;
#include <clocale>  // setlocale
#include <cstdlib>  // mbstowcs, wcstombs

#include <boost/regex.hpp>  // regex, wregex, smatch, wsmatch, regex_match
using namespace boost;

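// Match a narrow string against \w* and print capture group 1.
void MatchWords(string sToMatch)
{
     regex rg("(\\w*)");
     smatch sm;
     // "匹配结果：" means "Match result:".
     if( regex_match( sToMatch, sm, rg ) )
          cout << "匹配结果：" << sm[1].str() << endl;
     else
          cout << "匹配结果：" << endl; // No match: print an empty result.
}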

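// Match a wide string against \w*, then convert capture group 1 back to a narrow string for output.
void MatchWords(wstring wsToMatch)
{
     wregex wrg(L"(\\w*)");
     wsmatch wsm;
     wstring wsResult; // Stays empty if the match fails.
     if( regex_match( wsToMatch, wsm, wrg ) )
          wsResult = wsm[1].str();

     int iLen= (int)wcstombs( NULL, wsResult.c_str(), 0 ); // Bytes needed (terminator not counted); -1 on error.
     if( iLen < 0 ) iLen= 0;
     char *lpsz= new char[iLen+1];
     wcstombs( lpsz, wsResult.c_str(), iLen );
     lpsz[iLen] = '\0';

     string sToMatch(lpsz);
     delete []lpsz;
     cout << "匹配结果：" << sToMatch << endl;
}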

int main()
{
     string sToMatch("数超限");
     MatchWords( sToMatch );
     sToMatch = "节点数目超限";
     MatchWords( sToMatch );

     // Convert the second string to a wide string and match again.
     setlocale( LC_CTYPE, "" );
     int iWLen= (int)mbstowcs( NULL, sToMatch.c_str(), 0 );
     if( iWLen < 0 ) iWLen= 0;
     wchar_t *lpwsz= new wchar_t[iWLen+1](); // Zero-initialized, so the result is always terminated.
     mbstowcs( lpwsz, sToMatch.c_str(), iWLen );

     wstring wsToMatch(lpwsz);
     delete []lpwsz;
     MatchWords( wsToMatch );
     return 0;
}

Compiling and running the program prints:

    匹配结果：数超限
    匹配结果：
    匹配结果：节点数目超限

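The first line shows that “数超限” matched successfully. On the second line, however, “节点数目超限” did not match any characters. Only after the string is converted to a wide string does \w match “节点数目超限” successfully.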
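A likely explanation for this result is a difference in units: a narrow regex tests \w against each char, which for multibyte text is a single byte of a character, while a wide regex tests one wchar_t per character. The sketch below only illustrates that difference and is not from the original article. It assumes UTF-8 bytes for the narrow string; the original program may have used another encoding, such as the system ANSI code page that the Win32 snippet refers to. All function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// "节点" (U+8282 U+70B9) as UTF-8 bytes, written as escapes so the example
// does not depend on the encoding of the source file. UTF-8 is an assumption.
std::string NarrowSample() { return "\xE8\x8A\x82\xE7\x82\xB9"; }

// The same two characters as a wide string: one wchar_t per character.
std::wstring WideSample() { return L"\u8282\u70B9"; }

// Counts the bytes outside the ASCII range, i.e. the bytes that belong to
// multibyte characters and that a narrow regex classifies one at a time.
std::size_t CountNonAsciiBytes(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (static_cast<unsigned char>(s[i]) >= 0x80)
            ++n;
    return n;
}
```

Here NarrowSample().size() is 6 while WideSample().size() is 2: the narrow regex has six units to classify, none of which is a complete character, whereas the wide regex has exactly two.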

Notice: this article comes from a CSDN blog. When reposting, please credit the source: http://blog.csdn.net/sptoor/article/details/4930069

posted @ 2013-12-03 22:26 lmei
