NSStringEncoding

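Today I came across a great blog. Since I can't follow it, I'll repost a few of its most useful posts instead.

Reposted from: http://hi.baidu.com/may2150209/blog/item/198976ace7e583054b36d6f1.html

PS: It turns out that blogger had reposted it as well. Anyway, as long as it's useful.

The original post follows.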

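Today, while trying to scrape the home page of Qidian (起点中文网), I ran into a problem: if you don't use the right encoding, you can't read anything at all.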

 

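I blame this on a bad habit picked up from using C# too much: I had basically never thought about encodings before. In C#, even if you get the encoding wrong you still read something in, it is just a screen full of garbled characters. Cocoa is stricter: the read simply fails, returning nil along with an error.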
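The behaviour described above can be reproduced without any network access. The following is a minimal sketch, not code from the original post: the four sample bytes are an assumption chosen for illustration (they are the text 中文 in GBK), and `decodeAsUTF8` is a hypothetical helper name. It relies on `-initWithData:encoding:` returning nil when the bytes are not valid in the requested encoding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: try to decode raw bytes as UTF-8.
// Returns nil when the bytes are not a valid UTF-8 sequence.
static NSString *decodeAsUTF8(NSData *data) {
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}

int main(void) {
    @autoreleasepool {
        // Sample bytes (an assumption for illustration): "中文" encoded as GBK.
        const unsigned char bytes[] = {0xD6, 0xD0, 0xCE, 0xC4};
        NSData *data = [NSData dataWithBytes:bytes length:sizeof(bytes)];

        NSString *text = decodeAsUTF8(data);
        if (text == nil) {
            NSLog(@"Not valid UTF-8: decoding returned nil");
        } else {
            NSLog(@"Decoded: %@", text);
        }
    }
    return 0;
}
```

No garbled string comes back in this case; the caller only learns that decoding failed and has to try a different encoding.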

Here is the code I wrote at first, which downloads the Qidian home page into a string.

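NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
NSError *error = nil;
// Try to read the page as UTF-8; on failure this returns nil and fills in error.
NSString *xml = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:&error];
if (xml == nil) {
    NSLog(@"Error reading url at %@", [error localizedFailureReason]);
} else {
    [result setString:xml]; // result is the output object declared elsewhere in the program
}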

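The download failed no matter what I did, and the error message said the encoding was wrong. Fine, so I opened the documentation and looked up all the encodings: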

enum {
    NSASCIIStringEncoding = 1,
    NSNEXTSTEPStringEncoding = 2,
    NSJapaneseEUCStringEncoding = 3,
    NSUTF8StringEncoding = 4,
    NSISOLatin1StringEncoding = 5,
    NSSymbolStringEncoding = 6,
    NSNonLossyASCIIStringEncoding = 7,
    NSShiftJISStringEncoding = 8,
    NSISOLatin2StringEncoding = 9,
    NSUnicodeStringEncoding = 10,
    NSWindowsCP1251StringEncoding = 11,
    NSWindowsCP1252StringEncoding = 12,
    NSWindowsCP1253StringEncoding = 13,
    NSWindowsCP1254StringEncoding = 14,
    NSWindowsCP1250StringEncoding = 15,
    NSISO2022JPStringEncoding = 21,
    NSMacOSRomanStringEncoding = 30,
    NSUTF16StringEncoding = NSUnicodeStringEncoding,
    NSUTF16BigEndianStringEncoding = 0x90000100,
    NSUTF16LittleEndianStringEncoding = 0x94000100,
    NSUTF32StringEncoding = 0x8c000100,
    NSUTF32BigEndianStringEncoding = 0x98000100,
    NSUTF32LittleEndianStringEncoding = 0x9c000100,
};

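I tried them one by one, and not a single one worked! I was losing my mind. In this day and age, could Cocoa really not support Chinese? Impossible.

My guess was that the documentation above only lists the most commonly used encodings (most commonly used in Apple's opinion, that is, which shows China is pretty much ignored; shame on them!), so I wrote the following code to print every supported encoding: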

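const NSStringEncoding *encodings = [NSString availableStringEncodings];
NSMutableString *str = [[NSMutableString alloc] init];
NSStringEncoding encoding;
// The array returned by availableStringEncodings is terminated by a 0 entry.
while ((encoding = *encodings++) != 0) {
    // %i with an int cast prints the signed 32-bit values seen in the listing that follows.
    [str appendFormat:@"%@ === %i\n", [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
}
[result setString:str];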

Sure enough, I had guessed right. Here is the full list of supported encodings:

Western (Mac OS Roman) === 30

Japanese (Mac OS) === -2147483647

Traditional Chinese (Mac OS) === -2147483646

Korean (Mac OS) === -2147483645

Arabic (Mac OS) === -2147483644

Hebrew (Mac OS) === -2147483643

Greek (Mac OS) === -2147483642

Cyrillic (Mac OS) === -2147483641

Devanagari (Mac OS) === -2147483639

Gurmukhi (Mac OS) === -2147483638

Gujarati (Mac OS) === -2147483637

Thai (Mac OS) === -2147483627

Simplified Chinese (Mac OS) === -2147483623

Tibetan (Mac OS) === -2147483622

Central European (Mac OS) === -2147483619

Symbol (Mac OS) === 6

Dingbats (Mac OS) === -2147483614

Turkish (Mac OS) === -2147483613

Croatian (Mac OS) === -2147483612

Icelandic (Mac OS) === -2147483611

Romanian (Mac OS) === -2147483610

Celtic (Mac OS) === -2147483609

Gaelic (Mac OS) === -2147483608

Keyboard Symbols (Mac OS) === -2147483607

Farsi (Mac OS) === -2147483508

Cyrillic (Mac OS Ukrainian) === -2147483496

Inuit (Mac OS) === -2147483412

Unicode (UTF-32LE) === -1677721344

Unicode (UTF-8) === 4

Unicode (UTF-16) === 10

Unicode (UTF-16BE) === -1879047936

Unicode (UTF-16LE) === -1811939072

Unicode (UTF-32) === -1946156800

Unicode (UTF-32BE) === -1744830208

Western (ISO Latin 1) === 5

Central European (ISO Latin 2) === 9

Western (ISO Latin 3) === -2147483133

Central European (ISO Latin 4) === -2147483132

Cyrillic (ISO 8859-5) === -2147483131

Arabic (ISO 8859-6) === -2147483130

Greek (ISO 8859-7) === -2147483129

Hebrew (ISO 8859-8) === -2147483128

Turkish (ISO Latin 5) === -2147483127

Nordic (ISO Latin 6) === -2147483126

Thai (ISO 8859-11) === -2147483125

Baltic Rim (ISO Latin 7) === -2147483123

Celtic (ISO Latin 8) === -2147483122

Western (ISO Latin 9) === -2147483121

Romanian (ISO Latin 10) === -2147483120

Latin-US (DOS) === -2147482624

Greek (DOS) === -2147482619

Baltic Rim (DOS) === -2147482618

Western (DOS Latin 1) === -2147482608

Greek (DOS Greek 1) === -2147482607

Central European (DOS Latin 2) === -2147482606

Cyrillic (DOS) === -2147482605

Turkish (DOS) === -2147482604

Portuguese (DOS) === -2147482603

Icelandic (DOS) === -2147482602

Hebrew (DOS) === -2147482601

Canadian French (DOS) === -2147482600

Arabic (DOS) === -2147482599

Nordic (DOS) === -2147482598

Cyrillic (DOS) === -2147482597

Greek (DOS Greek 2) === -2147482596

Thai (Windows, DOS) === -2147482595

Japanese (Windows, DOS) === 8

Simplified Chinese (Windows, DOS) === -2147482591

Korean (Windows, DOS) === -2147482590

Traditional Chinese (Windows, DOS) === -2147482589

Western (Windows Latin 1) === 12

Central European (Windows Latin 2) === 15

Cyrillic (Windows) === 11

Greek (Windows) === 13

Turkish (Windows Latin 5) === 14

Hebrew (Windows) === -2147482363

Arabic (Windows) === -2147482362

Baltic Rim (Windows) === -2147482361

Vietnamese (Windows) === -2147482360

Western (ASCII) === 1

Japanese (Shift JIS X0213) === -2147482072

Chinese (GBK) === -2147482063

Chinese (GB 18030) === -2147482062

Japanese (ISO 2022-JP) === 21

Korean (ISO 2022-KR) === -2147481536

Japanese (EUC) === 3

Simplified Chinese (EUC) === -2147481296

Traditional Chinese (EUC) === -2147481295

Korean (EUC) === -2147481280

Japanese (Shift JIS) === -2147481087

Cyrillic (KOI8-R) === -2147481086

Traditional Chinese (Big 5) === -2147481085

Western (Mac Mail) === -2147481084

Simplified Chinese (HZ GB 2312) === -2147481083

Traditional Chinese (Big 5 HKSCS) === -2147481082

Ukrainian (KOI8-U) === -2147481080

Traditional Chinese (Big 5-E) === -2147481079

Western (NextStep) === 2

Non-lossy ASCII === 7

Western (EBCDIC Latin 1) === -2147480574
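The large negative numbers in this listing do not appear to be arbitrary. Encodings without a named `NS...StringEncoding` constant seem to be reported as `0x80000000` combined with the corresponding Core Foundation `CFStringEncoding` value, and printing that as a signed 32-bit integer gives a negative number. The sketch below checks that assumption for three rows; the use of `CFStringConvertEncodingToNSStringEncoding` and the `kCFStringEncoding...` constants is an addition here, not part of the original post, and `signedValue` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the signed 32-bit view of an NSStringEncoding,
// which is what a %i format specifier would print for it.
static int32_t signedValue(CFStringEncoding cfEncoding) {
    NSStringEncoding ns = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
    return (int32_t)(uint32_t)ns;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"GBK      -> %d", signedValue(kCFStringEncodingGBK_95));
        NSLog(@"GB 18030 -> %d", signedValue(kCFStringEncodingGB_18030_2000));
        NSLog(@"Big 5    -> %d", signedValue(kCFStringEncodingBig5));
    }
    return 0;
}
```

If the assumption holds, the three printed values should be the ones shown in the Chinese (GBK), Chinese (GB 18030) and Traditional Chinese (Big 5) rows.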

In that list I finally spotted the familiar GBK encoding, whose value is -2147482063. OK, let's update the original code:

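NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
NSError *error = nil;
// 0x80000631 is -2147482063 read as an unsigned 32-bit value. Writing it this way
// also stays correct where NSStringEncoding is a 64-bit unsigned integer.
NSStringEncoding encoder = 0x80000631;
NSString *xml = [NSString stringWithContentsOfURL:url encoding:encoder error:&error];
if (xml == nil) {
    NSLog(@"Error reading url at %@", [error localizedFailureReason]);
} else {
    [result setString:xml]; // result is the output object declared elsewhere in the program
}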

Finally got it working! Seeing the familiar Chinese text on screen was genuinely exciting.
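A raw number like the one used above is hard to read and easy to mistype. One alternative, sketched here as an assumption rather than as what the original author did, is to derive the value from a named Core Foundation constant. To stay runnable offline, the sketch decodes four sample bytes instead of downloading the page; `gbkEncoding` and `decodeGBK` are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the NSStringEncoding for GBK, derived from a named constant
// instead of a hard-coded number.
static NSStringEncoding gbkEncoding(void) {
    return CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95);
}

// Hypothetical helper: decode raw bytes as GBK; returns nil if decoding fails.
static NSString *decodeGBK(NSData *data) {
    return [[NSString alloc] initWithData:data encoding:gbkEncoding()];
}

int main(void) {
    @autoreleasepool {
        // Sample bytes (an assumption for illustration): "中文" encoded as GBK.
        const unsigned char bytes[] = {0xD6, 0xD0, 0xCE, 0xC4};
        NSData *data = [NSData dataWithBytes:bytes length:sizeof(bytes)];
        NSLog(@"%@", decodeGBK(data));
    }
    return 0;
}
```

The value returned by `gbkEncoding()` could equally be passed as the `encoding:` argument of `stringWithContentsOfURL:encoding:error:`. `kCFStringEncodingGB_18030_2000` may be worth considering as well, since GB 18030 is a superset of GBK.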

Note: this is a repost.

posted @ 2012-12-31 16:21  郑文亮