iOS ParseHTML

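Parsing HTML on iOS can be done with the Hpple framework, which uses XPath to look up HTML elements by tag and attribute.

w3school has an introduction to XPath.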

    // Given a URL string `path`, fetch the page's HTML source as a string
    NSString *title = [NSString stringWithContentsOfURL:[NSURL URLWithString:path]
                                                encoding:NSUTF8StringEncoding
                                                   error:nil];
    NSData *dataTitle = [title dataUsingEncoding:NSUTF8StringEncoding];

    // Parse the HTML with Hpple
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:dataTitle];

    // Select every <a> that is a direct child of <p class="left">
    NSArray *elements = [xpathParser searchWithXPathQuery:@"//p[@class='left']/a"];

    for (TFHppleElement *element in elements) {
        // The attributes of each <a>, e.g. href and title
        NSDictionary *elementContent = [element attributes];
        // NSLog(@"%@", elementContent);

        // `data` is presumably an NSMutableArray declared elsewhere
        [data addObject:elementContent];
    }
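
The snippet above passes `error:nil`, so a failed download goes unnoticed. Below is a minimal sketch of a variant that loads the page as raw `NSData` and reports the error. The function name `LoadParser` is hypothetical and not part of Hpple. It assumes Hpple's `TFHpple.h` header is on the include path. Like the original, it blocks the calling thread, so it should presumably run off the main thread.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of Hpple): downloads the page at `path`
// and returns a parser for it, or nil if the URL is invalid or the
// download fails.
static TFHpple *LoadParser(NSString *path) {
    NSURL *url = [NSURL URLWithString:path];
    if (url == nil) {
        NSLog(@"Invalid URL string: %@", path);
        return nil;
    }

    NSError *error = nil;
    // Synchronous read; blocks until the download finishes or fails.
    NSData *htmlData = [NSData dataWithContentsOfURL:url
                                             options:0
                                               error:&error];
    if (htmlData == nil) {
        NSLog(@"Failed to load %@: %@", url, error);
        return nil;
    }

    // Hand the raw bytes to Hpple without converting to NSString first.
    return [[TFHpple alloc] initWithHTMLData:htmlData];
}
```

Skipping the `NSString` round trip also means the page does not have to be valid UTF-8 for the download step to succeed. How the bytes are then decoded is left to the parser.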

 

For example:

 

<p class="left"><a href="w4688.html" title="篮球战术"><img src="uploads/201310/1382623380pZkCHQEt_s.jpg" class="docimgmax" /></a></p>


The array elements returned by searchWithXPathQuery contains the <a> tags that are direct children of every p paragraph with class left on the page:
<a href="w4688.html" title="篮球战术">

 


This query does not return the <img> tag, which is a child of <a> rather than of <p>:
<img src="uploads/201310/1382623380pZkCHQEt_s.jpg" class="docimgmax" />

 


To get the <img> tag, the query must step through <a>, because in the example above <img> sits inside <a>:
[xpathParser searchWithXPathQuery:@"//p[@class='left']/a/img"]
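
Instead of running a separate query, the image can also be reached from each matched <a> element. The sketch below assumes the markup shown in the example (an <img> directly inside the <a>) and uses `firstChildWithTagName:` and `objectForKey:` from `TFHppleElement`. The function name `ImageSourcesInHTML` is made up for illustration.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: collects the src attribute of the first <img>
// found directly inside each <p class="left"><a>...</a></p>.
static NSArray *ImageSourcesInHTML(NSData *htmlData) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *links = [parser searchWithXPathQuery:@"//p[@class='left']/a"];

    NSMutableArray *sources = [NSMutableArray array];
    for (TFHppleElement *link in links) {
        // nil if this <a> has no <img> child; messaging nil is safe here.
        TFHppleElement *image = [link firstChildWithTagName:@"img"];
        NSString *src = [image objectForKey:@"src"];
        if (src != nil) {
            [sources addObject:src];
        }
    }
    return sources;
}
```

This keeps the link and its image together in one loop, which is convenient when both the href and the src are needed for the same item.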


Iterating over the elements array (which holds the <a> children of every p paragraph with class left on the page) yields dictionaries like these:

{
    href = "w4668.html";
    title = "篮球战术";
}
{
    href = "w4612.html";
    title = "王小飞";
}
{
    href = "w1233.html";
    title = "ios开发";
}
...
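
The href values in these dictionaries are relative paths, so they usually need to be resolved against the address of the page before they can be loaded. Below is a minimal sketch using only Foundation. The function name `AbsoluteLinks` and the `baseURL` parameter are assumptions for illustration. The real base address is whatever `path` pointed at.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns attribute dictionaries such as
// { href = "w4612.html"; title = "..."; } into dictionaries that hold
// the title and an absolute NSURL resolved against `baseURL`.
static NSArray *AbsoluteLinks(NSArray *attributeDicts, NSURL *baseURL) {
    NSMutableArray *result = [NSMutableArray array];
    for (NSDictionary *attributes in attributeDicts) {
        NSString *href = attributes[@"href"];
        if (href == nil) {
            continue; // skip <a> tags without an href
        }
        NSURL *url = [NSURL URLWithString:href relativeToURL:baseURL];
        if (url == nil) {
            continue; // skip hrefs that are not valid URL strings
        }
        NSString *title = attributes[@"title"] ?: @"";
        [result addObject:@{ @"title" : title, @"url" : [url absoluteURL] }];
    }
    return result;
}
```

It would be called with the `data` array filled in the loop above and the URL of the page that was parsed.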

 
posted @ 2014-01-18 19:06  王小飞您