Parsing HTML without any third-party library: two methods are all you need

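After reading this post you'll see that HTML parsing can actually be quite simple~
In my project the backend returns its data as HTML, which was a real pain. I spent a lot of time digging through resources, and every parsing approach I found felt cumbersome. After experimenting for a while, I noticed that HTML data like this usually follows a regular pattern. So here's the good news: below is an uncommon but very simple and easy-to-understand approach, namely substring extraction. It only works as long as the markup keeps that pattern, so treat it as a shortcut for predictable pages rather than a general HTML parser.
Enough talk, here's the code~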

// Header file
@interface GKTopic : NSObject
/// Topic ID
@property (nonatomic, copy) NSString *id;
/// Topic title
@property (nonatomic, copy) NSString *title;
/// Author of the topic
@property (nonatomic, copy) NSString *author;
/// Avatar URL
@property (nonatomic, copy) NSString *avatarImageUrl;

+ (NSArray *)topics;
@end

Implementation file

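+ (NSArray *)topics {
    // Load the HTML from the app bundle
    NSString *path = [[NSBundle mainBundle] pathForResource:@"v2ex" ofType:@"html"];
    NSError *error = nil;
    NSString *html = path ? [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&error] : nil;

    NSMutableArray *topics = [NSMutableArray array];
    if (html.length == 0) {
        // Bail out early: messaging a nil string returns a zeroed range (location 0),
        // so the loop below would never see NSNotFound and would never terminate.
        NSLog(@"Failed to load v2ex.html: %@", error);
        return topics;
    }

    // Marker where each topic starts. You have to read the HTML source yourself
    // to find the pattern; the same goes for matchingEnd.
    NSString *matchingBegin = @"cell from_";
    // Marker where each topic ends
    NSString *matchingEnd = @"</div>";

    NSRange lastRange = NSMakeRange(0, 0);

    // Extract the topics one by one
    while ((lastRange = [html rangeOfString:matchingBegin options:0 range:NSMakeRange(lastRange.location, html.length - lastRange.location)]).location != NSNotFound) {
        NSRange endRange = [html rangeOfString:matchingEnd options:0 range:NSMakeRange(lastRange.location, html.length - lastRange.location)];
        if (endRange.location != NSNotFound) {
            // Take the string between the two markers
            NSString *topicString = [html substringWithRange:NSMakeRange(lastRange.location, endRange.location - lastRange.location)];
            // Pull the individual fields out of the tags
            GKTopic *topic = [self topicWithString:topicString];
            [topics addObject:topic];
            // Continue searching from the end marker
            lastRange = endRange;
        } else {
            break;
        }
    }
    return topics;
}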

+ (GKTopic *)topicWithString:(NSString *)string {
    GKTopic *topic = [[GKTopic alloc] init];
    // Find the author of the topic
    topic.author = [string gk_rangeFromeStartString:@"<a href=\"/member/" toEndString:@"\">"];
    // Find the avatar URL
    topic.avatarImageUrl = [string gk_rangeFromeStartString:@"<img src=\"" toEndString:@"\" class=\"avatar\""];
    // Find the topic ID, e.g. for <a href="/t/291493"> the ID is 291493
    topic.id = [string gk_rangeFromeStartString:@"<a href=\"/t/" toEndString:@"\">"];
    // Find the topic title: it is the link text that follows the ID
    NSString *fromStr = [NSString stringWithFormat:@"t/%@\">", topic.id];
    topic.title = [string gk_rangeFromeStartString:fromStr toEndString:@"</a>"];
    return topic;
}

The NSString category method used above

- (NSString *)gk_rangeFromeStartString:(NSString *)startString toEndString:(NSString *)endString
{
    // Find the start marker; without it there is nothing to extract
    NSRange range = [self rangeOfString:startString];
    if (range.location == NSNotFound) {
        return nil;
    }
    // Keep everything after the start marker
    NSString *string = [self substringFromIndex:range.location + range.length];

    // Cut at the first end marker; if it is missing, the rest of the string is returned
    range = [string rangeOfString:endString];
    if (range.location != NSNotFound) {
        string = [string substringToIndex:range.location];
    }
    return string;
}
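
To make the category method more concrete, here is a small sketch that runs it against a single hand-written row. The sample HTML is made up for illustration and only imitates the patterns searched for above, and the category name GKDemo is an assumption (the post does not show it). The method body is repeated in compact form so the snippet compiles on its own.

```objective-c
#import <Foundation/Foundation.h>

// Category name "GKDemo" is hypothetical; the method is the one from this post, in compact form.
@interface NSString (GKDemo)
- (NSString *)gk_rangeFromeStartString:(NSString *)startString toEndString:(NSString *)endString;
@end

@implementation NSString (GKDemo)
- (NSString *)gk_rangeFromeStartString:(NSString *)startString toEndString:(NSString *)endString {
    NSRange start = [self rangeOfString:startString];
    if (start.location == NSNotFound) return nil;
    // Everything after the start marker
    NSString *tail = [self substringFromIndex:NSMaxRange(start)];
    NSRange end = [tail rangeOfString:endString];
    // Cut at the end marker if present, otherwise keep the rest
    return end.location == NSNotFound ? tail : [tail substringToIndex:end.location];
}
@end

int main(void) {
    @autoreleasepool {
        // Made-up sample row, not real V2EX output
        NSString *row = @"<div class=\"cell from_1\">"
                        @"<a href=\"/member/alice\"><img src=\"//example.com/a.png\" class=\"avatar\"></a>"
                        @"<a href=\"/t/291493\">Hello V2EX</a>";

        NSString *author = [row gk_rangeFromeStartString:@"<a href=\"/member/" toEndString:@"\">"];
        NSString *avatar = [row gk_rangeFromeStartString:@"<img src=\"" toEndString:@"\" class=\"avatar\""];
        NSString *topicID = [row gk_rangeFromeStartString:@"<a href=\"/t/" toEndString:@"\">"];
        NSString *missing = [row gk_rangeFromeStartString:@"<span>" toEndString:@"</span>"];

        NSLog(@"author  = %@", author);   // alice
        NSLog(@"avatar  = %@", avatar);   // //example.com/a.png
        NSLog(@"id      = %@", topicID);  // 291493
        NSLog(@"missing = %@", missing);  // (null), because the start marker is absent
    }
    return 0;
}
```

Note that the method always returns the first match, so the start marker has to be specific enough to hit the tag you actually want.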

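Only a few fields are extracted here; you can try the rest yourself. The method above that returns the array can easily be factored out, for example: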

/**
 *  @param beginString marker where each item starts
 *  @param endString marker where each item ends
 *  @return array of model objects
 */
+ (NSArray *)topicsWithBeginString:(NSString *)beginString endString:(NSString *)endString;

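The method name may not quite follow the naming conventions; feel free to call it whatever you like, this is only meant to show the idea~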
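
As a rough sketch of what such a factored-out version might look like, the loop can be written as a plain function that takes the two markers and returns the raw chunks, leaving the mapping to models to the caller. GKChunksBetween is a hypothetical name that does not appear in the demo project, and the sample HTML is made up.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every chunk of `html` that starts at `begin`
// and runs up to, but not including, the next `end` after it.
static NSArray<NSString *> *GKChunksBetween(NSString *html, NSString *begin, NSString *end) {
    NSMutableArray<NSString *> *chunks = [NSMutableArray array];
    // Also covers nil arguments, since messaging nil yields a length of 0
    if (html.length == 0 || begin.length == 0 || end.length == 0) return chunks;

    NSUInteger cursor = 0;
    while (cursor < html.length) {
        NSRange rest = NSMakeRange(cursor, html.length - cursor);
        NSRange b = [html rangeOfString:begin options:0 range:rest];
        if (b.location == NSNotFound) break;

        // Look for the end marker only after the begin marker
        NSUInteger afterBegin = NSMaxRange(b);
        NSRange e = [html rangeOfString:end
                                options:0
                                  range:NSMakeRange(afterBegin, html.length - afterBegin)];
        if (e.location == NSNotFound) break;

        [chunks addObject:[html substringWithRange:NSMakeRange(b.location, e.location - b.location)]];
        // Resume the search after the end marker
        cursor = NSMaxRange(e);
    }
    return chunks;
}

int main(void) {
    @autoreleasepool {
        // Made-up sample with two rows
        NSString *html = @"<div class=\"cell from_1\">first</div><div class=\"cell from_2\">second</div>";
        NSArray<NSString *> *chunks = GKChunksBetween(html, @"cell from_", @"</div>");
        NSLog(@"%@", chunks); // two chunks: cell from_1">first and cell from_2">second
    }
    return 0;
}
```

Each chunk could then be handed to a method like topicWithString: to build the model, which keeps the marker logic reusable for other pages.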

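That's about it. If anything here is incorrect, corrections are welcome.
Finally, here is the demo: https://github.com/ChrisCaixx/HtmlToObject
If you find it useful, feel free to give it a star. Thanks!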

posted @ 2016-07-22 10:12  花菜ChrisCai