Jericho HTML Parser Usage Example

1. Official website

http://jericho.htmlparser.net/docs/index.html

2. JAR download page

https://sourceforge.net/projects/jerichohtml/files/?source=navbar

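3. Sample programs on the official site

The official site lists a set of introductory sample programs. For example, clicking DisplayAllElements opens the code page below: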

import net.htmlparser.jericho.*;
import java.util.*;
import java.io.*;
import java.net.*;

public class DisplayAllElements {
    public static void main(String[] args) throws Exception {
        // Default input file, used when no command-line argument is given.
        String sourceUrlString = "data/test.html";
        if (args.length == 0)
            System.err.println("Using default argument of \"" + sourceUrlString + '"');
        else
            sourceUrlString = args[0];
        // An argument without a colon has no URL scheme, so treat it as a local file path.
        if (sourceUrlString.indexOf(':') == -1) sourceUrlString = "file:" + sourceUrlString;
        // Register the additional tag types this example should recognise.
        MicrosoftConditionalCommentTagTypes.register();
        PHPTagTypes.register();
        PHPTagTypes.PHP_SHORT.deregister(); // remove PHP short tags for this example otherwise they override processing instructions
        MasonTagTypes.register();
        // Load the document from the URL and collect every element in it.
        Source source = new Source(new URL(sourceUrlString));
        List<Element> elementList = source.getAllElements();
        for (Element element : elementList) {
            System.out.println("-------------------------------------------------------------------------------");
            System.out.println(element.getDebugInfo());
            // Print a tidied XHTML version of the start tag when the element has attributes.
            if (element.getAttributes() != null) System.out.println("XHTML StartTag:\n" + element.getStartTag().tidy(true));
            System.out.println("Source text with content:\n" + element);
        }
        System.out.println(source.getCacheDebugInfo());
    }
}
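
The argument handling at the top of main can be tried without the Jericho JAR. The sketch below uses only the standard library and isolates the rule the sample applies: a string with no colon is assumed to be a local path and gets a "file:" prefix, while anything that already contains a colon is passed through unchanged. The class name SourceUrlDemo and the helper toUrlString are hypothetical names introduced for illustration; they are not part of Jericho or of the official sample.

```java
import java.net.URL;

class SourceUrlDemo {
    // Hypothetical helper mirroring the check in DisplayAllElements:
    // no colon means no URL scheme, so the argument is treated as a file path.
    static String toUrlString(String arg) {
        return arg.indexOf(':') == -1 ? "file:" + arg : arg;
    }

    public static void main(String[] args) throws Exception {
        // Sample inputs chosen for illustration only.
        String[] samples = {"data/test.html", "http://example.com/index.html"};
        for (String sample : samples) {
            URL url = new URL(toUrlString(sample));
            System.out.println(sample + " -> " + url + " (protocol: " + url.getProtocol() + ")");
        }
    }
}
```

Because the check only looks for a colon, a Windows path with a drive letter also contains one and is therefore not prefixed, so such paths need to be given as explicit file: URLs.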

posted @ 2017-01-15 03:53  瓦肯船长