[jsoup] HTML Parsing

Java HTML Parser

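Parsing a string as an XML document. With the XML parser, the output has the same shape as the input fragment: nothing is wrapped around it.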

Document doc = Jsoup.parse(html, "", Parser.xmlParser());
System.out.println(doc.html());

Input fragment: <div>hello</div>

Result of Jsoup.parse(html, "", Parser.xmlParser()):
<div>
 hello
</div>

Result of Jsoup.parse(html):
<html>
 <head></head>
 <body>
  <div>
   hello
  </div>
 </body>
</html>
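
The difference between the two parsers can also be checked in code rather than by reading the printed output. The sketch below is only an illustration: the class name ParserComparisonSketch and its helper rootTag are made up for this example, and it assumes jsoup is on the classpath. It reports the tag name of the first element under the document root for each parser.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical helper class, not part of jsoup.
class ParserComparisonSketch {

    // Returns the tag name of the first element directly under the document root.
    static String rootTag(String markup, boolean useXmlParser) {
        Document doc = useXmlParser
                ? Jsoup.parse(markup, "", Parser.xmlParser()) // keeps the fragment as given
                : Jsoup.parse(markup);                        // builds html/head/body around it
        return doc.child(0).tagName();
    }

    public static void main(String[] args) {
        String html = "<div>hello</div>";
        System.out.println("xml parser root:  " + rootTag(html, true));
        System.out.println("html parser root: " + rootTag(html, false));
    }
}
```

With the XML parser the root element is the div itself; with the default HTML parser it is the html element that jsoup adds.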

 

Parsing a string as a document

String html = "<html><head><title>First html parse</title></head><body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
System.out.println(doc.html());
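
Once the string is parsed, the usual next step is to read values out of the document instead of printing it whole. A minimal sketch, assuming jsoup is on the classpath; the class DocumentParseSketch and its two helpers are names invented for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
class DocumentParseSketch {

    // Text of the <title> element, or an empty string if there is none.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Text of the first <p> element, or an empty string if there is none.
    static String firstParagraphText(String html) {
        Element p = Jsoup.parse(html).select("p").first();
        return p == null ? "" : p.text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>First html parse</title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        System.out.println(titleOf(html));
        System.out.println(firstParagraphText(html));
    }
}
```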

 

Parsing a string as a body fragment

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
System.out.println(body.html());
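
Note that the fragment above never closes its div. A small sketch of what parseBodyFragment does with it, assuming jsoup is on the classpath; the class BodyFragmentSketch and the helper normalizedBody are hypothetical names. Pretty printing is switched off here only so the result is a single line that is easy to compare.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of jsoup.
class BodyFragmentSketch {

    // Parses a fragment into a document body and returns the body's inner HTML.
    static String normalizedBody(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        doc.outputSettings().prettyPrint(false); // no indentation or line breaks
        return doc.body().html();
    }

    public static void main(String[] args) {
        // The closing </div> is missing on purpose.
        System.out.println(normalizedBody("<div><p>Lorem ipsum.</p>"));
        // prints <div><p>Lorem ipsum.</p></div>
    }
}
```

The parser closes the unbalanced div, so the body can be handled as well-formed markup afterwards.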

 

Loading a document from a URL

Document doc = Jsoup.connect("http://www.lianhu.gov.cn/").get();
String title = doc.title();
System.out.println(title);
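
get() throws IOException when the host cannot be reached, the request times out, or the server answers with an error status, so real code usually wraps the call. A minimal sketch under those assumptions; the class UrlLoadSketch, the helper fetchTitle, and the 3000 ms timeout are choices made for this example, and jsoup must be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.Optional;

// Hypothetical helper class, not part of jsoup.
class UrlLoadSketch {

    // Returns the page title, or an empty Optional if the page could not be fetched.
    static Optional<String> fetchTitle(String url) {
        try {
            Document doc = Jsoup.connect(url)
                    .timeout(3000) // milliseconds; value chosen for illustration
                    .get();
            return Optional.of(doc.title());
        } catch (IOException e) {
            // Covers unknown hosts, timeouts, and HTTP error statuses.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchTitle("http://www.lianhu.gov.cn/").orElse("(could not load page)"));
    }
}
```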

Building a custom request

Document doc = Jsoup.connect("http://www.lianhu.gov.cn/")
        .data("query", "Java")
        .userAgent("Mozilla")
        .cookie("auth", "token")
        .timeout(3000)
        .post();
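
The builder calls above only configure the request; nothing is sent until get(), post(), or execute() is called. The sketch below builds the same request without sending it and reads the settings back, which can help when debugging. It assumes jsoup is on the classpath; the class CustomRequestSketch and the helper buildRequest are invented names, and method(Connection.Method.POST) stands in for post() so that no network call happens.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Hypothetical helper class, not part of jsoup.
class CustomRequestSketch {

    // Configures a POST request but does not send it.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .data("query", "Java")      // form parameter
                .userAgent("Mozilla")       // User-Agent header
                .cookie("auth", "token")    // request cookie
                .timeout(3000)              // milliseconds
                .method(Connection.Method.POST);
    }

    public static void main(String[] args) {
        Connection.Request req = buildRequest("http://www.lianhu.gov.cn/").request();
        System.out.println("method:     " + req.method());
        System.out.println("timeout:    " + req.timeout());
        System.out.println("cookie:     " + req.cookie("auth"));
        System.out.println("user agent: " + req.header("User-Agent"));
        System.out.println("data items: " + req.data().size());
    }
}
```

Calling execute() on the returned Connection would then send the request as configured.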

 

Loading a document from a file

File input = new File("D:/deya/vhost/zizhou/index.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
System.out.println(doc.html());
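
The third argument to Jsoup.parse(File, String, String) is the base URI, which jsoup uses to resolve relative links in the file. The sketch below illustrates that with a temporary file instead of the path above; the class FileLoadSketch, the helper firstLinkAbsUrl, and the sample markup are assumptions made for this example, and jsoup must be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of jsoup.
class FileLoadSketch {

    // Writes the markup to a temp file, parses it, and resolves the first link against baseUri.
    static String firstLinkAbsUrl(String html, String baseUri) {
        try {
            Path tmp = Files.createTempFile("jsoup-sketch", ".html");
            try {
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                Document doc = Jsoup.parse(tmp.toFile(), "UTF-8", baseUri);
                Element link = doc.select("a[href]").first();
                return link == null ? "" : link.absUrl("href");
            } finally {
                Files.deleteIfExists(tmp); // clean up the temp file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"/news/index.html\">News</a>";
        System.out.println(firstLinkAbsUrl(html, "http://example.com/"));
        // prints http://example.com/news/index.html
    }
}
```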

 

posted @ 2022-03-04 14:26  翠微