Day 05: XML Parsing Homework

jsoup: a Java HTML parser built to make sense of real-world HTML soup.

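jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.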

jsoup implements the WHATWG HTML specification and parses HTML to the same DOM as modern browsers do.

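  • Parse HTML from a URL, file, or string
  • Find and extract data using DOM traversal or CSS selectors
  • Manipulate HTML elements, attributes, and text
  • Clean user-submitted content against a safe whitelist to prevent XSS
  • Output tidy HTML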
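
The first three points above can be sketched in a few lines. This is a minimal illustration only: the class name JsoupBasics and the HTML snippet are made up for the example, and the markup is deliberately sloppy (no html or body tags, an unquoted attribute, an unclosed p) to show that jsoup still builds a usable tree.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the HTML below is invented for illustration.
final class JsoupBasics {
    // Sloppy markup on purpose: no <html>/<body>, unquoted attribute, unclosed <p>.
    static final String HTML =
            "<title>Demo</title>"
            + "<p class=intro>Hello <b>jsoup</b>"
            + "<p><a href=\"/a\">first</a> <a href=\"/b\">second</a>";

    // Parse from a string and read the document title.
    static String title(String html) {
        return Jsoup.parse(html).title();
    }

    // CSS selector lookup, then extract the combined text of the match.
    static String introText(String html) {
        return Jsoup.parse(html).select("p.intro").text();
    }

    // Collect the raw href attribute of every link that has one.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> targets = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            targets.add(link.attr("href"));
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(title(HTML));       // Demo
        System.out.println(introText(HTML));   // Hello jsoup
        System.out.println(linkTargets(HTML)); // [/a, /b]
    }
}
```

Only Jsoup.parse(String), select, text, attr, and title are used here, all of which are part of the core jsoup API.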

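jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup; in every case jsoup will create a sensible parse tree.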

Selector syntax

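A selector is a chain of simple selectors separated by combinators. Selectors are case insensitive (including against elements, attributes, and attribute values).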

When no element selector is supplied, the universal selector (*) is implicit; that is, *.header and .header are equivalent.
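
To make the two rules above concrete, here is a small sketch that counts matches for a few selectors. The class name ImplicitUniversalSelector and the three-element HTML string are assumptions made up for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; the markup is invented for illustration.
final class ImplicitUniversalSelector {
    static final String HTML =
            "<div class=\"header\">A</div><p class=\"header\">B</p><div>C</div>";

    // Number of elements in the parsed document that match the CSS query.
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // With no element selector, "*" is implied, so these two agree.
        System.out.println(count(HTML, ".header"));    // 2
        System.out.println(count(HTML, "*.header"));   // 2
        // Tag names in a selector are matched case-insensitively.
        System.out.println(count(HTML, "DIV.header")); // 1
    }
}
```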

Readers who want to dig deeper can visit: https://jsoup.org/download

Now for today's homework code demo:

XML to parse

<?xml version="1.0" encoding="UTF-8" ?>
<students>
    <student id="1">
        <name name="zzl">zzl</name>
        <age>17</age>
        <interest> sing and sleep</interest>
    </student>
    <student id="2" class="xpx">
        <name color="lake blue chatelaine">wjk</name>
        <age>21</age>
        <interest> sing and dance and make a film</interest>
    </student>
    <div class="loge">One</div>
    <div>Two</div>
</students>

 Parsing code

package Xml05_zy.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class Day05_zy2 {
    public static void main(String[] args) {
        String path=Day05_zy2.class.getClassLoader().getResource("test.xml").getPath();
        try {
            Document document=Jsoup.parse(new File(path),"utf-8");
            // :lt(n): elements whose sibling index is less than n
            Elements elements = document.select("student:lt(1)");
            System.out.println(elements);
            // separator
            System.out.println("——————————");
            // :gt(n): elements whose sibling index is greater than n
            Elements elements1 = document.select("student:gt(0)");
            System.out.println(elements1);
            System.out.println("——————————");
            // :eq(n): elements whose sibling index is equal to n
            Elements elements2 = document.select("student:eq(0)");
            System.out.println(elements2);
            System.out.println("——————————");
            // #id: elements whose id attribute is "id"
            Elements elements3 = document.select("#1");
            System.out.println(elements3);
            System.out.println("——————————");
            // *: any element
            Elements elements4 = document.select("*");
            System.out.println("elements4:" + elements4);
            System.out.println("——————————");
            // .class: elements whose class name is "class"
            Elements elements5 = document.select(".xpx");
            System.out.println("elements5:" + elements5);
            System.out.println("——————————");
            // not(selector): drop the elements that match the selector.
            // Note: the XML uses class "loge", so ".logo" matches nothing and both div elements are kept.
            Elements elements6 = document.select("div").not(".logo");
            System.out.println("elements6:" + elements6);
            System.out.println("——————————");
            // :has(selector): elements that contain at least one element matching the selector
            Elements elements7 = document.select("students:has(name)");
            System.out.println("elements7:" + elements7);
            System.out.println("——————————");
            // :contains(text): elements that contain the given text. The search is case insensitive,
            // and the text may appear in the found element or in any of its descendants.
            Elements elements8 = document.select("students:contains(zzl)");
            System.out.println("elements8:" + elements8);
            System.out.println("——————————");
            // [attr*=valContaining]: elements with an attribute named "attr" whose value contains "valContaining"
            Elements elements9 = document.select("[name*=zzl]");
            System.out.println("elements9:" + elements9);
            System.out.println("——————————");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
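
A side note on loading test.xml: getResource(...).getPath() returns a URL path, so it throws a NullPointerException when the resource is missing, and a project directory containing spaces comes back percent-encoded and cannot be opened as a File. One possible alternative, sketched below, is to read the resource as a stream. The class name StreamLoading and its helper methods are hypothetical; Jsoup.parse(InputStream, String, String) is the jsoup overload being relied on.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original homework code.
final class StreamLoading {
    // Parse UTF-8 markup from any stream with jsoup's default HTML parser.
    static Document parse(InputStream in) {
        try {
            return Jsoup.parse(in, "UTF-8", "");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load a classpath resource by name, failing with a clear message if it is absent.
    static Document parseResource(String name) {
        InputStream in = StreamLoading.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("resource not found on classpath: " + name);
        }
        try (InputStream stream = in) {
            return parse(stream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for test.xml so the sketch runs without any file.
        byte[] bytes = "<students><student id=\"1\"><name>zzl</name></student></students>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println(doc.select("student").size()); // 1
    }
}
```

In the homework class, parseResource("test.xml") would take the place of the path lookup plus Jsoup.parse(new File(path), "utf-8"), assuming test.xml sits at the classpath root as it does there.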

 

 Output

"C:\Program Files\Java\jdk1.8.0_45\bin\java.exe" -javaagent:C:\work\ideaIU-2018.2.1\lib\idea_rt.jar=3660:C:\work\ideaIU-2018.2.1\bin -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_45\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_45\jre\lib\rt.jar;G:\S3\day_xml_05_zy\out\production\day_xml_05_zy;G:\S3\day_xml_05_zy\libs\jsoup-1.11.2.jar" Xml05_zy.jsoup.Day05_zy2
<student id="1"> 
 <name name="zzl">
  zzl
 </name> 
 <age>
  17
 </age> 
 <interest>
   sing and sleep
 </interest> 
</student>
——————————
<student id="2" class="xpx"> 
 <name color="lake blue chatelaine">
  wjk
 </name> 
 <age>
  21
 </age> 
 <interest>
   sing and dance and make a film
 </interest> 
</student>
——————————
<student id="1"> 
 <name name="zzl">
  zzl
 </name> 
 <age>
  17
 </age> 
 <interest>
   sing and sleep
 </interest> 
</student>
——————————
<student id="1"> 
 <name name="zzl">
  zzl
 </name> 
 <age>
  17
 </age> 
 <interest>
   sing and sleep
 </interest> 
</student>
——————————
elements4:<!--?xml version="1.0" encoding="UTF-8" ?-->
<html>
 <head></head>
 <body>
  <students> 
   <student id="1"> 
    <name name="zzl">
     zzl
    </name> 
    <age>
     17
    </age> 
    <interest>
      sing and sleep
    </interest> 
   </student> 
   <student id="2" class="xpx"> 
    <name color="lake blue chatelaine">
     wjk
    </name> 
    <age>
     21
    </age> 
    <interest>
      sing and dance and make a film
    </interest> 
   </student> 
   <div class="loge">
    One
   </div> 
   <div>
    Two
   </div> 
  </students>
 </body>
</html>
<html>
 <head></head>
 <body>
  <students> 
   <student id="1"> 
    <name name="zzl">
     zzl
    </name> 
    <age>
     17
    </age> 
    <interest>
      sing and sleep
    </interest> 
   </student> 
   <student id="2" class="xpx"> 
    <name color="lake blue chatelaine">
     wjk
    </name> 
    <age>
     21
    </age> 
    <interest>
      sing and dance and make a film
    </interest> 
   </student> 
   <div class="loge">
    One
   </div> 
   <div>
    Two
   </div> 
  </students>
 </body>
</html>
<head></head>
<body>
 <students> 
  <student id="1"> 
   <name name="zzl">
    zzl
   </name> 
   <age>
    17
   </age> 
   <interest>
     sing and sleep
   </interest> 
  </student> 
  <student id="2" class="xpx"> 
   <name color="lake blue chatelaine">
    wjk
   </name> 
   <age>
    21
   </age> 
   <interest>
     sing and dance and make a film
   </interest> 
  </student> 
  <div class="loge">
   One
  </div> 
  <div>
   Two
  </div> 
 </students>
</body>
<students> 
 <student id="1"> 
  <name name="zzl">
   zzl
  </name> 
  <age>
   17
  </age> 
  <interest>
    sing and sleep
  </interest> 
 </student> 
 <student id="2" class="xpx"> 
  <name color="lake blue chatelaine">
   wjk
  </name> 
  <age>
   21
  </age> 
  <interest>
    sing and dance and make a film
  </interest> 
 </student> 
 <div class="loge">
  One
 </div> 
 <div>
  Two
 </div> 
</students>
<student id="1"> 
 <name name="zzl">
  zzl
 </name> 
 <age>
  17
 </age> 
 <interest>
   sing and sleep
 </interest> 
</student>
<name name="zzl">
 zzl
</name>
<age>
 17
</age>
<interest>
  sing and sleep
</interest>
<student id="2" class="xpx"> 
 <name color="lake blue chatelaine">
  wjk
 </name> 
 <age>
  21
 </age> 
 <interest>
   sing and dance and make a film
 </interest> 
</student>
<name color="lake blue chatelaine">
 wjk
</name>
<age>
 21
</age>
<interest>
  sing and dance and make a film
</interest>
<div class="loge">
 One
</div>
<div>
 Two
</div>
——————————
elements5:<student id="2" class="xpx"> 
 <name color="lake blue chatelaine">
  wjk
 </name> 
 <age>
  21
 </age> 
 <interest>
   sing and dance and make a film
 </interest> 
</student>
——————————
elements6:<div class="loge">
 One
</div>
<div>
 Two
</div>
——————————
elements7:<students> 
 <student id="1"> 
  <name name="zzl">
   zzl
  </name> 
  <age>
   17
  </age> 
  <interest>
    sing and sleep
  </interest> 
 </student> 
 <student id="2" class="xpx"> 
  <name color="lake blue chatelaine">
   wjk
  </name> 
  <age>
   21
  </age> 
  <interest>
    sing and dance and make a film
  </interest> 
 </student> 
 <div class="loge">
  One
 </div> 
 <div>
  Two
 </div> 
</students>
——————————
elements8:<students> 
 <student id="1"> 
  <name name="zzl">
   zzl
  </name> 
  <age>
   17
  </age> 
  <interest>
    sing and sleep
  </interest> 
 </student> 
 <student id="2" class="xpx"> 
  <name color="lake blue chatelaine">
   wjk
  </name> 
  <age>
   21
  </age> 
  <interest>
    sing and dance and make a film
  </interest> 
 </student> 
 <div class="loge">
  One
 </div> 
 <div>
  Two
 </div> 
</students>
——————————
elements9:<name name="zzl">
 zzl
</name>
——————————

Process finished with exit code 0
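
Two things in the output above are worth a second look. The elements4 result is wrapped in html, head, and body elements, and the XML declaration shows up as a comment, because Jsoup.parse(File, charset) uses the HTML parser. The elements6 result still contains both div elements because the selector .logo does not match the class "loge" used in the XML. The sketch below contrasts the HTML parser with jsoup's XML parser on a shortened copy of the data; the class name XmlParserSketch and the trimmed XML string are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical demo class; XML is a shortened version of the homework file.
final class XmlParserSketch {
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
            + "<students><student id=\"1\"><name>zzl</name></student>"
            + "<div class=\"loge\">One</div><div>Two</div></students>";

    // Default behaviour: HTML parser, which adds html/head/body around the content.
    static Document asHtml(String xml) {
        return Jsoup.parse(xml);
    }

    // XML parser: keeps the document as written, with no HTML scaffolding.
    static Document asXml(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        System.out.println(asHtml(XML).select("body").size()); // 1
        System.out.println(asXml(XML).select("body").size());  // 0
        // With the class name spelled as in the data, not() filters out one div.
        System.out.println(asXml(XML).select("div").not(".loge").size()); // 1
    }
}
```

The selectors themselves work the same way under both parsers; only the shape of the resulting tree differs.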

 

Takeaway: a journey of a thousand miles begins with a single step. Stay grounded, take each step well, and keep going ↖(^ω^)↗

                                              End.

Written on August 19, 2020.

  learn and think ^_^

 

 

posted @ 2020-08-19 18:49 by 二零二零