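Day 05 - XML Parsing - Homework
jsoup: a Java HTML parser that can make sense of real-world HTML soup.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
jsoup implements the WHATWG HTML specification and parses HTML to the same DOM as modern browsers do.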
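- Parse HTML from a URL, file, or string
- Find and extract data using DOM traversal or CSS selectors
- Manipulate HTML elements, attributes, and text
- Clean user-submitted content against a safe whitelist to prevent XSS
- Output tidy HTML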
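jsoup is designed to deal with every variety of HTML found in the wild, from pristine and validating to invalid tag soup; in each case jsoup will create a sensible parse tree.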
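As a rough illustration of the capabilities listed above, the sketch below parses an HTML string, extracts data with a CSS selector, and modifies an element. The class name JsoupQuickTour, its method names, and the sample markup are made up for this example; only the jsoup calls (Jsoup.parse, select, first, attr, text) come from the library.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupQuickTour {
    // Hypothetical sample markup, invented for this sketch.
    static final String HTML =
        "<html><body>"
        + "<h1 class=\"title\">Reading list</h1>"
        + "<ul><li><a href=\"/a\">First</a></li>"
        + "<li><a href=\"/b\">Second</a></li></ul>"
        + "</body></html>";

    // Parse HTML from a string and collect the href of every list link.
    static List<String> linkTargets() {
        Document doc = Jsoup.parse(HTML);
        List<String> targets = new ArrayList<>();
        for (Element link : doc.select("ul > li > a")) {
            targets.add(link.attr("href"));
        }
        return targets;
    }

    // Find the heading, then change its text and add an attribute.
    static Element updatedHeading() {
        Document doc = Jsoup.parse(HTML);
        Element heading = doc.select("h1.title").first();
        heading.text("Updated list");
        heading.attr("id", "top");
        return heading;
    }

    public static void main(String[] args) {
        System.out.println(linkTargets());
        System.out.println(updatedHeading().outerHtml());
    }
}
```

Parsing from a string keeps the sketch free of files and network access; Jsoup.parse also has overloads for files, and Jsoup.connect covers URLs.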
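Selector syntax
A selector is a chain of simple selectors separated by combinators. Selectors are case-insensitive (including against elements, attributes, and attribute values).
The universal selector (*) is implicit when no element selector is supplied (that is, *.header and .header are equivalent).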
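A small sketch of these two rules, assuming a made-up HTML snippet and a hypothetical helper named count: it compares how many elements .header and *.header select, and tries an upper-case selector against lower-case markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ImplicitUniversalSelector {
    // Hypothetical markup: two elements carry the class "header", one does not.
    static final String HTML =
        "<div class=\"header\">Site</div>"
        + "<p class=\"header\">Intro</p>"
        + "<p>Body</p>";

    // Count the elements matched by a selector in the sample markup.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // With and without the explicit universal selector.
        System.out.println(count(".header"));
        System.out.println(count("*.header"));
        // Upper-case tag and class names against lower-case markup.
        System.out.println(count("DIV.HEADER"));
        System.out.println(count("div.header"));
    }
}
```

The first two counts should be equal, and so should the last two, since matching ignores case for both tag and class names.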
Anyone who wants to dig deeper can visit https://jsoup.org/download .
Now for today's homework code demo:
XML to parse
<?xml version="1.0" encoding="UTF-8" ?>
<students>
    <student id="1">
        <name name="zzl">zzl</name>
        <age>17</age>
        <interest> sing and sleep</interest>
    </student>
    <student id="2" class="xpx">
        <name color="lake blue chatelaine">wjk</name>
        <age>21</age>
        <interest> sing and dance and make a film</interest>
    </student>
    <div class="loge">One</div>
    <div>Two</div>
</students>
Parsing code
package Xml05_zy.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class Day05_zy2 {
    public static void main(String[] args) {
        String path = Day05_zy2.class.getClassLoader().getResource("test.xml").getPath();
        try {
            // Jsoup.parse(File, String) uses the HTML parser, so the output
            // below is wrapped in html, head, and body elements.
            Document document = Jsoup.parse(new File(path), "utf-8");

            // :lt(n) - elements whose sibling index is less than n
            Elements elements = document.select("student:lt(1)");
            System.out.println(elements);
            // separator
            System.out.println("——————————");

            // :gt(n) - elements whose sibling index is greater than n
            Elements elements1 = document.select("student:gt(0)");
            System.out.println(elements1);
            System.out.println("——————————");

            // :eq(n) - elements whose sibling index equals n
            Elements elements2 = document.select("student:eq(0)");
            System.out.println(elements2);
            System.out.println("——————————");

            // #id - elements whose id attribute is "1"
            Elements elements3 = document.select("#1");
            System.out.println(elements3);
            System.out.println("——————————");

            // * - every element
            Elements elements4 = document.select("*");
            System.out.println("elements4:" + elements4);
            System.out.println("——————————");

            // .class - elements with the class name "xpx"
            Elements elements5 = document.select(".xpx");
            System.out.println("elements5:" + elements5);
            System.out.println("——————————");

            // not(selector) - elements that do not match the selector.
            // The XML uses class "loge", so ".logo" excludes nothing
            // and both divs are printed.
            Elements elements6 = document.select("div").not(".logo");
            System.out.println("elements6:" + elements6);
            System.out.println("——————————");

            // :has(selector) - elements that contain at least one element
            // matching the selector
            Elements elements7 = document.select("students:has(name)");
            System.out.println("elements7:" + elements7);
            System.out.println("——————————");

            // :contains(text) - elements that contain the given text. The search
            // is case-insensitive, and the text may appear in the element itself
            // or in any of its descendants.
            Elements elements8 = document.select("students:contains(zzl)");
            System.out.println("elements8:" + elements8);
            System.out.println("——————————");

            // [attr*=val] - elements whose "name" attribute value contains "zzl"
            Elements elements9 = document.select("[name*=zzl]");
            System.out.println("elements9:" + elements9);
            System.out.println("——————————");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
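Two details of the program above can be checked in isolation. First, Jsoup.parse(File, String) uses jsoup's HTML parser, which wraps the XML in html, head, and body elements; passing Parser.xmlParser() avoids that. Second, the exclusion selector has to spell the class name exactly as the XML does. The following is only a sketch: the class XmlParserSketch, its helper methods, and the shortened inline copy of the XML are assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class XmlParserSketch {
    // Shortened copy of the sample XML, inlined so the sketch needs no file.
    static final String XML =
        "<students>"
        + "<student id=\"1\"><name>zzl</name></student>"
        + "<student id=\"2\" class=\"xpx\"><name>wjk</name></student>"
        + "<div class=\"loge\">One</div><div>Two</div>"
        + "</students>";

    // HTML parser: unknown tags are kept, but wrapped in html/head/body.
    static Document parseAsHtml() {
        return Jsoup.parse(XML);
    }

    // XML parser: the tree mirrors the input, with no added wrapper elements.
    static Document parseAsXml() {
        return Jsoup.parse(XML, "", Parser.xmlParser());
    }

    // Select every div, then drop those matching the given selector.
    static Elements divsExcluding(String selector) {
        return parseAsXml().select("div").not(selector);
    }

    public static void main(String[] args) {
        System.out.println(parseAsHtml().select("body").size());
        System.out.println(parseAsXml().select("body").size());
        // ".logo" matches no div in the sample; ".loge" matches the first one.
        System.out.println(divsExcluding(".logo").size());
        System.out.println(divsExcluding(".loge").size());
    }
}
```

With the XML parser, select("*") would no longer report html, head, or body, which keeps the printed results closer to the input file.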
Run results
Run with JDK 1.8.0_45 and jsoup-1.11.2.jar on the classpath (main class Xml05_zy.jsoup.Day05_zy2):
<student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student>
——————————
<student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student>
——————————
<student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student>
——————————
<student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student>
——————————
elements4:<!--?xml version="1.0" encoding="UTF-8" ?--> <html> <head></head> <body> <students> <student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student> <student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student> <div class="loge"> One </div> <div> Two </div> </students> </body> </html>
<html> <head></head> <body> <students> <student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student> <student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student> <div class="loge"> One </div> <div> Two </div> </students> </body> </html>
<head></head>
<body> <students> <student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student> <student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student> <div class="loge"> One </div> <div> Two </div> </students> </body>
<students> <student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student> <student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student> <div class="loge"> One </div> <div> Two </div> </students>
<student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student>
<name name="zzl"> zzl </name>
<age> 17 </age>
<interest> sing and sleep </interest>
<student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student>
<name color="lake blue chatelaine"> wjk </name>
<age> 21 </age>
<interest> sing and dance and make a film </interest>
<div class="loge"> One </div>
<div> Two </div>
——————————
elements5:<student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student>
——————————
elements6:<div class="loge"> One </div>
<div> Two </div>
——————————
elements7:<students> <student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student> <student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student> <div class="loge"> One </div> <div> Two </div> </students>
——————————
elements8:<students> <student id="1"> <name name="zzl"> zzl </name> <age> 17 </age> <interest> sing and sleep </interest> </student> <student id="2" class="xpx"> <name color="lake blue chatelaine"> wjk </name> <age> 21 </age> <interest> sing and dance and make a film </interest> </student> <div class="loge"> One </div> <div> Two </div> </students>
——————————
elements9:<name name="zzl"> zzl </name>
——————————
Process finished with exit code 0
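Takeaway: a journey of a thousand miles begins with a single step. Take each step steadily, and keep going ↖(^ω^)↗
The end.
Written on August 19, 2020.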
learn and think ^_^