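Java basics: using the Jsoup crawler library on the backend to get an element's HTML from a string by its id (doing in the Java backend what the frontend does when it looks up an element by id)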
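1. Scenario
In an e-commerce project, product descriptions are indispensable, and each project needs different descriptions, so there is no one-size-fits-all format.
Part of the data in a product description is exactly what we need, such as prices and size charts.
The problem is how to extract that data without relying on the frontend.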
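2. Approach
First look at how the product description is stored: in my case the whole description is stored as a single string in a table column.
Since it is a string, we can use the Jsoup library to parse it into a Document object (other approaches would also work),
then call getElementById("element id") to get the element.
In my case I want the element itself, HTML tags included,
so I simply call toString(). If you only want the inner content without the HTML tags, call text() instead.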
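The steps above can be sketched as a small helper. This is only a minimal sketch: the class name `TagExtractor`, the method `findById`, and the sample string are hypothetical and not part of the original code. It assumes jsoup is on the classpath, and it uses `Jsoup.parseBodyFragment` and `Optional` as one possible way to handle a stored fragment and a missing id.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TagExtractor {

    /** Returns the element with the given id, or an empty Optional when no such id exists. */
    public static Optional<Element> findById(String html, String id) {
        // A stored description is a fragment, not a full page, so parse it as body content.
        Document doc = Jsoup.parseBodyFragment(html);
        // getElementById returns null when the id is absent; wrap it to avoid a NullPointerException.
        return Optional.ofNullable(doc.getElementById(id));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, much shorter than a real product description.
        String desc = "<p>intro</p><div id=\"sizechart-template1\"><b>Size</b> 100</div>";

        findById(desc, "sizechart-template1").ifPresent(el -> {
            System.out.println(el.outerHtml()); // the tag plus its contents, same as toString()
            System.out.println(el.html());      // inner HTML only, without the outer <div>
            System.out.println(el.text());      // text only: Size 100
        });

        // An id that is not in the string yields an empty Optional.
        System.out.println(findById(desc, "no-such-id").isPresent()); // false
    }
}
```

Returning an `Optional` makes the "id not found" case explicit for the caller, which matters when descriptions come from different projects and not every one contains the expected id.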
3. Expected result
4. Code
// JUnit 4 is assumed here; with JUnit 5, import org.junit.jupiter.api.Test instead.
import org.junit.Test;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// The class name is a placeholder; the original post shows only the test method.
public class JsoupExtractTest {

    /**
     * Purpose: in Java, get an element from an HTML string by the element's id
     *
     * @author 王子威
     */
    @Test
    public void extractChart() {
        // Every cell in the sample tables carries the same opening tag, so build it once.
        String td = "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">";

        // Product description: fake data
        String desc = "<p align=\"center\"></p>\n"
                + "<p align=\"center\">啊啊啊啊</p>\n"
                + "<p align=\"center\"></p>\n"
                + "<p align=\"center\">\n"
                + "</p>\n"
                + "<div id=\"sizechart-template1\">\n"
                + "<table border=\"1\" style=\"width:800px;margin:10px auto;\">\n"
                + "<thead> <tr>\n"
                + td + "Size</td>\n"
                + td + "Label Size</td>\n"
                + td + "Bust</td>\n"
                + td + "Waist</td>\n"
                + td + "Length</td>\n"
                + td + "Height</td>\n"
                + "</tr>\n"
                + "</thead> <tbody>\n"
                + "<tr>\n"
                + td + "100</td>\n"
                + td + "56cm/22.0</td>\n"
                + td + "23cm/9.1</td>\n"
                + td + "11cm/4.3</td>\n"
                + td + "36cm/14.2</td>\n"
                + td + "36cm/14.2</td>\n"
                + "</tr>\n"
                + "<tr>" + td + "100</td>\n"
                + td + "56cm/22.0</td>\n"
                + td + "23cm/9.1</td>\n"
                + td + "11cm/4.3</td>\n"
                + td + "36cm/14.2</td>\n"
                + td + "36cm/14.2</td>\n"
                + "</tr>\n"
                + "</tbody>\n"
                + "</table>\n"
                + "</div>\n"
                + "<br />\n"
                + "<br />\n"
                + "<div id=\"sizechart-template2\"><table border=\"1\" style=\"width:800px;margin:10px auto;\">\n"
                + "<tbody>\n"
                + "<tr>\n"
                + td + "Size:100</td>\n"
                + td + "Label Size:56cm/22.0</td>\n"
                + td + "Bust:23cm/9.1</td>\n"
                + td + "Waist:11cm/4.3</td>\n"
                + td + "Length:36cm/14.2</td>\n"
                + td + "Height:36cm/14.2</td>\n"
                + "</tr>\n"
                + "<tr>\n"
                + td + "Size:100</td>\n"
                + td + "Label Size:56cm/22.0</td>\n"
                + td + "Bust:23cm/9.1</td>\n"
                + td + "Waist:11cm/4.3</td>\n"
                + td + "Length:36cm/14.2</td>\n"
                + td + "Height:36cm/14.2</td>\n"
                + "</tr>\n"
                + "</tbody>\n"
                + "</table>\n"
                + "</div>\n"
                + "<p></p>\n"
                + "<p align=\"center\"></p>\n"
                + "<p align=\"center\" style=\"text-align:left;\"></p>\n"
                + "<p align=\"center\"></p>\n"
                + "<p align=\"center\"><img src=\"https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fup.enterdesk.com%2Fedpic%2F09%2F3a%2Fbc%2F093abce7b31f4c8ffdbf345375ff4abb.jpg&refer=http%3A%2F%2Fup.enterdesk.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1652421336&t=d2da9a6657364617cdcdbf0aa8e0002e\" /></p>\n"
                + "<p align=\"center\"></p>\n"
                + "<p align=\"center\"><p align=\"center\"><img src=\"https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.jj20.com%2Fup%2Fallimg%2F1111%2F04261Q53521%2F1P426153521-1-1200.jpg&refer=http%3A%2F%2Fimg.jj20.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1652421677&t=05a703168ad75e76ce2bddf3b32382dd\" /></p>\n"
                + "</p>";

        // Parse the string into a Document object
        Document doc = Jsoup.parse(desc);

        // Get each <div> element by its id (getElementById returns null if the id is missing)
        Element elementById1 = doc.getElementById("sizechart-template1");
        Element elementById2 = doc.getElementById("sizechart-template2");

        // Element to String: the tag itself plus everything inside it
        String a = elementById1.toString();
        System.out.println("a = " + a);
        String b = elementById2.toString();
        System.out.println("b = " + b);

        // Text content only, without HTML tags
        String text = elementById1.text();
        System.out.println("text = " + text);
    }
}
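Once the element is found, the size chart it contains can also be read cell by cell instead of being kept as raw HTML. The following is a minimal sketch of that idea and not part of the original code: the class name `SizeChartRows`, the method `rowsById`, and the shortened sample string are hypothetical, and it assumes the chart is a plain, non-nested table like the one in the fake data.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SizeChartRows {

    /** Reads every table row under the element with the given id into a list of cell texts. */
    public static List<List<String>> rowsById(String html, String id) {
        Element container = Jsoup.parse(html).getElementById(id);
        List<List<String>> rows = new ArrayList<>();
        if (container == null) {
            return rows; // id not present: return an empty chart instead of throwing
        }
        // select() takes a CSS selector; "tr" matches the rows in both <thead> and <tbody>.
        for (Element tr : container.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : tr.select("td, th")) {
                cells.add(cell.text()); // text() drops the inline style and tags
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the sizechart-template1 block.
        String desc = "<div id=\"sizechart-template1\"><table>"
                + "<thead><tr><td>Size</td><td>Bust</td></tr></thead>"
                + "<tbody><tr><td>100</td><td>23cm/9.1</td></tr></tbody>"
                + "</table></div>";

        for (List<String> row : rowsById(desc, "sizechart-template1")) {
            System.out.println(String.join(" | ", row));
        }
        // prints:
        // Size | Bust
        // 100 | 23cm/9.1
    }
}
```

A list of rows is easier to store or compare than an HTML string, but it discards the styling, so it only fits cases where the backend needs the values and not the markup.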
Result
Element 1
Element 2
Element text content
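* Some of the screenshots and content in this blog post come from books and training courses I have studied; they are for learning and discussion only, not for commercial use.
* If anything here infringes your rights, contact me and I will remove the corresponding link immediately.
* Note: 王子威
* My NetEase email: wzw_1314_520@163.com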