java基础之“在后端使用爬虫Jsoup工具根据标签id获取字符串中的标签html代码(java后端实现前端根据标签id获取标签对象)”

一.场景

在电商项目中产品描述时必不可少的存在,每个不同的项目所需的描述不同,不能一概而论

在产品的描述中的部分数据是我们所需要的,如价格,尺码表等

如何在不依靠前端的前提下,完成数据的提取就成了问题

二.思路

首先看产品描述的存储方式:我这边是直接整个以字符串存储在表字段中,

尽然是字符串,那我们就能使用Jsoup工具类来获取Document对象(也可以用其他的方案)

再用getElementById("标签id")方法获取标签对象

因为我这里是直接要标签对象(包括html标签)

所以我直接toString()既可,如果是要内部的内容,不要html标签,就用test()方法

三.需要获取的结果

三.代码

/**
* 功能描述: 实现在java中根据字符串中的标签id获取对应的标签对象
*
* @author 王子威
*/
@Test
public void extractChart()
{
    // 产品描述:假数据
    String desc = "<p align=\"center\"></p>\n" + "<p align=\"center\">啊啊啊啊</p>\n" + "<p align=\"center\"></p>\n" + "<p " + "align=\"center\">\n" + "</p>\n"
            + "<div id=\"sizechart-template1\">\n" + "<table " + "border=\"1\" style=\"width:800px;margin:10px auto;\">\n" + "<thead> <tr>\n" + "<td "
            + "style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;" + "\">Size</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;" + "text-align:center;\">Label Size</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;" + "font-family:Arial;padding:5px;text-align:center;\">Bust</td>\n" + "<td style=\"font-size"
            + ":11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Waist</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;" + "\">Length</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;" + "padding:5px;text-align:center;\">Height</td>\n" + "</tr>\n" + "</thead> <tbody>\n" + "<tr>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n" + "</tr>\n"
            + "<tr><td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n" + "</tr>\n" + "</tbody>\n" + "</table>\n"
            + "</div>\n" + "<br />\n" + "<br />\n" + "<div id=\"sizechart-template2\"><table border=\"1\" style=\"width:800px;margin:10px auto;\">\n" + "<tbody>\n" + "<tr>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Size:100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Label Size:56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Bust:23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Waist:11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Length:36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Height:36cm/14.2</td>\n" + "</tr>\n" + "<tr>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Size:100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Label Size:56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Bust:23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Waist:11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Length:36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Height:36cm/14.2</td>\n" + "</tr>\n" + "</tbody>\n" + "</table>\n"
            + "</div>\n" + "<p></p>\n" + "<p align=\"center\"></p>\n" + "<p align=\"center\" style=\"text-align:left;\"></p>\n" + "<p align=\"center\"></p>\n"
            + "<p align=\"center\"><img src=\"https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fup.enterdesk.com%2Fedpic%2F09%2F3a%2Fbc%2F093abce7b31f4c8ffdbf345375ff4abb.jpg&refer=http%3A%2F%2Fup.enterdesk.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1652421336&t=d2da9a6657364617cdcdbf0aa8e0002e\" /></p>\n"
            + "<p align=\"center\"></p>\n"
            + "<p align=\"center\"><p align=\"center\"><img src=\"https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.jj20.com%2Fup%2Fallimg%2F1111%2F04261Q53521%2F1P426153521-1-1200.jpg&refer=http%3A%2F%2Fimg.jj20.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1652421677&t=05a703168ad75e76ce2bddf3b32382dd\" /></p>\n" + "</p>";
    // 获取Document对象
    Document doc = Jsoup.parse(desc);
    
    // 根据<div>标签中的id获取标签对象
    Element elementById1 = doc.getElementById("sizechart-template1");
    Element elementById2 = doc.getElementById("sizechart-template2");
    // 标签转String
    String a = elementById1.toString();
    System.out.println("a = " + a);
    String b = elementById2.toString();
    System.out.println("b = " + b);
    // 获取内容
    String text = elementById1.text();
    System.out.println("text = " + text);
}

 

结果

标签对象1

标签对象2

标签对象内容

 

posted @ 2022-04-13 14:14  骚哥  阅读(919)  评论(0编辑  收藏  举报