"Invalid nested tag XX found, expected closing tag XX" error when converting rich text to PDF with itextpdf

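The error occurs because the hand-generated HTML contains unclosed tags or otherwise malformed syntax. The jsoup library can normalize the HTML before it is passed to itextpdf; the implementation is as follows: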

The html argument can be a rich-text fragment or the contents of an HTML file.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities;

// `log` is assumed to be a logger field on the enclosing class (for example SLF4J).
private static String formatHtml(String html) {
    // Make site-relative download URLs absolute. The literal match also covers
    // data-mce-src="...", because that attribute ends with the same src="... text.
    String absolute = html.replace("src=\"/cds_filestorage/download-s",
            "src=\"https://orangecds.com/cds_filestorage/download-s");
    // Drop <video> elements; (?s) lets '.' span line breaks
    String cleaned = absolute.replaceAll("(?s)<video.*?>.*?</video>", "");
    log.info("content2Html - converted html: " + cleaned);

    Document doc = Jsoup.parse(cleaned);

    // Remove overly large widths: clear the inline style of <body> and of any
    // <div> whose style mentions width
    for (Element el : doc.select("body, div")) {
        if (el.attr("style").contains("width")) {
            el.attr("style", "");
        }
    }

    // Have jsoup emit XML syntax so every tag is closed, with XHTML entity escaping
    doc.outputSettings()
            .syntax(Document.OutputSettings.Syntax.xml)
            .escapeMode(Entities.EscapeMode.xhtml);
    return doc.html();
}
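
Stripping <video> elements with a regular expression depends on how '.' treats line breaks: in Java, '.' does not match line terminators unless the DOTALL flag, written inline as (?s), is enabled. The following is a minimal sketch of that difference using only the JDK; the class name VideoStripSketch, the helper strip and the sample markup are made up for illustration.

```java
import java.util.regex.Pattern;

class VideoStripSketch {
    // Without flags, '.' stops at line terminators
    static final Pattern NO_FLAGS = Pattern.compile("<video.*?>.+?</video>");
    // (?s) enables DOTALL, so '.' also matches line breaks
    static final Pattern DOTALL = Pattern.compile("(?s)<video.*?>.*?</video>");

    // Hypothetical helper: remove every match of the pattern from the input
    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input: a <video> element spread over several lines
        String html = "<p>intro</p><video controls>\n<source src=\"a.mp4\">\n</video><p>end</p>";
        System.out.println(strip(NO_FLAGS, html)); // the video element is left in place
        System.out.println(strip(DOTALL, html));   // <p>intro</p><p>end</p>
    }
}
```

Rich-text editors often emit multi-line markup, so a pattern without the flag can silently leave such elements in the HTML.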

formatHtml takes the HTML text as a String and returns the normalized HTML, with all tags closed, as a String.
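
Since the error comes from unclosed or malformed tags, one rough way to see whether a string is likely to trigger it is to run it through the JDK's own XML parser. This is only a sketch: the class WellFormedCheck and the method isWellFormed are hypothetical names, and the JDK parser is an approximation of, not the same check as, whatever itextpdf does internally.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the markup parses as well-formed XML, false otherwise
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // unclosed or mismatched tag, undefined entity, etc.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>a<br>b</p>"));       // false: <br> is never closed
        System.out.println(isWellFormed("<p>a<br />b</p>"));     // true
        System.out.println(isWellFormed("<div><p>text</div>"));  // false: <p> is never closed
    }
}
```

Running such a check on the input and on the output of formatHtml is a cheap way to confirm that the normalization step closed every tag.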
The jsoup package it requires is in the file I uploaded.
Original link: https://blog.csdn.net/lxh1205509119/article/details/110402366

posted @ 2021-10-23 16:19 星空物语之韵
