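PDF Operations

I. Overview

Generating a PDF document usually involves a template engine, a PDF library, and data to fill in. The common options are listed below.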

iText

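iText is a Java library for working with PDF documents. Its main features are:

Creation: build a PDF in code and add text, graphics, tables, images, and other elements.
Modification: edit an existing PDF by adding, deleting, or changing text, graphics, and so on.
Extraction: pull text content and images out of a PDF.
Merging: combine several PDFs into a single document.
Encryption: password protection and other PDF security features.
Digital signatures: sign documents so that their integrity and origin can be verified.
Form handling: work with PDF forms, including filling form fields and extracting form data.
Page operations: control page properties such as size, orientation, and margins.

Scenario: use iText to build a PDF directly, or combine it with a template engine and generate the PDF by filling in data.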

Apache PDFBox

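PDFBox is an Apache Software Foundation project for creating and processing PDF documents.

PDFBox offers much the same features as iText, with these main differences:

iText 5 and later are licensed under the AGPL, so closed-source use generally requires a commercial license, whereas Apache PDFBox is released under the Apache License 2.0.
iText is generally considered stronger at advanced tasks such as complex typesetting and layout; PDFBox is weaker there but sufficient for ordinary needs.

Scenario: PDFBox can be used to build PDF documents, including generating PDFs from templates.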

Freemarker和Velocity

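Freemarker and Velocity are two common template engines that generate text from templates.
Scenario: write a text template with placeholders, let the engine fill in the data, and then convert the result to PDF with a PDF library.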
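
The placeholder idea can be illustrated without either engine. The sketch below is plain Java (9 or later), not Freemarker or Velocity: it only replaces `${name}` tokens from a map, which is the basic step that happens before the text is handed to a PDF library. The class name `PlaceholderDemo` and the method `fill` are hypothetical; real engines add loops, conditionals, and escaping on top of this.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderDemo {
    // Matches ${key} where key consists of letters, digits, dots, or underscores.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /** Replaces each ${key} with its value from the model; unknown keys become "". */
    static String fill(String template, Map<String, String> model) {
        Matcher m = TOKEN.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = model.getOrDefault(m.group(1), "");
            // quoteReplacement keeps "$" and "\" in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = fill("<p>${username}, age ${age}</p>",
                Map.of("username", "Li", "age", "30"));
        System.out.println(html); // prints <p>Li, age 30</p>
    }
}
```

Treating a missing key as an empty string mirrors the `?default('')` fallbacks used in the template later in this article; a stricter variant could throw instead.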

Flying Saucer (XHTMLRenderer)

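Flying Saucer is a pure-Java rendering engine that converts XHTML and CSS to PDF. It is not built on a browser engine: it ships its own CSS 2.1 layout implementation and does not execute JavaScript.

Flying Saucer lays out well-formed XHTML much as a browser would and can render the result to an image or a PDF.

Flying Saucer offers different output backends, including flying-saucer-pdf-openpdf, which uses OpenPDF (an open-source fork of iText 4) as the underlying PDF library.

That module integrates Flying Saucer with OpenPDF so that Flying Saucer can produce PDF output.

Scenario: use Flying Saucer to render an HTML template and convert it to PDF.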
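
Because Flying Saucer expects well-formed XHTML, it can help to check the generated markup before rendering. The following is a minimal sketch that uses only the JDK's XML parser; the class `XhtmlCheck` and the method `isWellFormed` are hypothetical names and are not part of Flying Saucer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlCheck {
    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Self-closed <br/> is well-formed; a bare <br> is not.
        System.out.println(isWellFormed("<html><body><br/></body></html>")); // prints true
        System.out.println(isWellFormed("<html><body><br></body></html>"));  // prints false
    }
}
```

A check like this catches unclosed tags such as `<br>` or `<meta ...>` early, which is why the template later in this article writes them as `<meta ... />`.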

Thymeleaf

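Thymeleaf is a modern server-side Java template engine for both web and standalone environments.
Scenario: generate HTML from a Thymeleaf template, then convert that HTML to PDF with a rendering library such as Flying Saucer.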

JasperReports

JasperReports is an open-source reporting engine.
Scenario: define a report template and combine it with data to produce a PDF report.

Apache FOP (Formatting Objects Processor)

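FOP is part of the Apache XML Graphics project and converts XML documents, expressed as XSL-FO, to PDF, PS, PNG, and other formats.
Scenario: FOP is normally used with XSL-FO (Extensible Stylesheet Language Formatting Objects), generating the PDF from an XSL-FO template.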

II. Usage

1. Word to PDF with the Aspose toolkit

Add the jars: either install them into Maven or reference them by a local project path.

<!-- Word to PDF: install the jars into the local Maven repository
    mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose.slides -Dversion=15.9.0 -Dpackaging=jar -Dfile=e:/test/jar/aspose.slides-15.9.0.jar
    mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-cells -Dversion=8.5.2 -Dpackaging=jar -Dfile=e:/test/jar/aspose-cells-8.5.2.jar
    mvn install:install-file -DgroupId=com.aspose -DartifactId=aspose-words -Dversion=18.6 -Dpackaging=jar -Dfile=e:/test/jar/aspose-words-18.6-jdk16.jar
-->
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose.slides</artifactId>
    <version>15.9.0</version>
    <!--
    <type>jar</type>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/lib/aspose.slides-15.9.0.jar</systemPath>-->
</dependency>

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-cells</artifactId>
    <version>8.5.2</version>
    <!--
    <type>jar</type>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/lib/aspose-cells-8.5.2.jar</systemPath>-->
</dependency>

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>18.6</version>
    <!--
    <type>jar</type>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/lib/aspose-words-18.6-jdk16.jar</systemPath>-->
</dependency>

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext-asian</artifactId>
    <version>5.2.0</version>
</dependency>


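Fixing garbled Chinese text on a Linux server: the exported PDF shows garbled characters where the Chinese text should be.

Solutions
Option 1: fix the environment. Tested and working; remember to restart the application after installing the fonts.
Install the fonts: copy all files from C:\Windows\Fonts on a Windows machine into a font directory on the production server.
# list all fonts currently installed on Linux
fc-list
# list all Chinese fonts currently installed on Linux
fc-list :lang=zh
# create a directory for the Windows fonts, then copy the font files into it
mkdir /usr/share/fonts/win
# refresh the system font cache
cd /usr/share/fonts
sudo fc-cache -fv
Run this command so that the fonts take effect
source /etc/profile

Option 2: fix it in code (recommended)
a. Copy the Windows fonts into a directory under /usr/share/fonts on Linux, for example /usr/share/fonts/chinese
b. Add the following to the Aspose code: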
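// On Linux, point Aspose at the folder that holds the Chinese fonts (true = scan subfolders too):
OsInfo osInfo = SystemUtil.getOsInfo();
if(osInfo.isLinux()){
    FontSettings.getDefaultInstance().setFontsFolder("/usr/share/fonts", true);
}

// Load the Word document; byteArrayInputStream is the Word file's input stream
com.aspose.words.Document pdf = new com.aspose.words.Document(byteArrayInputStream);
pdf.save(os, SaveFormat.PDF);
os.flush();

If the full set of Windows 10 fonts is more than you need, check which fonts the Word document actually uses and copy only those.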
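
The `SystemUtil.getOsInfo()` call above appears to come from a third-party utility library that the article does not name. If that library is not available, the same check can be sketched with the JDK alone; the class `OsCheck` and the method `isLinux` below are hypothetical names.

```java
import java.util.Locale;

final class OsCheck {
    /** True when the given os.name value denotes Linux. */
    static boolean isLinux(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).contains("linux");
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        if (isLinux(osName)) {
            // This is where the font folder would be registered, as in the snippet above.
            System.out.println("Linux detected; register /usr/share/fonts");
        } else {
            System.out.println("Not Linux (" + osName + "); keep the default fonts");
        }
    }
}
```

Taking the OS name as a parameter, rather than reading the system property inside `isLinux`, keeps the check easy to test on any machine.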

Example

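/**
 * Converts a Word document to PDF with Aspose.Words.
 *
 * @param docxPath source Word file, e.g. E:\test\docment\c.docx
 * @param pdfPath  target PDF file, e.g. E:\test\docment\a.pdf
 * @return the PDF path on success, otherwise an error message
 */
public static String convertDocToPdf(String docxPath, String pdfPath) {
    if (!getLicense()) { // Validate the license first; without it the generated PDF carries a watermark
        return "PDF conversion failed";
    }
    // try-with-resources closes the output stream even if the conversion throws
    try (FileOutputStream os = new FileOutputStream(new File(pdfPath))) {
        Document doc = new Document(docxPath); // docxPath is the Word document to convert
        doc.save(os, SaveFormat.PDF); // Input may be DOC, DOCX, OOXML, RTF, HTML, and more
        return pdfPath;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "PDF conversion failed";
}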

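public static boolean getLicense() {
    boolean result = false;

    try {
        // The relative name below is resolved against PdfUtil's package. If license.xml
        // sits under license/ at the classpath root (..\WebRoot\WEB-INF\classes),
        // use the class-loader variant instead.
//            InputStream is = PdfUtil.class.getClassLoader().getResourceAsStream("license/license.xml");
        InputStream is = PdfUtil.class.getResourceAsStream("license/license.xml");
        if (is == null) {
            return false; // license file not found on the classpath
        }
        License aposeLic = new License();
        aposeLic.setLicense(is);
        result = true;

    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}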
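
The two lookups in `getLicense()` resolve the same string differently: `Class.getResourceAsStream` prepends the class's package unless the name starts with `/`, while `ClassLoader.getResourceAsStream` always starts from the classpath root. The sketch below restates that rule in plain Java (9 or later) so the resulting path can be inspected; `ResourceNames` and `resolve` are hypothetical names, and `java.util.List` merely stands in for `PdfUtil`.

```java
final class ResourceNames {
    /**
     * Mirrors the documented lookup rule of Class.getResourceAsStream:
     * a leading "/" means classpath root, otherwise the class's package is prepended.
     */
    static String resolve(Class<?> owner, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);
        }
        String pkg = owner.getPackageName();
        return pkg.isEmpty() ? name : pkg.replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // Relative name: looked up next to the class.
        System.out.println(resolve(java.util.List.class, "license/license.xml"));
        // prints java/util/license/license.xml

        // Leading slash: looked up from the classpath root.
        System.out.println(resolve(java.util.List.class, "/license/license.xml"));
        // prints license/license.xml
    }
}
```

If `getLicense()` keeps returning false, comparing the resolved path with the actual location of `license.xml` in the build output is a quick first check.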

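2. Word to PDF with the pdf-gae toolkit

Drawback: it conflicts with itext.jar. If parts of your project already use classes from itext.jar, the previous approach is recommended.

Add the dependencies: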

<!--convert doc to pdf-->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<!-- This conversion approach conflicts with iText; repeated attempts failed to resolve the conflict -->
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf-gae</artifactId>
    <version>2.0.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
        </exclusion>
    </exclusions>
</dependency>


Example:

import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Date;

/**
 * Reference: https://www.cnblogs.com/h-w-b/p/17352151.html
 */
public class WordUtils {
    /**
     * Approach 1: conflicts with itext.jar. If parts of your project already use classes from itext.jar, prefer another approach.
     */
    public static void convertDocxToPdf() {
        String docxFilePath = "E:\\test\\docment\\c.docx";
        // Timestamped name so that repeated runs do not overwrite earlier output
        String targetPath = "E:\\test\\docment\\" + new Date().getTime() + ".pdf";

        // try-with-resources closes both streams and the document, even on failure
        try (InputStream inputStream = new FileInputStream(docxFilePath);
             OutputStream outputStream = new FileOutputStream(targetPath);
             XWPFDocument xwpfDocument = new XWPFDocument(inputStream)) {
            PdfOptions pdfOptions = PdfOptions.create();

            PdfConverter.getInstance().convert(xwpfDocument, outputStream, pdfOptions);

            outputStream.flush();
            System.out.println("===========");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

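3. Generating a PDF from an HTML template with Freemarker

Adding the jar: Freemarker can be added as a plain dependency, but since this is a Spring Boot project, use the Freemarker starter instead. The example further down renders the HTML with ITextRenderer, which comes from Flying Saucer rather than PDFBox, so a Flying Saucer PDF module must also be on the classpath.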

<dependencies>  
    <!-- Spring Boot Freemarker dependency -->  
    <dependency>  
        <groupId>org.springframework.boot</groupId>  
        <artifactId>spring-boot-starter-freemarker</artifactId>  
    </dependency>  
    <!-- Apache PDFBox dependency for PDF generation -->  
    <dependency>  
        <groupId>org.apache.pdfbox</groupId>  
        <artifactId>pdfbox</artifactId>  
        <version>2.0.24</version>  
    </dependency>  
</dependencies>


Example:

// Inject the Freemarker configuration directly
@Autowired
private Configuration freemarkerConfig;

public void getPdf() throws Exception {
    // Build the data model
    Map<String, Object> dataModel = new HashMap<>();
    dataModel.put("entity", new User());
    // By default the template is loaded from resources/templates/a.ftl
    Template template = freemarkerConfig.getTemplate("a.ftl");
    StringWriter writer = new StringWriter();
    // Fill the template with the data and write the result to the writer
    template.process(dataModel, writer);
    // Generate the PDF; pdfBytes holds the PDF content
    byte[] pdfBytes = getPdfFromHtml(writer.toString());
}

// Render an HTML string to a PDF document
public byte[] getPdfFromHtml(String htmlContent) throws Exception {
    try (ByteArrayOutputStream os = new ByteArrayOutputStream()) {
        ITextRenderer renderer = new ITextRenderer();
        // Register a font with Chinese glyphs; the CSS font-family must name this font
        renderer.getFontResolver().addFont("fonts/SimSun.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
        renderer.setDocumentFromString(htmlContent);
        renderer.layout();
        renderer.createPDF(os, true);
        return os.toByteArray();
    }
}

Example a.ftl template:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <title>21x10 Table</title>
    <style>
        body {
            /* Use SimSun or another font that supports Chinese */
            font-family: 'SimSun'; 
        }
        table {
            border-collapse: collapse;
            width: 100%;
        }

        table,
        th,
        td {
            border: 1px solid black;
        }

        th,
        td {
            padding: 10px;
            text-align: center;
        }
    </style>
</head>

<body>
<table>
    <thead>
    <tr>
        <th colspan="6">成绩表</th>
    </tr>
    </thead>
    <tbody>  
    <tr>
        <td>姓名</td>
        <td colspan="2">${entity.username?default('')}</td>
        <td>年龄</td>
        <td colspan="2">${entity.age?default('')}</td>
    </tr>    
    <tr>
        <td>婚否</td>
        <td colspan="2"><input type="checkbox"/>是</td>
        <td colspan="3"><input type="checkbox"/>否</td>
    </tr>
    <tr>
        <td colspan="6">成绩列表</td>
    </tr>
    <tr>
        <td colspan="2">课程</td>
        <td>语文</td>
        <td>数学</td>
        <td>英语</td>
        <td>地理</td>
    </tr>

    <#-- Iterate over the grade rows -->
    <#if entity.grades??>
        <#list entity.grades as item>
            <tr>
                <#if item.rowspanNum != 0>
                    <td rowspan="${item.rowspanNum}">${item.year}</td>
                <#else>

                </#if>
                <td>${item.medTypeDesc?default('')}</td>
                <td><input type="checkbox" <#if item.courseCheckbox?seq_contains('1')>checked="checked"</#if> /></td>
                <td><input type="checkbox" <#if item.courseCheckbox?seq_contains('2')>checked="checked"</#if> /></td>
                <td><input type="checkbox" <#if item.courseCheckbox?seq_contains('3')>checked="checked"</#if> /></td>
                <td><input type="checkbox" <#if item.courseCheckbox?seq_contains('4')>checked="checked"</#if> /></td>
            </tr>
        </#list>
    </#if>
    </tbody>
</table>

</body>

</html>
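
The template emits the year cell only when `item.rowspanNum` is non-zero, so the data model has to carry the group size on the first row of each year and 0 on the rest. The article does not show how that field is filled; the sketch below is one possible way, assuming the rows are already sorted by year. `RowspanDemo`, `GradeRow`, and `assignRowspans` are hypothetical names; only `year` and `rowspanNum` come from the template. A real model class would also need to be public with getters (`getYear()`, `getRowspanNum()`) for Freemarker's default object wrapper to read it.

```java
import java.util.ArrayList;
import java.util.List;

final class RowspanDemo {
    /** Hypothetical row type; only year and rowspanNum mirror names used in a.ftl. */
    static final class GradeRow {
        final String year;
        int rowspanNum;

        GradeRow(String year) {
            this.year = year;
        }
    }

    /** Sets rowspanNum to the group size on the first row of each run of equal years, and to 0 elsewhere. */
    static void assignRowspans(List<GradeRow> rows) {
        int start = 0;
        while (start < rows.size()) {
            int end = start;
            // Advance end past every consecutive row that shares the starting row's year
            while (end < rows.size() && rows.get(end).year.equals(rows.get(start).year)) {
                rows.get(end).rowspanNum = 0;
                end++;
            }
            rows.get(start).rowspanNum = end - start;
            start = end;
        }
    }

    public static void main(String[] args) {
        List<GradeRow> rows = new ArrayList<>();
        rows.add(new GradeRow("2021"));
        rows.add(new GradeRow("2021"));
        rows.add(new GradeRow("2022"));
        assignRowspans(rows);
        for (GradeRow row : rows) {
            System.out.println(row.year + " -> " + row.rowspanNum);
        }
        // prints 2021 -> 2, 2021 -> 0, 2022 -> 1 (one per line)
    }
}
```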


4. Inserting a PDF into a Word document

Example:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.Document;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.OutputStream;

// Despite its name, this method inserts each page of a PDF into a Word document as an image.
public static String convertWordToPdf(OutputStream os) {
    // try-with-resources closes both documents when done
    try (XWPFDocument doc = new XWPFDocument(WordUtils.class.getResourceAsStream("a.docx"));
         PDDocument pdfDocument = PDDocument.load(new FileInputStream("b.pdf"))) {
        PDFRenderer pdfRenderer = new PDFRenderer(pdfDocument);

        // Render each PDF page to an image and insert it into the Word document
        for (int pageNumber = 0; pageNumber < pdfDocument.getNumberOfPages(); ++pageNumber) {
            // The second argument, 300, is the rendering resolution in dots per inch
            BufferedImage bim = pdfRenderer.renderImageWithDPI(pageNumber, 300);

            // Create a paragraph to hold the image
            XWPFParagraph paragraph = doc.createParagraph();
            XWPFRun run = paragraph.createRun();
            byte[] imageBytes = getPackagePart(bim);
            try (ByteArrayInputStream inputStream = new ByteArrayInputStream(imageBytes)) {
                run.addPicture(inputStream, Document.PICTURE_TYPE_PNG, "image.png", Units.toEMU(400), Units.toEMU(500));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Write out the modified Word document
        doc.write(os);
        os.flush();
        return "Export succeeded";
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "Export failed";
}

/**
 * Encodes a BufferedImage as PNG bytes, because inserting a picture into a Word document requires the image data as bytes.
 */
private static byte[] getPackagePart(BufferedImage bim) throws IOException {
    // Encode in memory; no temporary file is needed
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
        // ImageIO.write returns false when no writer for the format is available
        if (!ImageIO.write(bim, "png", bos)) {
            throw new IOException("No PNG writer available");
        }
        return bos.toByteArray();
    }
}
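
The example inserts every page at a fixed 400 x 500 points, which stretches pages whose proportions differ. A possible refinement, sketched here in plain Java, scales the rendered size to fit inside that box while keeping the aspect ratio and converts the result to EMU (12700 EMU per point, since an inch is 914400 EMU and 72 points). `ImageFit` and its methods are hypothetical helpers, not POI or PDFBox APIs, and the pixel calculation is an approximation of what the renderer produces.

```java
final class ImageFit {
    // 914400 EMU per inch divided by 72 points per inch
    static final int EMU_PER_POINT = 12700;

    /** Approximate pixel length of a side given in PDF points (1/72 inch) when rendered at dpi. */
    static int pixels(double points, int dpi) {
        return (int) Math.round(points / 72.0 * dpi);
    }

    /** Scales (w, h) to fit inside (maxW, maxH) without changing the aspect ratio; returns {w, h}. */
    static double[] fitWithin(double w, double h, double maxW, double maxH) {
        double scale = Math.min(maxW / w, maxH / h);
        return new double[] { w * scale, h * scale };
    }

    /** Converts a length in points to EMU, the unit Word uses for picture sizes. */
    static long toEmu(double points) {
        return Math.round(points * EMU_PER_POINT);
    }

    public static void main(String[] args) {
        // A US Letter page is 612 x 792 points; at 300 DPI that is 2550 x 3300 pixels.
        int widthPx = pixels(612, 300);
        int heightPx = pixels(792, 300);
        double[] size = fitWithin(widthPx, heightPx, 400, 500);
        System.out.println(widthPx + " x " + heightPx + " px");
        System.out.println(toEmu(size[0]) + " x " + toEmu(size[1]) + " EMU");
    }
}
```

The two EMU values would then replace `Units.toEMU(400)` and `Units.toEMU(500)` in the `addPicture` call, cast to `int` as that method expects.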

Posted 2023-11-17 11:42 by 一帘幽梦&nn
