zno2

Removing watermarks with Apache PDFBox

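Requirement

While learning COBOL I found an e-book, but it has a watermark on every page.

WPS can erase the watermark, but that requires a paid membership.

Can a Java program remove it instead?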

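Implementation

First, read up on some material to get an overview.

Step 1: get org.apache.pdfbox:pdfbox-app:3.0.2. This is an executable JAR that can open a Swing GUI (through its debug command), where you can load a PDF file and inspect its internal structure.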

java -jar pdfbox-app-3.0.2.jar

Running it without a command prints the help text:

Usage: pdfbox [COMMAND] [OPTIONS]
Commands:
  debug          Analyzes and inspects the internal structure of a PDF document
  decrypt        Decrypts a PDF document
  encrypt        Encrypts a PDF document
  decode         Writes a PDF document with all streams decoded
  export:images  Extracts the images from a PDF document
  export:xmp     Extracts the xmp stream from a PDF document
  export:text    Extracts the text from a PDF document
  export:fdf     Exports AcroForm form data to FDF
  export:xfdf    Exports AcroForm form data to XFDF
  import:fdf     Imports AcroForm form data from FDF
  import:xfdf    Imports AcroForm form data from XFDF
  overlay        Adds an overlay to a PDF document
  print          Prints a PDF document
  render         Converts a PDF document to image(s)
  merge          Merges multiple PDF documents into one
  split          Splits a PDF document into number of new documents
  fromimage      Creates a PDF document from images
  fromtext       Creates a PDF document from text
  version        Gets the version of PDFBox
  help           Display help information about the specified command.
See 'pdfbox help <command>' to read about a specific subcommand

Then start the debugger GUI:

java -jar pdfbox-app-3.0.2.jar debug

 

 File -> Open... -> <choose your pdf file> 

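Inspecting the structure of Page:1 shows that Im[pageNum] holds the book content of each page, as an image, and Xi[pageNum-1] holds the watermark of each page, as text (so page 1 uses Im1 and Xi0). Together these two make up the page content.

The overall structure is a tree: Document - Page - Resources - XObject - Xi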
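
The same tree can also be walked from code instead of the GUI. The following is a minimal sketch, assuming PDFBox 3.0.2 is on the classpath; the class name XObjectLister and its describe method are made up for this illustration. It prints the XObject names found under each page's Resources, which is a quick way to confirm the Im/Xi naming before removing anything.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

// Hypothetical helper, not part of PDFBox.
final class XObjectLister {

    /** Builds one line per page listing the names in that page's XObject dictionary. */
    static List<String> describe(PDDocument doc) {
        List<String> lines = new ArrayList<>();
        int pageNum = 1;
        for (PDPage page : doc.getPages()) {
            List<String> names = new ArrayList<>();
            PDResources resources = page.getResources();
            if (resources != null) {
                // getXObjectNames() iterates over the keys under Resources/XObject
                for (COSName name : resources.getXObjectNames()) {
                    names.add(name.getName());
                }
            }
            lines.add("page " + pageNum + ": " + names);
            pageNum++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to inspect
        try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
            describe(doc).forEach(System.out::println);
        }
    }
}
```

If the names printed for a given file do not follow the Im/Xi pattern described above, the removal code has to be adapted to whatever names that file actually uses.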

 

Write the program:

Add the dependency

        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>3.0.2</version>
        </dependency>

 

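    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.Loader;
    import org.apache.pdfbox.cos.COSDictionary;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;

    @Test // from whichever JUnit version the project uses
    public void removeWatermark() throws IOException {
        // try-with-resources closes the document once it is no longer needed
        try (PDDocument doc = Loader.loadPDF(new File("D:\\202404 cobol学习\\[精通COBOL大型机商业编程技术详解].马千里.修订版.pdf"))) {
            int i = 0;
            for (PDPage page : doc.getPages()) {
                PDResources resources = page.getResources();
                COSDictionary xObjects = resources.getCOSObject().getCOSDictionary(COSName.XOBJECT);
                // The watermark on the page with zero-based index i is the XObject named Xi<i>
                if (xObjects != null) {
                    xObjects.removeItem(COSName.getPDFName("Xi" + i));
                }
                i++;
            }
            doc.save(new File("D:\\202404 cobol学习\\无水印.pdf"));
        }
    }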

 

 

After running it, all watermarks are gone.
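
The program above relies on the watermark of the n-th page being named exactly Xi[n-1]. A variation that does not depend on the page counter is to remove every XObject whose name starts with a given prefix. The sketch below assumes that, in the file at hand, all XObjects named Xi... are watermarks and nothing else; the class PrefixWatermarkRemover and its command-line arguments are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

// Hypothetical helper, not part of PDFBox.
final class PrefixWatermarkRemover {

    /** Removes every XObject entry whose name starts with prefix and returns how many were removed. */
    static int removeByPrefix(PDDocument doc, String prefix) {
        int removed = 0;
        for (PDPage page : doc.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) {
                continue;
            }
            COSDictionary xObjects = resources.getCOSObject().getCOSDictionary(COSName.XOBJECT);
            if (xObjects == null) {
                continue;
            }
            // Copy the keys first so the dictionary is not modified while it is being iterated
            List<COSName> keys = new ArrayList<>(xObjects.keySet());
            for (COSName key : keys) {
                if (key.getName().startsWith(prefix)) {
                    xObjects.removeItem(key);
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input PDF, args[1] = output PDF (assumed argument order)
        try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
            int removed = removeByPrefix(doc, "Xi");
            System.out.println("Removed " + removed + " XObject entries");
            doc.save(new File(args[1]));
        }
    }
}
```

Like the original program, this only deletes the resource entries and leaves the page content streams untouched, so it is worth opening the result in a viewer to check that the pages still render as expected.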

 

posted on 2024-04-22 09:21  zno2