Reading the Text Content of a Word Document in Java

1. Add the dependencies

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.8</version>
        </dependency>

2. Code to read the Word content

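 // Imports needed (package names as of POI 3.8):
 //   java.io.FileInputStream
 //   org.apache.poi.POIXMLDocument, org.apache.poi.POIXMLTextExtractor
 //   org.apache.poi.hwpf.extractor.WordExtractor
 //   org.apache.poi.openxml4j.opc.OPCPackage
 //   org.apache.poi.xwpf.extractor.XWPFWordExtractor
 // `path` and AjaxResult come from the surrounding method, which is not shown here.
 String buffer = "";
        // Compare extensions case-insensitively so ".DOC" / ".DOCX" are accepted too
        String lowerPath = path.toLowerCase();
        try {
            if (lowerPath.endsWith(".doc")) {
                // Binary .doc format (HWPF, from poi-scratchpad)
                FileInputStream is = new FileInputStream(path);
                try {
                    WordExtractor ex = new WordExtractor(is);
                    buffer = ex.getText();
                } finally {
                    is.close(); // close the stream even if extraction fails
                }
            } else if (lowerPath.endsWith(".docx")) {
                // OOXML .docx format (XWPF, from poi-ooxml)
                OPCPackage opcPackage = POIXMLDocument.openPackage(path);
                try {
                    POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
                    buffer = extractor.getText();
                } finally {
                    opcPackage.close(); // close the package even if extraction fails
                }
            } else {
                return AjaxResult.error("The file is not a Word file");
            }
        } catch (Exception e) {
            //e.printStackTrace();
            return AjaxResult.error("Failed to read the Word file: " + e.getMessage());
        }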
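
The code above picks the extractor from the file extension alone, so a renamed file ends up in the wrong branch. As a rough sketch (not part of POI and not from the original post), the leading bytes of the file can be checked as well: binary .doc files are OLE2 compound files and .docx files are ZIP packages. The class name WordFormatSniffer and its Format enum are hypothetical names made up for this illustration. Note that the same signatures are also used by other formats (.xls and .ppt are OLE2, .xlsx and any ordinary .zip are ZIP), so this only narrows down the container type and does not prove the file is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container format from the file's first bytes.
class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature, the container used by binary .doc files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature, the container used by .docx packages.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a buffer holding the first bytes of a file.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.DOC;
        if (startsWith(header, ZIP_MAGIC)) return Format.DOCX;
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file at `path` and classifies them.
    static Format detect(String path) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int n;
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) > 0) {
                total += n;
            }
        }
        return detect(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data == null || data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WordFormatSniffer <path-to-file>
        if (args.length == 1) {
            System.out.println(detect(args[0]));
        }
    }
}
```

One possible way to combine this with the code above is to call WordFormatSniffer.detect(path) first and then choose the WordExtractor or XWPFWordExtractor branch from the result instead of the extension.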

 

posted @ 2021-10-09 09:42  代码沉思者