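Java POI: reading Word content with XWPFDocument and creating a new Word document (including getting all pictures inside tables)
Introduction to the Word document structure in POI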
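1. Body paragraphs
A document contains multiple paragraphs, and each paragraph contains a list of runs. A run (XWPFRun) is a stretch of text with the same formatting and is the smallest unit of the document.
Get all paragraphs: List<XWPFParagraph> paragraphs = word.getParagraphs();
Get all runs in a paragraph: List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
Get a single run from that list: XWPFRun run = xwpfRuns.get(index);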
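The paragraph and run hierarchy can be walked with two nested loops. The following is a minimal sketch that builds a small document in memory instead of opening a file; the class name RunTextDemo and the method collectRunTexts are made up for illustration.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.util.ArrayList;
import java.util.List;

public class RunTextDemo {

    /** Collects the text of every run in the document body, paragraph by paragraph. */
    static List<String> collectRunTexts(XWPFDocument doc) {
        List<String> texts = new ArrayList<>();
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                // getText(0) reads the first text element of the run;
                // it is null for runs without text (for example picture-only runs)
                String text = run.getText(0);
                if (text != null) {
                    texts.add(text);
                }
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFParagraph paragraph = doc.createParagraph();
            paragraph.createRun().setText("Hello, ");
            XWPFRun bold = paragraph.createRun(); // a second run with different formatting
            bold.setBold(true);
            bold.setText("POI");
            System.out.println(collectRunTexts(doc));
        }
    }
}
```

Because each change of formatting starts a new run, one visible sentence is often split across several runs.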
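2. Body tables
A document contains multiple tables, a table contains multiple rows, and a row contains multiple cells. The content of each cell is equivalent to a complete document.
Get all tables: List<XWPFTable> xwpfTables = doc.getTables();
Get all rows of a table: List<XWPFTableRow> xwpfTableRows = xwpfTable.getRows();
Get all cells of a row: List<XWPFTableCell> xwpfTableCells = xwpfTableRow.getTableCells();
Get the content of a cell: List<XWPFParagraph> paragraphs = xwpfTableCell.getParagraphs();
From there on, it works the same as for body paragraphs.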
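Since a cell can itself contain tables (xwpfTableCell.getTables()), a recursive walk covers nested tables as well. This is a sketch only; TableWalkDemo, collectCellTexts and fill are hypothetical names, and the sample table is built in memory rather than read from a file.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.util.ArrayList;
import java.util.List;

public class TableWalkDemo {

    /** Appends the text of every cell paragraph to out, descending into nested tables. */
    static void collectCellTexts(List<XWPFTable> tables, List<String> out) {
        for (XWPFTable table : tables) {
            for (XWPFTableRow row : table.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    for (XWPFParagraph paragraph : cell.getParagraphs()) {
                        out.add(paragraph.getText());
                    }
                    // A cell is a body of its own, so it may hold further tables
                    collectCellTexts(cell.getTables(), out);
                }
            }
        }
    }

    /** Writes text into the first paragraph of a cell, creating the paragraph if needed. */
    static void fill(XWPFTableCell cell, String text) {
        XWPFParagraph paragraph = cell.getParagraphs().isEmpty()
                ? cell.addParagraph()
                : cell.getParagraphs().get(0);
        paragraph.createRun().setText(text);
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFTable table = doc.createTable(2, 2); // 2 rows, 2 columns
            fill(table.getRow(0).getCell(0), "r0c0");
            fill(table.getRow(0).getCell(1), "r0c1");
            fill(table.getRow(1).getCell(0), "r1c0");
            fill(table.getRow(1).getCell(1), "r1c1");

            List<String> texts = new ArrayList<>();
            collectCellTexts(doc.getTables(), texts);
            System.out.println(texts);
        }
    }
}
```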
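Notes:
A table cell is equivalent to a complete docx document, only without headers and footers. It can contain tables of its own, which are obtained with xwpfTableCell.getTables(), and so on recursively.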
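getParagraphs() and getTables() return paragraphs and tables as two completely separate lists. If a table sits between two paragraphs, these two calls alone cannot tell you so; with them, the structure can only be recovered correctly when the document layout is fixed and known in advance. To get the real order, use doc.getBodyElements(), which returns paragraphs and tables together as IBodyElement objects in document order.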
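As a small illustration of reading the body in order, the sketch below iterates over getBodyElements() and reports each element's kind. The class name BodyOrderDemo and the method describeBody are made up for this example.

```java
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.util.ArrayList;
import java.util.List;

public class BodyOrderDemo {

    /** Describes every body element in the order it appears in the document. */
    static List<String> describeBody(XWPFDocument doc) {
        List<String> out = new ArrayList<>();
        for (IBodyElement element : doc.getBodyElements()) {
            if (element instanceof XWPFParagraph) {
                out.add("paragraph: " + ((XWPFParagraph) element).getText());
            } else if (element instanceof XWPFTable) {
                out.add("table: " + ((XWPFTable) element).getNumberOfRows() + " row(s)");
            } else {
                // e.g. content controls
                out.add("other: " + element.getElementType());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("before");
            doc.createTable(2, 1); // a table between the two paragraphs
            doc.createParagraph().createRun().setText("after");
            describeBody(doc).forEach(System.out::println);
        }
    }
}
```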
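3. Headers
A document can have several headers (for example a different first-page header, separate odd and even page headers, or different headers per section). A header can contain paragraphs and tables.
Get the headers of the document: List<XWPFHeader> headerList = doc.getHeaderList();
Get all paragraphs of a header: List<XWPFParagraph> paras = header.getParagraphs();
Get all tables of a header: List<XWPFTable> tables = header.getTables();
From there on, everything works the same way.
4. Footers
Footers are largely the same as headers (doc.getFooterList()). The marker that represents the page number can also be obtained from the footer.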
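Reading headers and footers could look like the sketch below. It creates a default header and footer in memory, saves the document to a byte array and re-opens it, so the lists are filled the same way as when a file is loaded from disk. HeaderFooterDemo, headerFooterTexts and roundTrip are hypothetical names.

```java
import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class HeaderFooterDemo {

    /** Returns the non-empty paragraph texts of all headers and footers. */
    static List<String> headerFooterTexts(XWPFDocument doc) {
        List<String> out = new ArrayList<>();
        for (XWPFHeader header : doc.getHeaderList()) {
            for (XWPFParagraph paragraph : header.getParagraphs()) {
                if (!paragraph.getText().isEmpty()) {
                    out.add("header: " + paragraph.getText());
                }
            }
        }
        for (XWPFFooter footer : doc.getFooterList()) {
            for (XWPFParagraph paragraph : footer.getParagraphs()) {
                if (!paragraph.getText().isEmpty()) {
                    out.add("footer: " + paragraph.getText());
                }
            }
        }
        return out;
    }

    /** Saves the document to memory and opens the saved bytes as a new document. */
    static XWPFDocument roundTrip(XWPFDocument doc) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            doc.write(buffer);
            return new XWPFDocument(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("body text");
        doc.createHeader(HeaderFooterType.DEFAULT).createParagraph().createRun().setText("Report title");
        doc.createFooter(HeaderFooterType.DEFAULT).createParagraph().createRun().setText("Internal use only");

        try (XWPFDocument reread = roundTrip(doc)) {
            headerFooterTexts(reread).forEach(System.out::println);
        }
        doc.close();
    }
}
```

For a real file, new XWPFDocument(new FileInputStream(path)) can be passed to headerFooterTexts directly; the round trip is only needed here because the sample is built in memory.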
IBodyElement: common interface of body elements (paragraphs and tables), used when iterating over them
XWPFComment: comment (as far as I understand, this is what Word calls an annotation)
XWPFSDT: structured document tag (content control)
XWPFFooter: footer
XWPFFootnotes: footnotes
XWPFHeader: header
XWPFHyperlink: hyperlink
XWPFNumbering: numbering
XWPFParagraph: paragraph
XWPFPictureData: picture
XWPFStyles: styles (used when setting up multi-level headings)
XWPFTable: table
pom dependencies

    <dependencies>
        <!-- HWPFDocument, for parsing legacy .doc files -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>4.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>4.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>4.1.2</version>
        </dependency>
        <dependency>
            <groupId>springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>1.2.6</version>
        </dependency>
    </dependencies>
Code for reading a .docx and writing its content to a new .docx

    import org.apache.poi.xwpf.usermodel.*;

    import java.io.*;
    import java.util.List;

    public class poi3 {

        public static void main(String[] args) throws IOException {
            // Open the source file and copy its content into a new document
            try (FileInputStream fileInputStream = getFileInputStream("666.docx")) {
                dealDocx(fileInputStream, "副本.docx"); // "副本" means "copy"
            }
        }

        // Opens a file that lies next to the compiled poi3 class
        private static FileInputStream getFileInputStream(String name) throws FileNotFoundException {
            String dir = poi3.class.getResource("").getPath() + name;
            return new FileInputStream(dir);
        }

        private static void dealDocx(InputStream inputStream, String newFileName) throws IOException {
            // Output file, created next to the compiled class
            File file = new File(poi3.class.getResource("").getPath() + newFileName);

            // try-with-resources closes both documents and the output stream, even on failure
            try (XWPFDocument wordInput = new XWPFDocument(inputStream);
                 XWPFDocument wordOutput = new XWPFDocument();
                 FileOutputStream fileOutputStream = new FileOutputStream(file)) {

                // Create one output paragraph for every paragraph of the source document
                for (XWPFParagraph xwpfParagraph : wordInput.getParagraphs()) {
                    XWPFParagraph wordOutputParagraph = wordOutput.createParagraph();
                    // Copy every run of the current paragraph
                    for (XWPFRun run : xwpfParagraph.getRuns()) {
                        XWPFRun wordOutputParagraphRun = wordOutputParagraph.createRun();
                        // To write different text instead: wordOutputParagraphRun.setText("modified");
                        // wordOutputParagraphRun.addCarriageReturn(); // hard return
                        // wordOutputParagraphRun.addBreak();          // soft return
                        // getText(int) expects the index of the text element inside the run,
                        // not the character spacing; it returns null for runs without text
                        String text = run.getText(0);
                        if (text != null) {
                            wordOutputParagraphRun.setText(text);
                        }
                    }
                }

                // Copy all tables
                for (XWPFTable xwpfTable : wordInput.getTables()) {
                    // createTable() returns a table that already has one row with one cell
                    XWPFTable wordOutputTable = wordOutput.createTable();
                    List<XWPFTableRow> xwpfTableRows = xwpfTable.getRows();
                    System.out.println("row count: " + xwpfTableRows.size());
                    for (int r = 0; r < xwpfTableRows.size(); r++) {
                        // Reuse the default first row, create all further rows
                        XWPFTableRow wordOutputTableRow =
                                (r == 0) ? wordOutputTable.getRow(0) : wordOutputTable.createRow();
                        List<XWPFTableCell> xwpfTableCells = xwpfTableRows.get(r).getTableCells();
                        System.out.println("cell count: " + xwpfTableCells.size());
                        for (int c = 0; c < xwpfTableCells.size(); c++) {
                            // createRow() pre-creates as many cells as the first row has;
                            // add a cell only when it does not exist yet
                            XWPFTableCell wordOutputTableRowCell = wordOutputTableRow.getCell(c);
                            if (wordOutputTableRowCell == null) {
                                wordOutputTableRowCell = wordOutputTableRow.createCell();
                            }
                            wordOutputTableRowCell.setText(xwpfTableCells.get(c).getText());
                        }
                    }
                    // wordOutputTable.removeBorders(); // dashed borders
                }

                // wordInput.getDocument() returns the underlying CTDocument1
                // if low-level XML access is needed
                wordOutput.write(fileOutputStream);
            }
        }
    }
Code for getting all pictures in the document

    import org.apache.poi.xwpf.usermodel.*;

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // ImageInfo, PICTURE_IMG_PATTERN, getImageField and getField belong to the
    // surrounding project and are not shown in this post.
    // The original method built the result list but returned void; it is returned here.
    private List<ImageInfo> getPictureImage(String filepath) throws IOException {
        List<ImageInfo> result = new ArrayList<>();
        Pattern pattern = Pattern.compile(PICTURE_IMG_PATTERN);
        try (FileInputStream in = new FileInputStream(filepath);
             XWPFDocument doc = new XWPFDocument(in)) {
            // Pictures in body paragraphs
            collectImages(doc.getParagraphs(), pattern, result);
            // Pictures inside table cells
            for (XWPFTable xwpfTable : doc.getTables()) {
                for (XWPFTableRow xwpfTableRow : xwpfTable.getRows()) {
                    for (XWPFTableCell xwpfTableCell : xwpfTableRow.getTableCells()) {
                        collectImages(xwpfTableCell.getParagraphs(), pattern, result);
                    }
                }
            }
        }
        return result;
    }

    // Matches the description of every embedded picture against the pattern
    // and records one ImageInfo per match
    private void collectImages(List<XWPFParagraph> paragraphs, Pattern pattern, List<ImageInfo> result) {
        for (XWPFParagraph paragraph : paragraphs) {
            for (XWPFRun run : paragraph.getRuns()) {
                for (XWPFPicture embeddedPicture : run.getEmbeddedPictures()) {
                    String description = embeddedPicture.getDescription();
                    if (description == null) {
                        continue; // picture without a description
                    }
                    Matcher matcher = pattern.matcher(description);
                    while (matcher.find()) {
                        String source = matcher.group();
                        ImageInfo imageInfo = new ImageInfo();
                        imageInfo.setSource(source);
                        imageInfo.setImageField(getImageField(imageInfo.getSource()));
                        imageInfo.setFiled(getField(imageInfo.getSource()));
                        result.add(imageInfo);
                    }
                }
            }
        }
    }
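The code above only evaluates the picture descriptions. If the image bytes themselves are needed, XWPFPicture.getPictureData() gives access to them. The following sketch collects the picture data of all pictures inside top-level table cells; the class name TablePictureDemo and the method picturesInTables are made up for illustration, and the sample picture is a 1x1 PNG generated in memory.

```java
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class TablePictureDemo {

    /** Returns the binary data of every picture embedded in a top-level table cell. */
    static List<XWPFPictureData> picturesInTables(XWPFDocument doc) {
        List<XWPFPictureData> found = new ArrayList<>();
        for (XWPFTable table : doc.getTables()) {
            for (XWPFTableRow row : table.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    for (XWPFParagraph paragraph : cell.getParagraphs()) {
                        for (XWPFRun run : paragraph.getRuns()) {
                            for (XWPFPicture picture : run.getEmbeddedPictures()) {
                                XWPFPictureData data = picture.getPictureData();
                                if (data != null) { // null when the picture has no embedded data
                                    found.add(data);
                                }
                            }
                        }
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Generate a 1x1 PNG in memory to use as sample picture
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB), "png", png);

        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFTableCell cell = doc.createTable(1, 1).getRow(0).getCell(0);
            XWPFParagraph paragraph = cell.getParagraphs().isEmpty()
                    ? cell.addParagraph()
                    : cell.getParagraphs().get(0);
            paragraph.createRun().addPicture(new ByteArrayInputStream(png.toByteArray()),
                    XWPFDocument.PICTURE_TYPE_PNG, "pixel.png", Units.toEMU(10), Units.toEMU(10));

            for (XWPFPictureData data : picturesInTables(doc)) {
                System.out.println(data.getFileName() + ": " + data.getData().length + " bytes");
            }
        }
    }
}
```

The byte array from getData() can be written to disk with java.nio.file.Files.write. If the location of the pictures does not matter, doc.getAllPictures() returns the picture data of the whole document in one call.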
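Copyright notice: this is an original article by the blogger, published under the CC 4.0 BY-SA license. When reposting, please include the link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_43702146/article/details/116159448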