Converting Word Documents with OpenOffice and the Problems Encountered

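I: Requirements

　　The company needs to store contract files. Users upload contracts as Word documents; OpenOffice converts each Word file to PDF, the PDF is then converted to images, and each format is stored separately. Because OpenOffice conversion consumes a lot of memory, the work is designed as a scheduled task that runs automatically in the early morning.

　　This post records the problems encountered while completing this requirement.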
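　　The post does not show how the early-morning task is triggered. As a rough sketch only, assuming a plain JDK scheduler and a hypothetical run time of 02:00 (the class name, the run time, and the placeholder task are all assumptions, not the author's setup), the delay until the next run could be computed like this:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a nightly trigger; not the scheduler used in the original project.
final class NightlyConversionScheduler {

    /** Delay from 'now' until the next occurrence of 'runAt' (later today, otherwise tomorrow). */
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has already passed
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        LocalTime runAt = LocalTime.of(2, 0); // assumed run time
        // Placeholder for the real conversion job
        Runnable convertPendingContracts = () -> System.out.println("converting pending contracts");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNextRun(LocalDateTime.now(), runAt).toMillis();
        // Runs once at the next 02:00 and then every 24 hours; the scheduler keeps the JVM alive
        scheduler.scheduleAtFixedRate(convertPendingContracts, initialDelayMillis,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }
}
```

　　A fixed 24-hour period ignores daylight-saving shifts; a framework scheduler with cron expressions would be an equally reasonable choice.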

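II: Process

　　1: Coding in the local environment (Windows)

　　Step 1: Because the coding was done locally on Windows, installing OpenOffice and starting the service posed no difficulty.

　　Step 2: The libraries needed for the conversion: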

<dependency>
    <groupId>commons-cli</groupId>
    <artifactId>commons-cli</artifactId>
    <version>1.2</version>
</dependency>

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>1.4</version>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>juh</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>jurt</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>ridl</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>unoil</artifactId>
    <version>3.0.1</version>
</dependency>

<dependency>
    <groupId>com.thoughtworks.xstream</groupId>
    <artifactId>xstream</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>2.0.8</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.8</version>
</dependency>

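　　Problem 1: This is where the first problem appeared: a key dependency jar could not be found in the Maven central repository.

　　The jodconverter-cli jar is not in the central repository, and jodconverter there only goes up to version 2.2.1 (which cannot convert the docx format; docx support starts with 2.2.2).

　　After discussing it with a senior colleague, we added the jar to the company's internal Maven repository.

　　Step 3: Utility classes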

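import java.io.File;
import java.net.ConnectException;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

/**
 * Converts a Word document to PDF.
 *
 * @author GH
 */
public class WordToPdf {

    /**
     * @param inputFile  the Word file to convert
     * @param outputFile the PDF file to write
     */
    public static void docToPdf(File inputFile, File outputFile) {
        // Connect to the OpenOffice service listening on port 8100
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
        try {
            connection.connect();
            DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
            converter.convert(inputFile, outputFile);
        } catch (ConnectException cex) {
            cex.printStackTrace();
        } finally {
            // Disconnect only if connect() actually succeeded
            if (connection.isConnected()) {
                connection.disconnect();
            }
        }
    }
}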

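import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

/**
 * Converts each page of a PDF to an image.
 *
 * @author GH
 */
public class PdfToImage {

    /**
     * @param srcFile         location of the PDF to convert
     * @param contractFromSrc directory where the images are stored (must end with a separator)
     * @param name            name used as the prefix of each image file
     * @return names of the generated image files
     */
    public static List<String> pdfToImagePath(String srcFile, String contractFromSrc, String name) {
        List<String> list = new ArrayList<>();
        File dir = new File(contractFromSrc);
        if (!dir.exists()) {
            dir.mkdirs(); // also creates missing parent directories
        }
        // try-with-resources closes the document even if rendering fails
        try (PDDocument doc = PDDocument.load(new File(srcFile))) {
            PDFRenderer renderer = new PDFRenderer(doc);
            int pageCount = doc.getNumberOfPages();
            for (int i = 0; i < pageCount; i++) {
                // Option 1: the second argument is the resolution in DPI
                // BufferedImage image = renderer.renderImageWithDPI(i, 296);
                // Option 2: the second argument is the scale factor (1 = 72 DPI);
                // a larger value gives a higher-resolution image but takes longer
                BufferedImage image = renderer.renderImage(i, 2f);
                String fileName = name + "-" + i + ".jpg";
                // Write JPEG so the file content matches the .jpg extension
                ImageIO.write(image, "jpg", new File(contractFromSrc + fileName));
                list.add(fileName);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }
}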

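　　Step 4: Coding

　　First, read the list of contracts that have not yet been converted from the database, then download each file from OSS object storage into a designated temporary folder.

　　Use the utility class to convert the downloaded Word file to PDF, insert the PDF record into the database, and upload the PDF to OSS.

　　Use the utility class to convert the resulting PDF to images, insert the image records into the database, and upload the images.

　　Wrap the work in try/catch: on any exception, roll back the database changes and delete the files already uploaded to OSS.

　　Finally, update the Word file's conversion status to converted.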
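　　The upload-then-compensate part of these steps can be sketched roughly as follows. This is only an illustration: ObjectStore and Converter are hypothetical interfaces invented for the example (they are not the OSS SDK or the utility classes above), and the database transaction is left to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "upload, and undo the uploads on failure"; all names are assumptions.
final class ContractConversionJob {

    /** Stand-in for the object store (OSS); not a real SDK interface. */
    interface ObjectStore {
        void upload(String key);
        void delete(String key);
    }

    /** Stand-in for the Word-to-PDF-to-image conversion: returns the keys of the files it produced. */
    interface Converter {
        List<String> convert(String wordKey) throws Exception;
    }

    /**
     * Converts and uploads one contract. Returns true on success. On failure it
     * deletes every object uploaded so far and returns false, so the caller can
     * roll back its database transaction.
     */
    static boolean convertOne(String wordKey, Converter converter, ObjectStore store) {
        List<String> uploaded = new ArrayList<>();
        try {
            for (String key : converter.convert(wordKey)) {
                store.upload(key);
                uploaded.add(key);
            }
            return true; // caller marks the Word record as converted
        } catch (Exception e) {
            for (String key : uploaded) {
                store.delete(key); // compensate for the partial upload
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>(); // in-memory stand-in for OSS
        ObjectStore store = new ObjectStore() {
            public void upload(String key) { stored.add(key); }
            public void delete(String key) { stored.remove(key); }
        };
        Converter converter = wordKey -> Arrays.asList("contract.pdf", "contract-0.jpg");
        System.out.println(convertOne("contract.docx", converter, store) + " " + stored);
    }
}
```

　　Tracking the uploaded keys in a list is what makes the cleanup possible: a database rollback alone would leave orphaned files in the object store.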

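　　Problem 2: The test and production environments both run Linux, and the code works with files, but Linux and Windows file paths differ. For example, a Windows path looks like C:\tmp\test.txt, while the Linux equivalent is /tmp/test.txt.

　　So I used the following approach:

public final static String Convert_Tmp_Url = "C:" + File.separator + "temp" + File.separator + "contractToImg" + File.separator; // temporary directory used during the Word-to-image conversion (Windows)
public final static String Convert_Tmp_Url2 = File.separator + "tmp" + File.separator + "contractToImg" + File.separator; // temporary directory used during the Word-to-image conversion (Linux)

　　File.separator is the system-dependent default name separator, represented as a string for convenience. Its value is '/' on Linux and '\' on Windows.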
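　　As an alternative to keeping one constant per operating system, a single location can be derived from the JVM's java.io.tmpdir property and joined with java.nio.file.Path. This is a different technique from the one used in this post, shown only as a sketch; the class and method names are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: one temp-directory definition that works on Windows and Linux.
final class ConvertTmpDir {

    /** Returns baseDir/contractToImg; Path inserts the platform's separator. */
    static Path convertDir(String baseDir) {
        return Paths.get(baseDir, "contractToImg");
    }

    /** Uses the JVM's temp directory, e.g. /tmp on a typical Linux system. */
    static Path defaultConvertDir() {
        return convertDir(System.getProperty("java.io.tmpdir"));
    }

    /** Builds the path of the image for one page, following the name-index.jpg pattern. */
    static Path imagePath(Path dir, String name, int pageIndex) {
        return dir.resolve(name + "-" + pageIndex + ".jpg");
    }

    public static void main(String[] args) {
        Path dir = defaultConvertDir();
        System.out.println(imagePath(dir, "contract", 0));
    }
}
```

　　Because Path.resolve adds the separator itself, the directory no longer needs to end with one, which removes a common source of malformed paths when strings are concatenated.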

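　　Step 5: Local testing passed without problems.

　　2: Testing in the test environment (Linux)

　　Problem 3: On Linux, Chinese text in the converted Word documents came out garbled or blank, because the Linux system had no Chinese fonts installed.

　　Solution:

　　Step 1: Create the directory.

　　On CentOS, create a new directory named fallback under /usr/java/jdk1.8.0_91/jre/lib/fonts.

　　Step 2: Upload the fonts.

　　Upload simhei.ttf (SimHei) and simsun.ttc (SimSun) to /usr/java/jdk1.8.0_91/jre/lib/fonts/fallback (on Windows, the files can be located with Everything).

　　Step 3: Check the system font directories.

　　To check: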

[root@80ec6 fallback]# cat /etc/fonts/fonts.conf
<dir>/usr/share/fonts</dir>
<dir>/usr/share/X11/fonts/Type1</dir>
<dir>/usr/share/X11/fonts/TTF</dir>
<dir>/usr/local/share/fonts</dir>
<dir>~/.fonts</dir>

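　　Step 4: Copy the fonts.

　　Copy the entire contents of /usr/java/jdk1.8.0_91/jre/lib/fonts into the directory found in step 3; in my case the font directory is /usr/share/fonts.

　　Step 5: Refresh the font cache.

　　Run: fc-cache

　　Step 6: Kill the OpenOffice process.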

　　[root@80ec6 fonts]# ps -ef | grep openoffice

　　root 3045 3031 0 06:19 pts/1 00:00:03 /opt/openoffice4/program/soffice.bin -headless -accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard

　　Kill it: kill -9 3045

　　Step 7: Restart OpenOffice in the background.

　[root@a3cf78780ec6 openoffice4]# soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
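　　After the restart it can be useful to confirm that something is accepting connections on port 8100 again before running a conversion. The sketch below is an assumption-laden helper (the class and method names are invented here); it only checks that a TCP connection can be opened, not that the listener is really soffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
final class OpenOfficePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // Host and port taken from the soffice command line above
        System.out.println("port 8100 reachable: " + isListening("127.0.0.1", 8100, 1000));
    }
}
```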

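　　3: The test and production environments run different Linux systems, so the installation packages differ.

　　The test environment uses deb packages: install all of the deb files with dpkg, start the service, and it works.

　　In production the dpkg command was not found, so I switched to the rpm packages. After installation the service still would not start; it turned out the installation was incomplete. The RPMS directory contains a desktop-integration folder holding four rpm files; install the one that matches your system. I chose the redhat one.
　　Run: rpm -ivh openoffice4.1.5-redhat-menus-4.1.5-9789.noarch.rpm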

posted @ 2018-09-10 14:32 曾将
