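Reading Word documents with PHP on Linux
A few days ago I helped a friend with a technical problem: on Linux, read the content of a Word document, match it with regular expressions, and assemble the results into SQL for loading into a database.
After going through foreign-language material and Google, I ended up with the following steps: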
#wget http://www.winfield.demon.nl/linux/antiword-0.37.tar.gz
#tar zxvf antiword-0.37.tar.gz
#cd antiword-0.37
#make
#make install
antiword
cp /root/bin/*antiword /usr/local/bin/
mkdir /usr/share/antiword
cp -R /root/.antiword/* /usr/share/antiword/
chmod 777 /usr/local/bin/*antiword
chmod 755 /usr/share/antiword/*
After installation, if you want to view the output from a web page, you need to run make global_install as root.
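Before wiring antiword into a web page, it can help to confirm that the binary and its resource files are visible from the account that will run it. The following is a minimal sketch of such a check. The function name check_antiword_install is hypothetical, the directory /usr/share/antiword is the one created in the steps above, and the mapping file name UTF-8.txt is an assumption about what the antiword resources contain.

```python
import os
import shutil


def check_antiword_install(resource_dir="/usr/share/antiword"):
    """Report whether antiword is on PATH and whether its resource files exist.

    resource_dir defaults to the directory created in the install steps.
    The mapping file name "UTF-8.txt" is an assumption, not taken from the post.
    """
    mapping_file = os.path.join(resource_dir, "UTF-8.txt")
    return {
        # Full path of the antiword executable, or None if it is not on PATH
        "binary": shutil.which("antiword"),
        "resource_dir_exists": os.path.isdir(resource_dir),
        "utf8_mapping_exists": os.path.isfile(mapping_file),
    }


if __name__ == "__main__":
    for key, value in check_antiword_install().items():
        print(key, "=", value)
```

Running this as the web server user rather than as root gives a more realistic picture of what the PHP page will see.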
<?php
// Run antiword on a .doc file and print the extracted text
header("Content-type: text/html; charset=utf-8");
$filename = 'test.doc';
# $content = shell_exec('antiword ' . $filename);
$content = shell_exec('antiword -mUTF-8 ' . $filename);
echo '<pre>';
print_r($content);
echo '</pre>';
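For comparison, the same antiword call can be made from Python. This is a sketch rather than part of the original setup: the function names build_antiword_command and read_doc_text are hypothetical, and the -mUTF-8 option is copied from the PHP snippet above. Passing the command as a list avoids going through a shell, so a file name containing spaces or quotes is handed to antiword unchanged.

```python
import subprocess


def build_antiword_command(doc_path, mapping="UTF-8"):
    """Build the argument list for antiword, mirroring the PHP call above."""
    # "-m" + mapping reproduces the "-mUTF-8" option used in the PHP snippet
    return ["antiword", "-m" + mapping, doc_path]


def read_doc_text(doc_path):
    """Run antiword on doc_path and return the extracted text as a string."""
    result = subprocess.run(
        build_antiword_command(doc_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        check=True,           # raise CalledProcessError if antiword exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        print(read_doc_text(sys.argv[1]))
```

If antiword is not installed, read_doc_text raises FileNotFoundError, which is easier to diagnose than the empty result that shell_exec returns in the same situation.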
# coding=utf-8
# Usage: python <script_name> <docFilePath>
# Requires the python-docx library: pip install python-docx
import sys
import os
from docx import Document

# Name of the current script, used in the usage message
script_name = os.path.basename(sys.argv[0])
usage = "\n Usage: python " + script_name + " <docFilePath>"

if len(sys.argv) != 2:
    print("Warning:\n no docx file given" + usage)
    sys.exit(1)

docx_path = sys.argv[1]
if not os.path.exists(docx_path):
    print("Warning:\n docx file does not exist" + usage)
    sys.exit(1)

# Open the document
document = Document(docx_path)

# Read the text of every paragraph
paragraphs = [paragraph.text for paragraph in document.paragraphs]

# Print the result for inspection; the text could also be processed further here
for text in paragraphs:
    print(text)

# Read the table contents and print each row, cells separated by tabs
for table in document.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text, end="\t")
        print()
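The remaining part of the task, matching the extracted text with a regular expression and loading it into a database, is not shown in the post. The sketch below illustrates one way it could look under stated assumptions: the line format "field name: value", the table name doc_fields, and the use of SQLite are all hypothetical stand-ins for the real document layout and database. Instead of concatenating values into an SQL string, it uses parameterized placeholders, which keeps quotes in the text from breaking the statement.

```python
import re
import sqlite3

# Hypothetical line format: "<field name>: <value>"
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_fields(text):
    """Return (name, value) pairs for every line matching LINE_PATTERN."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            rows.append((match.group(1), match.group(2)))
    return rows


def store_fields(conn, rows):
    """Insert the pairs into a (hypothetical) doc_fields table."""
    conn.execute("CREATE TABLE IF NOT EXISTS doc_fields (name TEXT, value TEXT)")
    # Placeholders let the driver handle quoting, e.g. the apostrophe in O'Brien
    conn.executemany("INSERT INTO doc_fields (name, value) VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    sample = "Name: O'Brien\nnot a field line\nCity : Dublin\n"
    conn = sqlite3.connect(":memory:")
    store_fields(conn, extract_fields(sample))
    print(conn.execute("SELECT name, value FROM doc_fields").fetchall())
```

In practice the text passed to extract_fields would come from antiword or python-docx, and the pattern would be adapted to the actual structure of the documents.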