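Downloading the recognition engine
MODI installation packages can be found online, but installing with the script bundled in them may leave the configuration incomplete, so that only French can be recognized.
An OCR engine is also available for download from Mr. Ma Jian's original-works site. For detailed installation steps, see the documentation provided on that download site.
For background on Microsoft Office Document Imaging (MODI), see Mr. Ma Jian's MODI notes: https://www.cnblogs.com/stronghorse/p/4913447.html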
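Configuring the registry
If only French can be recognized, save the text below as a .reg file and import it into the registry. Simplified Chinese, Traditional Chinese, and English will then be available in PDF Patcher (PDF 补丁丁).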
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\Installer\Components\61BA386016BD0C340BBEAC273D84FD5F]
"1028"=hex(7):76,00,55,00,70,00,41,00,56,00,4f,00,65,00,64,00,40,00,24,00,21,\
  00,21,00,21,00,21,00,21,00,4d,00,4b,00,4b,00,53,00,6b,00,4f,00,43,00,52,00,\
  5f,00,31,00,30,00,32,00,38,00,3c,00,00,00,00,00
"2052"=hex(7):76,00,55,00,70,00,41,00,56,00,53,00,2e,00,7d,00,58,00,25,00,21,\
  00,21,00,21,00,21,00,21,00,4d,00,4b,00,4b,00,53,00,6b,00,4f,00,43,00,52,00,\
  5f,00,32,00,30,00,35,00,32,00,3c,00,00,00,00,00
"1033"=hex(7):76,00,55,00,70,00,41,00,56,00,54,00,28,00,38,00,41,00,24,00,21,\
  00,21,00,21,00,21,00,21,00,4d,00,4b,00,4b,00,53,00,6b,00,4f,00,43,00,52,00,\
  5f,00,31,00,30,00,33,00,33,00,3e,00,26,00,61,00,45,00,4d,00,61,00,65,00,2c,\
  00,37,00,71,00,39,00,2a,00,44,00,58,00,64,00,55,00,40,00,45,00,50,00,69,00,\
  3d,00,00,00,00,00
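To see what the registry file actually writes, it can help to decode one of its values. In a version 5.00 .reg file, `hex(7)` marks a REG_MULTI_SZ value stored as UTF-16LE bytes, and the value names 1028, 2052, and 1033 are the Windows locale IDs for Traditional Chinese, Simplified Chinese, and English (United States). The following is only a minimal sketch in Python: the helper name `decode_multi_sz` and the constant `VALUE_1028` are made up for this illustration and are not part of PDF Patcher or MODI.

```python
def decode_multi_sz(hex_text):
    """Decode the byte list of a hex(7) value from a .reg file into strings.

    Hypothetical helper for illustration: accepts the comma-separated bytes,
    with or without the backslash line continuations used by regedit.
    """
    # Drop continuation backslashes and all whitespace, then parse the bytes.
    cleaned = "".join(hex_text.replace("\\", "").split())
    raw = bytes(int(part, 16) for part in cleaned.split(",") if part)
    # REG_MULTI_SZ is a sequence of NUL-terminated UTF-16LE strings.
    text = raw.decode("utf-16-le")
    return [s for s in text.split("\x00") if s]


# Byte list of the "1028" value, copied from the registry file above.
VALUE_1028 = (
    "76,00,55,00,70,00,41,00,56,00,4f,00,65,00,64,00,40,00,24,00,21,"
    "00,21,00,21,00,21,00,21,00,4d,00,4b,00,4b,00,53,00,6b,00,4f,00,43,00,52,00,"
    "5f,00,31,00,30,00,32,00,38,00,3c,00,00,00,00,00"
)

decoded = decode_multi_sz(VALUE_1028)
print(decoded)  # ['vUpAVOed@$!!!!!MKKSkOCR_1028<']
```

The decoded string contains the readable fragment `OCR_1028`, which suggests that each value ties one language ID to the matching OCR component of the installed package. The leading characters are an encoded installer identifier and should be copied exactly rather than edited by hand.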
To recognize text with MODI, PDF Patcher must also be started as administrator. Otherwise the call to the engine will still fail.