Hunspell: Introduction and Trial Run
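1. Introduction
Hunspell is a spell checker designed for languages with rich morphology and complex compound words. It was originally written for Hungarian.
Hunspell is free software, released under a GPL/LGPL/MPL tri-license.
Hunspell has interfaces and wrappers for the major platforms and programming languages. It is based on MySpell and is backward compatible with MySpell dictionaries. MySpell uses single-byte character encodings, whereas Hunspell can also use dictionaries encoded in Unicode UTF-8.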
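To make the encoding point concrete: a Hunspell affix (.aff) file can declare its character encoding with a `SET` line, for example `SET UTF-8`. The following is a minimal sketch, not part of Hunspell itself, that reads that declaration from the raw bytes of an affix file. The function name `declared_encoding` is hypothetical, and decoding as Latin-1 first is just a convenient trick, since the real encoding is not known until the `SET` line has been found.

```python
def declared_encoding(aff_bytes):
    """Return the encoding named on the SET line of an .aff file, or None.

    Hypothetical helper for illustration; it only looks for a line of
    the form 'SET <encoding>' and ignores everything else.
    """
    # Drop a UTF-8 byte order mark if present, then decode as Latin-1,
    # which never fails and leaves ASCII keywords such as SET intact.
    if aff_bytes.startswith(b"\xef\xbb\xbf"):
        aff_bytes = aff_bytes[3:]
    for line in aff_bytes.decode("latin-1").splitlines():
        if line.startswith("#"):
            continue  # comment line
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SET":
            return parts[1]
    return None


print(declared_encoding(b"SET UTF-8\nTRY esianrtolcdugmphbyfvkwz\n"))
print(declared_encoding(b"# no SET line here\nTRY abc\n"))
```

A file without a `SET` line yields `None` here; which default Hunspell then applies is left out of the sketch on purpose.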
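2. The following applications use Hunspell as their spell checker:
Mac OS X 10.6 and later
Eclipse, via Hunspell4Eclipse
Google Chrome, a web browser developed by Google
Evernote, a note-taking application
LibreOffice and OpenOffice.org, open-source office suites
Mozilla Firefox, Thunderbird, and SeaMonkey
Opera, a cross-platform web browser
Scribus, a desktop publishing application
Vim, a text editor
WPS Office, an office suite developed in China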
3. Testing Hunspell with a Docker image:
3.1 List the available dictionaries
[root@host-10-0-251-159 hunspell]# docker run --rm tmaier/hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/root/.openoffice.org/3/user/wordbook:/root/.openoffice.org2/user/wordbook:/root/.openoffice.org2.0/user/w/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/shhare/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/de_DE_comb
/usr/share/hunspell/en_ZA
/usr/share/hunspell/en_US
/usr/share/hunspell/en_GB
/usr/share/hunspell/en_AU
/usr/share/hunspell/de_CH
/usr/share/hunspell/de_DE_neu
/usr/share/hunspell/en_NZ
/usr/share/hunspell/de_AT
/usr/share/hunspell/default
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
/usr/share/hunspell/default.dic
Hunspell 1.6.2
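Because the listing says the path is not mandatory for the -d option, only the last path component (for example en_US) needs to be passed. As a rough sketch, assuming the listing is printed with one entry per line under the AVAILABLE DICTIONARIES and LOADED DICTIONARY headers shown above, the names could be extracted like this. The function name `available_dictionaries` and the shortened sample listing are made up for illustration.

```python
import posixpath


def available_dictionaries(listing):
    """Collect dictionary names between the two section headers.

    Hypothetical helper: assumes one path per line, as in the
    transcript, and keeps only the final path component.
    """
    names = []
    in_section = False
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("AVAILABLE DICTIONARIES"):
            in_section = True
        elif line.startswith("LOADED DICTIONARY"):
            break
        elif in_section and line:
            names.append(posixpath.basename(line))
    return names


# Shortened sample in the same shape as the -D output.
listing = """\
SEARCH PATH:
.::/usr/share/hunspell
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_US
/usr/share/hunspell/de_CH
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
"""
names = available_dictionaries(listing)
print(",".join(names))  # comma-separated value in the form -d expects
```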
3.2 View the help text
[root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words -h
Usage: hunspell [OPTION]... [FILE]...
Check spelling of each FILE. Without FILE, check standard input.

  -1                  check only first field in lines (delimiter = tabulator)
  -a                  Ispell's pipe interface
  --check-url         check URLs, e-mail addresses and directory paths
  --check-apostrophe  check Unicode typographic apostrophe
  -d d[,d2,...]       use d (d2 etc.) dictionaries
  -D                  show available dictionaries
  -G                  print only correct words or lines
  -h, --help          display this help and exit
  -H                  HTML input file format
  -i enc              input encoding
  -l                  print misspelled words (only the misspelled words)
  -L                  print lines with misspelled words (the lines containing them)
  -m                  analyze the words of the input text
  -n                  nroff/troff input file format
  -O                  OpenDocument (ODF or Flat ODF) input file format
  -p dict             set dict custom dictionary
  -r                  warn of the potential mistakes (rare words)
  -P password         set password for encrypted dictionaries
  -s                  stem the words of the input text
  -S                  suffix words of the input text
  -t                  TeX/LaTeX input file format
  -v, --version       print version number
  -vv                 print Ispell compatible version number
  -w                  print misspelled words (= lines) from one word/line input.
  -X                  XML input file format

Example: hunspell -d en_US file.txt    # interactive spelling
         hunspell -i utf-8 file.txt    # check UTF-8 encoded file
         hunspell -l *.odt             # print misspelled words of ODF files

         # Quick fix of ODF documents by personal dictionary creation

         # 1 Make a reduced list from misspelled and unknown words:

         hunspell -l *.odt | sort | uniq >words

         # 2 Delete misspelled words of the file by a text editor.
         # 3 Use this personal dictionary to fix the deleted words:

         hunspell -p words *.odt

Bug reports: http://hunspell.github.io/
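The "quick fix" recipe at the end of the help text (collect the words from -l, deduplicate them, delete the real typos, and reuse the rest with -p) can be sketched in a few lines. This is only an illustration of steps 1 and 2 under stated assumptions: the input is the one-word-per-line output of `hunspell -l`, the function names are hypothetical, and Python's code-point sort order may differ from what `sort` produces under a given locale.

```python
def reduce_word_list(hunspell_l_output):
    """Roughly mimic `sort | uniq`: distinct non-empty lines, sorted."""
    words = {line.strip() for line in hunspell_l_output.splitlines()}
    words.discard("")
    return sorted(words)


def personal_dictionary_text(words, rejected=()):
    """Render the kept words one per line.

    `rejected` stands for the genuine typos a human would delete in
    step 2; everything else is treated as a valid but unknown word.
    """
    rejected = set(rejected)
    return "".join(w + "\n" for w in words if w not in rejected)


# Sample input in the shape of `hunspell -l` output (one word per line).
raw = "Hmm\ncefepime\nHmm\nwhould\namikacin\n"
words = reduce_word_list(raw)
text = personal_dictionary_text(words, rejected={"whould"})
print(text, end="")
```

Writing `text` to a file named words and passing it with `-p words` corresponds to step 3 of the recipe.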
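3.3 Check the spelling of a document (showing the line number of each misspelled word and a suggested correction)
Source text: test1.TXT (link: https://pan.baidu.com/s/17JRmtnebLblVsMG05CIm-w, password: l3q9)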
[root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words test1.TXT
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:15: Locate: wew | Try: woo
test1.TXT:23: Locate: Sevenn | Try: Severn
test1.TXT:27: Locate: cannt | Try: canny
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:211: Locate: Lele | Try: Lee
test1.TXT:215: Locate: Lele | Try: Lee
test1.TXT:243: Locate: Lele | Try: Lee
test1.TXT:247: Locate: Lele | Try: Lee
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:292: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
test1.TXT:500: Locate: ve | Try: be
test1.TXT:516: Locate: ve | Try: be
test1.TXT:564: Locate: Hmm | Try: Mm
test1.TXT:644: Locate: ve | Try: be
test1.TXT:776: Locate: hasn | Try: has
test1.TXT:921: Locate: isn | Try: sin
test1.TXT:945: Locate: ve | Try: be
test1.TXT:953: Locate: ve | Try: be
test1.TXT:989: Locate: Hmm | Try: Mm
test1.TXT:1005: Locate: Hmm | Try: Mm
test1.TXT:1085: Locate: wasn | Try: wans
test1.TXT:1129: Locate: isn | Try: sin
test1.TXT:1145: Locate: isn | Try: sin
test1.TXT:1173: Locate: vomeronasal | Try: astronomer
test1.TXT:1213: Locate: didn | Try: did
test1.TXT:1289: Locate: ve | Try: be
test1.TXT:1329: Locate: weren | Try: were
test1.TXT:1349: Locate: wasn | Try: wans
test1.TXT:1425: Locate: wouldn | Try: would
test1.TXT:1425: Locate: weren | Try: were
test1.TXT:1470: Locate: ve | Try: be
test1.TXT:1495: Locate: ve | Try: be
test1.TXT:1803: Locate: cefepime | Try: timepiece
test1.TXT:1807: Locate: amikacin | Try: Kamikaze
test1.TXT:1819: Locate: Mmm | Try: Mm
test1.TXT:1839: Locate: kuai | Try: Kauai
test1.TXT:1895: Locate: ve | Try: be
test1.TXT:1903: Locate: isn | Try: sin
test1.TXT:2012: Locate: ve | Try: be
test1.TXT:2096: Locate: aren | Try: earn
test1.TXT:2116: Locate: shouldn | Try: should
test1.TXT:2168: Locate: whould | Try: would
test1.TXT:2232: Locate: Hmm | Try: Mm
test1.TXT:2800: Locate: Hmm | Try: Mm
test1.TXT:2820: Locate: Hmm | Try: Mm
test1.TXT:2930: Locate: ve | Try: be
test1.TXT:2993: Locate: Hmm | Try: Mm
test1.TXT:2997: Locate: Hmm | Try: Mm
test1.TXT:3076: Locate: Uhh | Try: Shh
test1.TXT:3331: Locate: Chh | Try: Ch
test1.TXT:3376: Locate: Hmm | Try: Mm
test1.TXT:3412: Locate: isn | Try: sin
test1.TXT:3436: Locate: ve | Try: be
test1.TXT:3448: Locate: exfoliator | Try: defoliator
test1.TXT:3518: Locate: didn | Try: did
test1.TXT:3531: Locate: didn | Try: did
test1.TXT:3652: Locate: Hmm | Try: Mm
test1.TXT:3696: Locate: ve | Try: be
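A report this long is easier to read once it is tallied by word. The sketch below parses entries of the form `file:line: Locate: word | Try: suggestion`, a shape taken directly from the transcript above, and counts how often each word was flagged. The names `parse_u3` and `tally_words` are hypothetical, and the sample is a four-line excerpt rather than the full report.

```python
import re
from collections import Counter

# Entry shape copied from the transcript:
#   <file>:<line>: Locate: <word> | Try: <suggestion>
U3_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): Locate: (?P<word>\S+) \| Try: (?P<suggestion>.+)$"
)


def parse_u3(text):
    """Return (file, line_number, word, suggestion) for each matching line."""
    rows = []
    for raw in text.splitlines():
        m = U3_LINE.match(raw.strip())
        if m:
            rows.append((m["file"], int(m["line"]), m["word"], m["suggestion"]))
    return rows


def tally_words(rows):
    """Count how many times each flagged word appears."""
    return Counter(word for _, _, word, _ in rows)


# Excerpt from the report above.
sample = """\
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
"""
rows = parse_u3(sample)
for word, count in tally_words(rows).most_common():
    print(f"{count:3d}  {word}")
```

Run over the whole report, a tally like this makes the repeated entries stand out. Many of them (ve, isn, didn, wasn, weren) look like the first or second half of contractions such as "isn't" or "I've", which may point to how apostrophes in test1.TXT are tokenized rather than to real typos. That reading is a guess from the output alone; the transcript does not confirm the cause.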