Hunspell: An Introduction and a Quick Trial

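1. Introduction

  Hunspell is a spell checker designed for languages with rich morphology and complex compound words. It was originally written for Hungarian.

  Hunspell is free software, released under a GPL/LGPL/MPL tri-license.

  Hunspell has interfaces and wrappers for the major platforms and programming languages. It is based on MySpell and is backward compatible with MySpell dictionaries. MySpell uses single-byte character encodings, whereas Hunspell can also use dictionaries encoded in Unicode UTF-8.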
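
  To make the encoding point concrete, here is a minimal sketch of a UTF-8 dictionary pair. The file names demo.aff and demo.dic and the three Hungarian words are made up for illustration. The SET UTF-8 line in the affix file declares the encoding, and the first line of the .dic file is the approximate number of entries.

```shell
#!/bin/sh
# Hypothetical dictionary pair (demo.aff / demo.dic) written to a temp directory.
dict_dir=$(mktemp -d)

# Affix file: SET declares the character encoding used by both files.
printf 'SET UTF-8\n' > "$dict_dir/demo.aff"

# Dictionary file: the first line is the approximate entry count,
# followed by one word per line.
printf '3\nalma\nkörte\nszőlő\n' > "$dict_dir/demo.dic"

# Show the declared encoding and the word list that was written.
head -n 1 "$dict_dir/demo.aff"
tail -n +2 "$dict_dir/demo.dic"
```

  With Hunspell installed, a pair like this would be selected by passing its path without the extension to the -d option, for example -d "$dict_dir/demo".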

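2. The following applications use Hunspell as their spell checker:

  Mac OS X 10.6 and later

  Eclipse, via Hunspell4Eclipse

  Google Chrome, the web browser developed by Google

  Evernote, a note-taking application

  LibreOffice and OpenOffice.org, open-source office suites

  Mozilla Firefox, Thunderbird and SeaMonkey

  Opera, a cross-platform web browser

  Scribus, a desktop publishing application

  Vim, a text editor

  WPS Office, an office suite developed in China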

3. Trying out Hunspell with a Docker image:

  3.1 List the available dictionaries

[root@host-10-0-251-159 hunspell]# docker run --rm tmaier/hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/root/.openoffice.org/3/user/wordbook:/root/.openoffice.org2/user/wordbook:/root/.openoffice.org2.0/user/w/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/shhare/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/de_DE_comb
/usr/share/hunspell/en_ZA
/usr/share/hunspell/en_US
/usr/share/hunspell/en_GB
/usr/share/hunspell/en_AU
/usr/share/hunspell/de_CH
/usr/share/hunspell/de_DE_neu
/usr/share/hunspell/en_NZ
/usr/share/hunspell/de_AT
/usr/share/hunspell/default
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
/usr/share/hunspell/default.dic
Hunspell 1.6.2
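
  As a rough sketch of how this listing could be post-processed, the snippet below pulls the bare dictionary names out of a saved copy of the -D output. The sample is a shortened copy of the listing above. How the real output is captured is left out here (an assumption worth checking: the listing may be written to stderr, in which case 2>&1 would be needed).

```shell
#!/bin/sh
# Shortened sample of the `-D` listing shown above.
listing=$(cat <<'EOF'
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/de_DE_comb
/usr/share/hunspell/en_US
/usr/share/hunspell/default
LOADED DICTIONARY:
/usr/share/hunspell/default.aff
/usr/share/hunspell/default.dic
EOF
)

# Keep only the lines between the two section headers and
# strip the directory part, leaving the names usable with -d.
dict_names=$(printf '%s\n' "$listing" | awk '
  /^AVAILABLE DICTIONARIES/ { in_list = 1; next }
  /^LOADED DICTIONARY/      { in_list = 0 }
  in_list { n = split($0, parts, "/"); print parts[n] }
' | LC_ALL=C sort)

printf '%s\n' "$dict_names"
```

  The names printed this way are the values the -d option accepts, since the listing itself notes that the path is not mandatory.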

  3.2 Show the help message

[root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words  -h
Usage: hunspell [OPTION]... [FILE]...
Check spelling of each FILE. Without FILE, check standard input.
 
  -1        check only first field in lines (delimiter = tabulator)
  -a        Ispell's pipe interface
  --check-url   check URLs, e-mail addresses and directory paths
  --check-apostrophe    check Unicode typographic apostrophe
  -d d[,d2,...] use d (d2 etc.) dictionaries
  -D        show available dictionaries
  -G        print only correct words or lines
  -h, --help    display this help and exit
  -H        HTML input file format
  -i enc    input encoding
  -l        print misspelled words
  -L        print lines with misspelled words
  -m        analyze the words of the input text
  -n        nroff/troff input file format
  -O        OpenDocument (ODF or Flat ODF) input file format
  -p dict   set dict custom dictionary
  -r        warn of the potential mistakes (rare words)
  -P password   set password for encrypted dictionaries
  -s        stem the words of the input text
  -S        suffix words of the input text
  -t        TeX/LaTeX input file format
  -v, --version print version number
  -vv       print Ispell compatible version number
  -w        print misspelled words (= lines) from one word/line input.
  -X        XML input file format
 
Example: hunspell -d en_US file.txt    # interactive spelling
         hunspell -i utf-8 file.txt    # check UTF-8 encoded file
         hunspell -l *.odt             # print misspelled words of ODF files
 
         # Quick fix of ODF documents by personal dictionary creation
 
         # 1 Make a reduced list from misspelled and unknown words:
 
         hunspell -l *.odt | sort | uniq >words
 
         # 2 Delete misspelled words of the file by a text editor.
         # 3 Use this personal dictionary to fix the deleted words:
 
         hunspell -p words *.odt
 
Bug reports: http://hunspell.github.io/

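  3.3 Check the spelling of a document. With the -u3 option used in the command, each flagged word is reported on its own line with the file name, line number, and one suggested replacement. Source text: test1.TXT (link: https://pan.baidu.com/s/17JRmtnebLblVsMG05CIm-w password: l3q9)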

[root@host-10-0-251-159 hunspell]# docker run --rm -v $(pwd):/workdir tmaier/hunspell -u3 -i utf-8 -d de_DE_neu,en_US,de_CH -p words  test1.TXT
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:15: Locate: wew | Try: woo
test1.TXT:23: Locate: Sevenn | Try: Severn
test1.TXT:27: Locate: cannt | Try: canny
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:211: Locate: Lele | Try: Lee
test1.TXT:215: Locate: Lele | Try: Lee
test1.TXT:243: Locate: Lele | Try: Lee
test1.TXT:247: Locate: Lele | Try: Lee
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:292: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
test1.TXT:500: Locate: ve | Try: be
test1.TXT:516: Locate: ve | Try: be
test1.TXT:564: Locate: Hmm | Try: Mm
test1.TXT:644: Locate: ve | Try: be
test1.TXT:776: Locate: hasn | Try: has
test1.TXT:921: Locate: isn | Try: sin
test1.TXT:945: Locate: ve | Try: be
test1.TXT:953: Locate: ve | Try: be
test1.TXT:989: Locate: Hmm | Try: Mm
test1.TXT:1005: Locate: Hmm | Try: Mm
test1.TXT:1085: Locate: wasn | Try: wans
test1.TXT:1129: Locate: isn | Try: sin
test1.TXT:1145: Locate: isn | Try: sin
test1.TXT:1173: Locate: vomeronasal | Try: astronomer
test1.TXT:1213: Locate: didn | Try: did
test1.TXT:1289: Locate: ve | Try: be
test1.TXT:1329: Locate: weren | Try: were
test1.TXT:1349: Locate: wasn | Try: wans
test1.TXT:1425: Locate: wouldn | Try: would
test1.TXT:1425: Locate: weren | Try: were
test1.TXT:1470: Locate: ve | Try: be
test1.TXT:1495: Locate: ve | Try: be
test1.TXT:1803: Locate: cefepime | Try: timepiece
test1.TXT:1807: Locate: amikacin | Try: Kamikaze
test1.TXT:1819: Locate: Mmm | Try: Mm
test1.TXT:1839: Locate: kuai | Try: Kauai
test1.TXT:1895: Locate: ve | Try: be
test1.TXT:1903: Locate: isn | Try: sin
test1.TXT:2012: Locate: ve | Try: be
test1.TXT:2096: Locate: aren | Try: earn
test1.TXT:2116: Locate: shouldn | Try: should
test1.TXT:2168: Locate: whould | Try: would
test1.TXT:2232: Locate: Hmm | Try: Mm
test1.TXT:2800: Locate: Hmm | Try: Mm
test1.TXT:2820: Locate: Hmm | Try: Mm
test1.TXT:2930: Locate: ve | Try: be
test1.TXT:2993: Locate: Hmm | Try: Mm
test1.TXT:2997: Locate: Hmm | Try: Mm
test1.TXT:3076: Locate: Uhh | Try: Shh
test1.TXT:3331: Locate: Chh | Try: Ch
test1.TXT:3376: Locate: Hmm | Try: Mm
test1.TXT:3412: Locate: isn | Try: sin
test1.TXT:3436: Locate: ve | Try: be
test1.TXT:3448: Locate: exfoliator | Try: defoliator
test1.TXT:3518: Locate: didn | Try: did
test1.TXT:3531: Locate: didn | Try: did
test1.TXT:3652: Locate: Hmm | Try: Mm
test1.TXT:3696: Locate: ve | Try: be
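
  A report this long is easier to read once it is tallied. The following is a minimal sketch that counts how often each word is flagged, using a handful of lines copied from the output above. In practice the real output would be redirected to a file and read from there.

```shell
#!/bin/sh
# A few report lines copied from the run above.
report=$(cat <<'EOF'
test1.TXT:7: Locate: rans | Try: rand
test1.TXT:203: Locate: Hmm | Try: Mm
test1.TXT:284: Locate: Hmm | Try: Mm
test1.TXT:468: Locate: ve | Try: be
test1.TXT:500: Locate: ve | Try: be
test1.TXT:516: Locate: ve | Try: be
test1.TXT:921: Locate: isn | Try: sin
EOF
)

# Whitespace-separated fields: "file:line:", "Locate:", word, "|", "Try:", suggestion.
# Count each flagged word, then list the most frequent first.
tally=$(printf '%s\n' "$report" \
  | awk '$2 == "Locate:" { count[$3]++ }
         END { for (w in count) print count[w], w }' \
  | LC_ALL=C sort -k1,1nr -k2,2)

printf '%s\n' "$tally"
```

  In the full report, fragments such as ve, isn and didn account for a large share of the hits, which suggests that contractions in test1.TXT were split at the apostrophe. That is an inference from the output only, not something checked against the file. Interjections such as Hmm and drug names such as cefepime are candidates for the personal dictionary passed with -p.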

 

posted @ 2018-07-10 13:06  振宇要低调