Segmentation Reading List
Word Segmentation and Word Discovery
No. | Reference & Comment |
1 |
Ogawa, Yasushi; Matsuda, Toru |
2 |
Jens Kohlmorgen, Steven Lemm. 2001. A Dynamic HMM for On-line Segmentation of Sequential Data. To appear in Proceedings of NIPS-2001. Comment: not closely related to word segmentation. |
3 |
Dimitar Kazakov, Suresh Manandhar. 2001. Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming. Machine Learning, 43(1/2):121-162, April 2001. Comment: good, but aimed at morphological segmentation (recommended). |
4 |
Anand Venkataraman. 2001. A Statistical Model for Word Discovery in Transcribed Speech. Computational Linguistics, 27(3):351-379. |
5 |
Sun Maosong, Shen Dayang, and Huang Changning. 1997. Cseg & tag1.0: A Practical Word Segmenter and POS Tagger for Chinese Texts. Fifth Conference on Applied Natural Language Processing, Washington, DC, USA, March 31 to April 3, 1997, pp. 119-126. Comment: a survey of supervised methods. |
6 |
Tom B. Y. Lai, Sun Maosong, Benjamin K. Tsou, and S. Caesar Lun. 1997. Chinese Word Segmentation and Part-of-Speech Tagging in One Step. Proceedings of ROCLING X International Conference 1997 Research on Computational Linguistics, Taipei, Taiwan, China, August 22-24, 1997, pp. 229-236. Comment: a divide-and-conquer strategy. |
7 |
W. J. Teahan. 2000. Text Classification and Segmentation Using Minimum Cross-Entropy. In Proceedings of the International Conference on Content-based Multimedia Information Access (RIAO 2000), pages 943-961. C.I.D.-C.A.S.I.S, Paris, France. ISBN 2-905450-07-X. Comment: same content as the next entry. |
8 |
W. J. Teahan, Y. Wen, R. McNab, and I. H. Witten. 2000. A Compression-based Algorithm for Chinese Word Segmentation. Computational Linguistics, 26(3):375-393. ISSN 0891-2017. Comment: supervised word segmentation within a shortest-path algorithm framework. |
9 |
A. Stolcke & E. Shriberg |
10 |
A. Stolcke, E. Shriberg, R. Bates, M. Ostendorf, D. Hakkani, M. Plauche, G. Tur, and Y. Lu. 1998. Automatic Detection of Sentence Boundaries and Disfluencies based on Recognized Words. Proc. Intl. Conf. on Spoken Language Processing, vol. 5, pp. 2247-2250, Sydney, Australia. |
11 |
Deb Roy |
12 |
Michael R. Brent and Xiaopeng Tao. Comment: I did not fully understand it; my impression is that it is not very good. |
13 |
Ando, R. K. and Lee, L. 2000. Mostly-Unsupervised Statistical Segmentation of Japanese: Application to Kanji. ANLP-NAACL. Comment: built on a mutual-information framework; worth borrowing ideas from (recommended). |
14 |
Baker, D., Hofmann, T., McCallum, A. and Yang, Y. A Hierarchical Probabilistic Model for Novelty Detection in Text. Unpublished manuscript. Comment: largely unrelated to word segmentation. |
15 |
Brand, M. 1999. Structure learning in conditional probability models via an entropic prior and parameter extinction. Neural Computation, 11:1155-1182. Comment: journal version of the next entry. |
16 |
Brand, M. 1998. An entropic estimator for structure discovery. To appear, NIPS 98. Comment: not closely related to word segmentation, but outstanding, remarkably good (strongly recommended!). |
17 |
Brand, M. 1999. Pattern discovery via entropy minimization. To appear, Uncertainty 99 (AI & Statistics). Comment: same content as the previous entry. |
18 |
Brent, M. 1999. An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning, 34:71-106. |
19 |
Brent, M. R. and Cartwright, T. A. 1996. Distributional regularity and phonotactic constraints are useful for segmentation. In Computational Approaches to Language Acquisition, ed. Michael Brent. Cambridge, MA: MIT Press. |
20 |
Brent, M. R. 1999. Speech segmentation and word discovery: A computational perspective. Trends in Cognitive Sciences, 3:294-301. |
21 |
Dahan, D. and Brent, M. R. 1999. On the discovery of novel word-like units from utterances: An artificial-language study with implications for native-language acquisition. Journal of Experimental Psychology: General, 128:165-185. |
22 |
Brown, E. K. and Miller, J. 1991. Syntax: A Linguistic Introduction to Sentence Structure. London: HarperCollins. |
23 |
Jing-Shin Chang and Keh-Yih Su. 1997. An Unsupervised Iterative Method for Chinese New Lexicon Extraction. International Journal of Computational Linguistics & Chinese Language Processing. Comment: quite poor and verbose; it is essentially EM, so why make it so complicated? |
24 |
Chang, Jing-Shin, Yi-Chung Lin and Keh-Yih Su. 1995. Automatic Construction of a Chinese Electronic Dictionary. Proceedings of the Third Workshop on Very Large Corpora, pp. 107-120, MIT, June 1995. Comment: essentially the same work as the previous entry. |
25 |
Brian Clarkson and Alex Pentland. 1999. Unsupervised clustering of ambulatory audio and video. In International Conference on Acoustics, Speech and Signal Processing, volume VI, pages 3037-3040. IEEE. |
26 |
Deligne, S. and Bimbot, F. 1995. Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams. ICASSP 1995. |
27 |
S. Deligne, F. Yvon, and F. Bimbot. 1995. Variable-length sequence matching for phonetic transcription using joint multigrams. In EUROSPEECH. |
28 |
Deligne, S.; Yvon, F.; and Bimbot, F. 1996. Introducing statistical dependencies and structural constraints in variable-length sequence models. In Miclet, L., and de la Higuera, C., eds., Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Artificial Intelligence 1147. Springer. 156-167. |
29 |
de Marcken, C. 1995. The Unsupervised Acquisition of a Lexicon from Continuous Speech. Technical Report A.I. Memo No. 1558, AI Lab., MIT. Cambridge, Massachusetts. |
30 |
Ge, X., Pratt, W. and Smyth, P. 1999. Discovering Chinese Words from Unsegmented Text. SIGIR-99, pages 271-272. Comment: EM-based; the experimental results reported in the paper are very good but still need to be verified in practice (recommended). |
31 |
Goldsmith, J. 2001. Unsupervised Learning of the Morphology of a Natural Language. To appear in Computational Linguistics. |
32 |
A. Hanjalic, R.L. Lagendijk, J. Biemond. 1999. Automatically Segmenting Movies into Logical Story Units. In D.P. Huijsmans, A.W.M. Smeulders (eds.): Lecture Notes in Computer Science 1614: Visual Information and Information Systems, ISBN 3-540-66079-8, pages 229-236, Springer Verlag 1999 (Proceedings of the Third International Conference VISUAL '99, Amsterdam (NL), June 1999) |
33 |
Hua, Y. 2000. Unsupervised word induction using MDL criterion. ISCSLP 2000, Beijing. Comment: fairly good; combines the EM framework with MDL (recommended). |
34 |
Kit, C. and Wilks, Y. 1999. Unsupervised Learning of Word Boundary with Description Length Gain. In Proceedings of the CoNLL-99 ACL Workshop, Bergen. Comment: novel, but has shortcomings; can be used to initialize EM (recommended). |
35 |
Kit, C. 2000. Unsupervised Lexical Learning as Inductive Inference. PhD thesis, University of Sheffield, UK. |
36 |
Ponte, J. M. and Croft, W. B. 1996. Useg: A retargetable word segmentation procedure for information retrieval. In Symposium on Document Analysis and Information Retrieval 96 (SDAIR). |
37 |
Peng, Fuchun and Schuurmans, Dale. 2001. Self-supervised Chinese Word Segmentation. The 4th International Symposium on Intelligent Data Analysis (IDA 2001), September 2001, Lisbon, Portugal. |
38 |
Peng, Fuchun and Schuurmans, Dale. 2001. A Hierarchical EM Approach to Word Segmentation. To appear in Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS 2001), Nov. 2001, Tokyo, Japan. Comment: EM-based, but the approach is rather convoluted. |
39 |
Sproat, R. and Shih, C. 1990. A statistical method for finding word boundaries in Chinese text. Computer Processing of Chinese and Oriental Languages, 4:336-351. |
40 |
Zhao, J., Gao, J., Chang, E. and Li, M. 2000. Lexicon optimization for Chinese language modeling. International Symposium Conference on Spoken Language Processing, Beijing. |
41 |
Su, K., Wu, M., & Chang, J. 1994. A Corpus-Based Approach to Automatic Compound Extraction. ACL Proceedings: 32nd Annual Meeting of the Association for Computational Linguistics, (Las Cruces, NM, June 1994), ACL, Morristown, NJ, pp.242-247. |
42 |
Wu, M.-W. and K.-Y. Su, 1993. Corpus-based Automatic Compound Extraction with Mutual Information and Relative Frequency Count. Proceedings of ROCLING VI, pp. 207-216, Nantou, Taiwan, ROC, Sep. 1993. |
43 |
Chen, K., & Chen, H. 1994. Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation. ACL Proceedings: 32nd Annual Meeting of the Association for Computational Linguistics, (Las Cruces, NM, June 1994), ACL, Morristown, NJ, pp. 234-241. |
44 |
Jin, Wanying. 1992. Chinese Segmentation and its Disambiguation. MCCS-92-227, Computing Research Laboratory, New Mexico State University, Las Cruces, New Mexico. |
45 |
Kok-Wee Gan, Martha Palmer, and Kim-Teng Lua. 1996. A Statistically Emergent Approach for Language Processing: Application to Modeling Context Effects in Ambiguous Chinese Word Boundary Perception. Computational Linguistics, 22:531-553. |
46 |
Sun Maosong, Shen Dayang, and Benjamin K. Tsou. 1998. Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data. COLING-ACL 1998: 1265-1271. |