Recommended Reading for Getting Started in Speech Recognition [Repost]
Reading list from NCMMSC Speech group
1. Papers (each entry: paper, recommender, area and notes, link)
George E. Dahl, Dong Yu, Li Deng, and Alex Acero, Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2011, IEEE Trans. on ASLP, Vol. 20, No. 1. Recommended by Jia Lei (Baidu). Notes: pushed DNNs into industrial-grade ASR. Link: http://research.microsoft.com/pubs/144412/dbn4lvcsr-transaslp.pdf
Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Recommended by Xie Lei (Northwestern Polytechnical University). Notes: HMM. Link: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
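Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer, End-to-End Text-Dependent Speaker Verification. Recommended by Xiao Xiong (Nanyang Technological University). Notes: uses a neural network to extract a fixed-length vector from utterances of different lengths, playing a role similar to an i-vector.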
Rapid Speaker Adaptation in Eigenvoice Space. Recommended by Su Tengrong (Huami). Notes: influenced all the later supervector-based methods.
G. Hinton, L. Deng, D. Yu et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012. Recommended by Zou Yuexian (Peking University, Shenzhen). Notes: DNN acoustic models.
Speech recognition with weighted finite-state transducers. Recommended by Su Tengrong (Huami). Notes: a standard component of ASR systems. Link: http://www.cslu.ogi.edu/~zak/cs506-lvr/mohri-wfst_asr.pdf
Takaaki Hori and Atsushi Nakamura, Speech Recognition Algorithms Using Weighted Finite-State Transducers, Synthesis Lectures on Speech and Audio Processing, January 2013, Vol. 9, No. 1, pp. 1-162. Recommended by Tao Fei (UTD). Notes: ASR and WFST.
Biing-Hwang Juang, Wu Chou, and Chin-Hui Lee, Minimum classification error rate methods for speech recognition. Recommended by Hong Qingyang (Xiamen University). Notes: discriminative training, MCE.
Daniel Povey, Discriminative Training for Large Vocabulary Speech Recognition. Recommended by Yang Song (Chivox). Notes: discriminative training of acoustic models.
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays, Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition. Recommended by Xu Haihua (Nanyang Technological University) and Su Mu (Unisound). Notes: CTC.
Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks, PhD thesis. Recommended by Tang Benlai (Nankai University) and Li Bo (Google). Notes: LSTM, CTC.
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays, Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition. Recommended by Xu Haihua (Nanyang Technological University). Notes: CTC. Link: http://arxiv.org/pdf/1507.06947.pdf
Brian Kingsbury (IBM Watson), Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling. Recommended by Wang Guangsen (I2R, Singapore).
M. J. F. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, 1998, 12(2): 75-98. Recommended by Qian Yanmin (Shanghai Jiao Tong University). Notes: MLLR.
C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech and Language 9(2), 171-185. Recommended by Qian Yanmin (Shanghai Jiao Tong University). Notes: MLLR.
Hermansky, Tandem connectionist feature extraction for conventional HMM systems. Recommended by Qian Yanmin (Shanghai Jiao Tong University). Notes: adaptation.
D. Povey, Subspace Gaussian mixture models for speech recognition. Recommended by Qian Yanmin (Shanghai Jiao Tong University). Notes: Dan Povey's SGMM.
Y. Lei, N. Scheffer, L. Ferrer, M. McLaren, A novel scheme for speaker recognition using a phonetically-aware deep neural network. Recommended by Xia Rui (Intel Lab).
Campbell W M, Sturim D E, Reynolds D A. Support vector machines using GMM supervectors for speaker verification[J]. Signal Processing Letters, IEEE, 2006, 13(5): 308-311. Recommended by Long Yanhua (Shanghai Normal University). Notes: SVM-based speaker (voiceprint) recognition.
Campbell W M, Sturim D E, Reynolds D A, et al. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation[C]//Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. IEEE, 2006, 1: I-I. Recommended by Long Yanhua (Shanghai Normal University). Notes: SVM-based speaker (voiceprint) recognition.
Douglas A. Reynolds, Thomas F. Quatieri, and Robert B. Dunn, Speaker Verification Using Adapted Gaussian Mixture Models. Recommended by Hong Qingyang (Xiamen University). Notes: speaker recognition, GMM-UBM.
Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet, Front-End Factor Analysis for Speaker Verification. Recommended by Hong Qingyang (Xiamen University). Notes: speaker recognition, i-vector.
Daniel Garcia-Romero and Carol Y. Espy-Wilson, Analysis of I-vector Length Normalization in Speaker Recognition Systems. Recommended by Xu Minqiang (Alibaba). Notes: length normalization + PLDA.
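Andrew O. Hatch, Sachin Kajarekar, and Andreas Stolcke, Within-Class Covariance Normalization for SVM-based Speaker Recognition. Recommended by Xu Minqiang (Alibaba). Notes: speaker recognition; the method is not limited to speaker tasks and also carries over to image recognition, classification, and other areas, with clear gains.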
Silke M. Witt, Steve J. Young, Phone-level pronunciation scoring and assessment for interactive language learning, 2000, Speech Communication. Recommended by Huang Hao (Xinjiang University). Notes: GOP and pronunciation error detection.
S. M. Witt, Use of Speech Recognition in Computer-assisted Language Learning. Recommended by Yang Song (Chivox). Notes: speech assessment.
Andrew J. Hunt, Alan W. Black, Unit selection in a concatenative speech synthesis system using a large speech database, ICASSP 1996. Recommended by Kang Yongguo (Baidu). Notes: a representative work on concatenative speech synthesis.
Zen H, Tokuda K, Black A W. Statistical parametric speech synthesis[J]. Speech Communication, 2009, 51(11): 1039-1064. Recommended by Ling Zhenhua (USTC). Notes: HMM-based statistical parametric speech synthesis.
Tokuda K, Nankaku Y, Toda T, et al. Speech synthesis based on hidden Markov models[J]. Proceedings of the IEEE, 2013, 101(5): 1234-1252. Recommended by Ling Zhenhua (USTC). Notes: HMM-based statistical parametric speech synthesis.
Zen, H., Senior, A., Schuster, M., 2013, Statistical parametric speech synthesis using deep neural networks. Recommended by Wu Junru (East China Normal University) and Kang Yongguo (Baidu).
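Tokuda K. et al., Speech parameter generation algorithms for HMM-based speech synthesis, Proc. of ICASSP, pp. 1315-1318, June 2000. Recommended by Kang Yongguo (Baidu). Notes: HMM-based statistical parametric speech synthesis.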
S. King, "A reading list of recent advances in speech synthesis", Proc. ICPhS 2015. Recommended by Wu Zhizheng (University of Edinburgh) and Yang Peng (Baidu). Link: https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS1043.pdf
Heiga Zen, Statistical parametric speech synthesis. Recommended by Yang Chenyu (I2R, Singapore). Notes: acoustic modeling for speech synthesis.
Z. H. Ling, Deep Learning for Acoustic Modeling in Parametric Speech Generation, IEEE Signal Processing Magazine, 2015, 32(3): 35-52. Recommended by Yang Chenyu (I2R, Singapore). Notes: acoustic modeling for speech synthesis.
Xu Yi, Separation of functional components of tone and intonation from observed F0 patterns. Recommended by Lin Yiting (Nuance) and Li Ya (Institute of Automation, Chinese Academy of Sciences).
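E. Shriberg, A. Stolcke, D. Hakkani-Tür, G. Tür, Prosody-based automatic segmentation of speech into sentences and topics, Speech Communication, 32(1), 127-154. Recommended by Chen Lei (ETS, speech assessment) and Xie Lei (Northwestern Polytechnical University). Notes: SRI's work on structural segmentation of speech using prosodic information; 430 citations on Google Scholar.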
ToBI: A standard for labeling English prosody. Recommended by Yang Chenyu (I2R, Singapore). Notes: prosodic labeling for Chinese and English.
Chinese prosody and prosodic labeling of spontaneous speech. Recommended by Yang Chenyu (I2R, Singapore). Notes: C-ToBI 3.0.
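Shrikanth S. Narayanan and Panayiotis Georgiou, Behavioral Signal Processing: Deriving Human Behavioral Informatics from Speech and Language (2013), Proceedings of the IEEE, 101(5): 1203-1233. Recommended by Li Ming (Sun Yat-sen University). Notes: a survey paper on speech and multimodal behavioral signal analysis; recommended for people working in affective computing and behavior analysis.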
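Levelt, W., Roelofs, A., 1999, A theory of lexical access in speech production. Recommended by Wu Junru (East China Normal University). Notes: language cognition. As of the late 1990s, this was the most comprehensive summary in psycholinguistics of empirical findings on, and proposed mechanisms for, the mental processes of human language production; many computational models aim to reproduce the effects listed in the paper.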
Jaap Haitsma (Philips), A Highly Robust Audio Fingerprinting System. Recommended by Zhu Lei (Yutou Technology). Notes: audio fingerprinting.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013. Recommended by Chen Xie (Cambridge).
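Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio, Neural Machine Translation by Jointly Learning to Align and Translate. Recommended by Xiao Xiong (Nanyang Technological University) and Xu Haihua (Nanyang Technological University). Notes: attention model for MT. Link: http://arxiv.org/pdf/1409.0473.pdf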
2. Books and Theses
Huang Xuedong, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Recommended by He Wei (Communication University of China) and Qian Yanmin (Shanghai Jiao Tong University).
Daniel Jurafsky, Speech and Language Processing (Chinese edition: 《自然语言处理综论》). Recommended by Wang Miaomiao (Alibaba).
Philipos C. Loizou, Speech Enhancement: Theory and Practice. Recommended by Zhang Xueliang (Inner Mongolia University). Notes: a book on speech enhancement.
Jelinek, Statistical Methods for Speech Recognition. Recommended by Jin Qin (Renmin University of China). Notes: a classic textbook.
Hidden Markov Models for Speech Recognition (Edinburgh University Press, 1990). Recommended by Mu Xiangyu (Baidu).
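Machine Learning Paradigms for Speech Recognition. Recommended by Lu Li (Tencent). Notes: looks at speech recognition from a machine learning point of view; the framework is very clear.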
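《实用语音识别基础》 (Fundamentals of Practical Speech Recognition), National Defense Industry Press. Recommended by Wang Jing (Beijing Institute of Technology).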
Paul Taylor (University of Cambridge), Text-to-Speech Synthesis. Recommended by Huang Dongyan (Singapore). Notes: gives a detailed, in-depth explanation of how text-to-speech works.
Ladefoged, A Course in Phonetics. Recommended by Feng Hui (Tianjin University). Notes: recommended by several members of the group.
P. Ladefoged & K. Johnson (2015). A Course in Phonetics (7th Ed.). Cengage Learning. Recommended by Gu Wentao (Nanjing Normal University). Notes: a very good introductory textbook.
K. Johnson (2012). Acoustic and Auditory Phonetics (3rd Ed.). Wiley-Blackwell. Recommended by Gu Wentao (Nanjing Normal University).
B. Gick, I. Wilson, & D. Derrick (2013). Articulatory Phonetics. Wiley-Blackwell. Recommended by Gu Wentao (Nanjing Normal University).
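《实验语音学概要》 (An Outline of Experimental Phonetics), including the revised edition. Recommended by Xiong Ziyu (Institute of Linguistics) and Shi Xiujuan (Tianjin Normal University).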
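Kong Jiangping, 《实验语音学基础教程》 (A Basic Course in Experimental Phonetics). Recommended by Shi Xiujuan (Tianjin Normal University).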
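Reetz & Jongman, Phonetics. Recommended by Sun Ruixin (East China Normal University). Notes: Li Aijun and colleagues in China are translating it into Chinese.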
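Wu Zongji, 《实验语音学概要》 (An Outline of Experimental Phonetics). Recommended by Wang Lei (Yinyue Leida) and others. Notes: speech synthesis, phonology.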
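Duda, Pattern Classification, 2nd edition (a Chinese edition is available). Recommended by Xie Lingyun (Communication University of China). Notes: pattern recognition.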
Cai Lianhong and Kong Jiangping, 《现代汉语音典》. Recommended by Wang Yu (SinoVoice).
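Lin Maocan, 《汉语语调实验研究》 (An Experimental Study of Chinese Intonation), 2012. Recommended by Li Aijun (Institute of Linguistics, Chinese Academy of Social Sciences). Notes: a study of Chinese intonation built on the AM theory of English intonation.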
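Sun-Ah Jun, Prosodic Typology; translated into Chinese as 《韵律类型学》 by Lü Shinan (Institute of Acoustics, Chinese Academy of Sciences). Recommended by Hao Yufeng (Speechocean). Notes: multilingual prosodic labeling.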
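Kenneth N. Stevens, Acoustic Phonetics. Recommended by Xie Yanlu (Beijing Language and Culture University). Notes: describes the characteristics of the various speech sounds from an acoustic point of view; the original edition is very expensive, and the recommender hopes it can be published in China.
Peter Ladefoged, 《世界语音》. Recommended by Shi Xiujuan (Tianjin Normal University).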
Lawrence Rabiner, Theory and Applications of Digital Speech Processing. Recommended by Dang Jianwu (Tianjin University).
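T. F. Quatieri, Discrete-Time Speech Signal Processing (English edition). Recommended by Wang Jing (Beijing Institute of Technology). Notes: a classic textbook for speech signal processing courses.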
Alan V. Oppenheim, Signals and Systems (Chinese edition: 《信号与系统》). Recommended by Chen Xie (Cambridge).
Michael Brandstein, Darren Ward, Microphone Arrays: Signal Processing Techniques and Applications (Digital Signal Processing), Springer, 2001. Recommended by Li Junfeng (Institute of Acoustics, Chinese Academy of Sciences). Notes: speech signal processing.
Pattern Recognition and Machine Learning. Recommended by Wang Dong (Tsinghua University). Notes: a classic of the machine learning field.
Machine Learning: A Probabilistic Perspective; Machine Learning: An Algorithmic Perspective. Recommended by Lu Li (Tencent).
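Keinosuke Fukunaga, Introduction to Statistical Pattern Recognition. Recommended by Zhu Xuan (Samsung Beijing Research Institute). Notes: pattern recognition. The book's treatment of feature spaces is very clear and accessible, which makes it well suited to beginners.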
An Introduction to Support Vector Machines. Recommended by Zhu Xuan (Samsung Beijing Research Institute). Notes: SVM.
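Bu Shangquan, 《基础泛函分析》 (Basic Functional Analysis). Recommended by Deng Kan (Si'ang Education). Notes: functional analysis.
《测度论与概率论基础》 (Foundations of Measure Theory and Probability Theory), Peking University Press. Recommended by Ming Huaiping (I2R, Singapore).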
Daniel Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University Engineering Dept, 2003. Recommended by Yu Kai (Shanghai Jiao Tong University). Notes: discriminative training; PhD thesis.
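Gao Sheng, 《语境相关的声学模型和搜索策略的研究》 (Research on Context-Dependent Acoustic Models and Search Strategies), PhD thesis, Chinese Academy of Sciences, 2001. Recommended by Li Hongyan (Alibaba). Notes: a major early LVCSR work from China.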
3. Tools:
HTK book
Kaldi
Praat
Theano
CNTK
RNNLIB
Eesen CTC toolkit (yajiemiao/eesen)
4. Other (video and online courses):
Deep Learning Summer School, Montreal 2015
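Introduction to Digital Filters with Audio Applications. Recommended by Wang Yu (SinoVoice). Notes: an online signal processing tutorial that explains the fundamentals of signal analysis and processing in an accessible way; its derivations are concise and thorough, and tied to commonly used Matlab signal-processing library functions such as freqz.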
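九州语言网 (Jiuzhou Language Network). Recommended by Li Aijun (Institute of Linguistics, Chinese Academy of Social Sciences).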