Speech Signal Experiment 1: Speech Recognition Based on Time-Domain Analysis
Reference: https://www.cnblogs.com/LXP-Never/p/10078200.html
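1. Reading the speech signal: extracting the signal's information from a .wav file
Reference: https://www.jianshu.com/p/947528f3dff8
The key point is to understand how to read the signal's number of channels, sample rate, bit depth, number of sample points, and so on.
We found a dataset. To work out how to process it, I first examined its format.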
```python
# Read a WAV file and plot its waveform
import wave

import matplotlib.pyplot as plt
import numpy as np

# Open the WAV file, read the header parameters and the raw sample data
with wave.open(r"recording_0_6.wav", "rb") as f:
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    # readframes returns a bytes object holding nframes frames
    str_data = f.readframes(nframes)

print("Header parameters:", params)
print("Channels:", nchannels)
print("SampleRate:", framerate)
print("Precision (bits):", sampwidth * 8)
print("Frames (sample points):", nframes)
print("Duration (s):", nframes / framerate)

# Convert the bytes to an int16 array (assumes 16-bit samples),
# then normalize to the range [-1, 1]
wave_data = np.frombuffer(str_data, dtype=np.short).astype(np.float64)
wave_data = wave_data / np.max(np.abs(wave_data))

# One row per channel; plot the first channel against time
wave_data = wave_data.reshape(-1, nchannels).T
time = np.arange(nframes) / framerate
plt.plot(time, wave_data[0])
plt.xlabel("time (seconds)")
plt.show()
```
The format of the dataset I found is:
Channels: 1
SampleRate: 44100
Precision (bits): 16
Frames (sample points): 88200
Duration (s): 2.0
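As a quick sanity check, the header values listed above can be related to each other with a little arithmetic. The sketch below hard-codes those values (a sample width of 2 bytes follows from the 16-bit precision) and derives the duration and the size of the raw PCM data; the variable names are only illustrative.

```python
# Header values reported above for the mono 16-bit recording
nchannels, sampwidth, framerate, nframes = 1, 2, 44100, 88200

# Duration in seconds: number of sample points divided by the sample rate
duration = nframes / framerate

# Size of the raw PCM payload in bytes (excluding the WAV header)
data_bytes = nframes * nchannels * sampwidth

print("duration (s):", duration)
print("raw data size (bytes):", data_bytes)
```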
Note: for installing pyaudio, see https://blog.csdn.net/happywlg123/article/details/107281936
https://blog.csdn.net/qq_39157144/article/details/94716974?utm_medium=distribute.pc_relevant.none-task-blog-title-2&spm=1001.2101.3001.4242
Installation files: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
2. Framing and windowing
https://blog.csdn.net/YAOHAIPI/article/details/102826051?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522160447502819725222418661%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=160447502819725222418661&utm_medium=distribute.pc_search_result.none-task-code-2~all~first_rank_v2~rank_v28-1-102826051-19.pc_first_rank_v2_rank_v28&utm_term=python%E5%88%86%E5%B8%A7%E5%8A%A0%E7%AA%97&spm=1018.2118.3001.4449
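Framing and windowing could be sketched roughly as follows. This is a minimal NumPy sketch rather than the method from the linked post: the helper name `frame_signal`, the 25 ms frame length, the 10 ms hop, and the Hamming window are all assumptions chosen for illustration, and the input is random noise standing in for the 2-second, 44100 Hz recording.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    Hypothetical helper: trailing samples that do not fill a whole frame
    are dropped.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one full frame exists
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i of idx holds the sample indices that belong to frame i
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    frames = signal[idx]
    # Multiply every frame by the same window to taper its edges
    return frames * np.hamming(frame_len)

framerate = 44100
frame_len = framerate * 25 // 1000   # 25 ms frame (assumed value)
hop_len = framerate * 10 // 1000     # 10 ms hop (assumed value)

# Random noise standing in for 2 seconds of audio
signal = np.random.default_rng(0).standard_normal(2 * framerate)
frames = frame_signal(signal, frame_len, hop_len)
print(frames.shape)
```

Each row of `frames` is one windowed frame, so per-frame features such as short-time energy can be computed along `axis=1`.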
3. Endpoint detection
https://blog.csdn.net/rocketeerli/article/details/83307435
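A common time-domain approach to endpoint detection looks at short-time energy and zero-crossing rate per frame. The sketch below is a simplified illustration, not the algorithm from the linked post: it makes the speech/silence decision from a single energy threshold and computes the zero-crossing rate only for inspection. The function names, the `energy_ratio` threshold, and the synthetic 440 Hz test signal are all assumptions.

```python
import numpy as np

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_endpoints(frames, energy_ratio=0.1):
    """Return (first, last) indices of the frames whose energy exceeds
    energy_ratio * max energy, or None if no frame qualifies.

    energy_ratio is an illustrative threshold, not a tuned value.
    """
    energy = short_time_energy(frames)
    active = np.flatnonzero(energy > energy_ratio * energy.max())
    if active.size == 0:
        return None
    return int(active[0]), int(active[-1])

# Synthetic test signal: silence, a 440 Hz tone, then silence again
framerate = 8000
t = np.arange(framerate) / framerate           # 1 second of samples
signal = np.zeros(framerate)
signal[2000:6000] = np.sin(2 * np.pi * 440 * t[2000:6000])

# Non-overlapping 25 ms frames, without windowing, to keep the example short
frame_len = 200
frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)

start, end = detect_endpoints(frames)
zcr = zero_crossing_rate(frames)
print("active frames:", start, "to", end)
print("start time (s):", start * frame_len / framerate)
print("mean ZCR inside the active region:", zcr[start:end + 1].mean())
```

On real recordings a relative threshold like this is sensitive to background noise, so the threshold would likely need tuning, and the zero-crossing rate could be used to refine the detected boundaries.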