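Audio to Text
Case 1
- What it does: open an audio file, run the tool, and it transcribes the speech to text
- It is powerful, but I don't recommend it because it is very resource-intensive
- Detailed steps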
# Clone the repository locally
git clone https://github.com/YaoFANGUK/video-subtitle-generator.git

# Open a terminal and go to the project root
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ pwd
C:\Users\ychen\Downloads\video-subtitle-generator

# Create a virtual environment
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ conda create -n vsgEnv python=3.8
Fetching package metadata .................
Solving package specifications: .

Package plan for installation in environment C:\ProgramData\Anaconda3\envs\vsgEnv:

The following NEW packages will be INSTALLED:

    ca-certificates: 2023.12.12-haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libffi: 3.4.4-hd77b12b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    openssl: 1.1.1w-h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pip: 23.3.1-py38haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    python: 3.8.18-h6244533_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    setuptools: 68.2.2-py38haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    sqlite: 3.41.2-h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    vc: 14.2-h21ff451_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    vs2015_runtime: 14.27.29016-h5e58377_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    wheel: 0.41.2-py38haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

Proceed ([y]/n)? y

ca-certificate 100% |#######################################################################| Time: 0:00:00 848.38 kB/s
libffi-3.4.4-h 100% |#######################################################################| Time: 0:00:00 4.78 MB/s
openssl-1.1.1w 100% |#######################################################################| Time: 0:00:00 9.53 MB/s
python-3.8.18- 100% |#######################################################################| Time: 0:00:01 11.69 MB/s
setuptools-68. 100% |#######################################################################| Time: 0:00:00 11.87 MB/s
wheel-0.41.2-p 100% |#######################################################################| Time: 0:00:00 14.51 MB/s
pip-23.3.1-py3 100% |#######################################################################| Time: 0:00:00 11.78 MB/s
#
# To activate this environment, use:
# > activate vsgEnv
#
# To deactivate an active environment, use:
# > deactivate
#
# * for power-users using bash, you must source
#

# Activate the virtual environment
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ activate vsgEnv

# Install the dependencies
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
(vsgEnv) λ pip install -r requirements.txt
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting appdirs==1.4.4 (from -r requirements.txt (line 1))
  Downloading https://mirrors.aliyun.com/pypi/packages/3b/00/2344469e2084fb287c2e0b57b72910309874c3245463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting audioread==2.1.9 (from -r requirements.txt (line 2))
  Downloading https://mirrors.aliyun.com/pypi/packages/b3/d1/e324634c5867a668774d6fe233a83228da4ba16521e19059c15df899737d/audioread-2.1.9.tar.gz (377 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 377.5/377.5 kB 1.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done

# Run the recognition (the prompts below select Simplified Chinese as the language and Standard as the mode)
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
(vsgEnv) λ python gui.py
选择识别的语言: 中文(简体)
选择识别模式: 标准
1536 864
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-4:
Process SpawnPoolWorker-1:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-5:
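Case 2
- Install the dependency
pip install SpeechRecognition
- Download SWIG, extract it, and add its directory to the PATH environment variable
- Open cmd and verify the installation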
C:\Users\ychen>swig -version

SWIG Version 4.0.2

Compiled with i686-w64-mingw32-g++ [i686-w64-mingw32]

Configured options: +pcre

Please see http://www.swig.org for reporting bugs and further information
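The same check can be scripted, which may help when repeating the setup on another machine. This is only a minimal sketch and not part of the original steps: the helper names `parse_swig_version` and `swig_version` are made up for illustration, and it assumes `swig -version` prints a line of the form `SWIG Version X.Y.Z`, as in the output above.

```python
import re
import shutil
import subprocess


def parse_swig_version(text):
    """Return the version found in `swig -version` output, or None if absent."""
    match = re.search(r"SWIG Version (\S+)", text)
    return match.group(1) if match else None


def swig_version(executable="swig"):
    """Return the installed SWIG version, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # Not found: the environment variable was probably not updated,
        # or the terminal was opened before the change.
        return None
    result = subprocess.run(
        [path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return parse_swig_version(result.stdout)


if __name__ == "__main__":
    print("SWIG version:", swig_version())
```

If this prints `None`, reopen the terminal after editing the environment variable and try again before moving on to the pocketsphinx install.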
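- Install pocketsphinx offline
- Download the wheel that matches the local Python version
- Move the downloaded .whl file into the venv\Scripts directory and run the following command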
C:\ProgramData\Anaconda3\Scripts>pip install pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing c:\programdata\anaconda3\scripts\pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl
Installing collected packages: pocketsphinx
Successfully installed pocketsphinx-0.1.15
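Picking the wheel that matches the local Python comes down to reading the tag in the file name: in `cp36-cp36m-win_amd64`, `cp36` means CPython 3.6 and `win_amd64` means 64-bit Windows. The sketch below prints the tag for the interpreter that runs it. The function names are hypothetical, and it assumes a standard CPython 3 build on Windows as used in this post; on other platforms the platform part of real wheel names may differ.

```python
import sys
import sysconfig


def cpython_wheel_tag(major, minor, platform_tag):
    """Build the 'cpXY-cpXY[m]-platform' part of a CPython wheel file name."""
    # CPython 3.7 and earlier carry an 'm' ABI suffix; 3.8 and later do not.
    abi_suffix = "m" if (major, minor) < (3, 8) else ""
    return f"cp{major}{minor}-cp{major}{minor}{abi_suffix}-{platform_tag}"


def local_wheel_tag():
    """Return the tag for the interpreter running this script."""
    # sysconfig reports e.g. 'win-amd64'; wheel names use underscores.
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return cpython_wheel_tag(
        sys.version_info.major, sys.version_info.minor, platform_tag
    )


if __name__ == "__main__":
    print("Look for a wheel whose name contains:", local_wheel_tag())
```

Run it with the same interpreter that `pip` installs into; if the printed tag does not appear in the wheel's file name, pip will refuse the file as unsupported on this platform.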
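Next, download the Chinese model. "Mandarin" is the newer version and "Archive" is the older one; download either. After extracting it, rename the zh_cn.cd_cont_5000 folder to acoustic-model, rename zh_cn.lm.bin to language-model.lm.bin, and rename zh_cn.dic to pronounciation-dictionary.dict (the extension changes from dic to dict, and zh_cn is replaced with pronounciation-dictionary; the spelling "pronounciation" is the file name SpeechRecognition looks for, so keep it as is). Finally, put these three items in a folder named zh-CN.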
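The renaming can also be done with a short script. This is a sketch under stated assumptions rather than the author's procedure: `install_zh_cn_model` is a made-up helper, and both directory arguments are placeholders you must supply. The steps above do not say where the zh-CN folder should live; SpeechRecognition normally looks for language folders in the `pocketsphinx-data` directory inside its installed package, but verify that location for your installation.

```python
import shutil
from pathlib import Path

# Extracted name -> name expected inside the zh-CN folder.
RENAMES = {
    "zh_cn.cd_cont_5000": "acoustic-model",          # folder
    "zh_cn.lm.bin": "language-model.lm.bin",         # file
    "zh_cn.dic": "pronounciation-dictionary.dict",   # file (spelling is intentional)
}


def install_zh_cn_model(extracted_dir, data_dir):
    """Move the three extracted model items into data_dir/zh-CN under new names."""
    extracted_dir = Path(extracted_dir)
    target = Path(data_dir) / "zh-CN"

    # Check everything first so a missing item does not leave a half-moved model.
    for old_name, new_name in RENAMES.items():
        if not (extracted_dir / old_name).exists():
            raise FileNotFoundError(f"missing model item: {extracted_dir / old_name}")
        if (target / new_name).exists():
            raise FileExistsError(f"already present: {target / new_name}")

    target.mkdir(parents=True, exist_ok=True)
    for old_name, new_name in RENAMES.items():
        shutil.move(str(extracted_dir / old_name), str(target / new_name))
    return target
```

A call would look like `install_zh_cn_model(r"C:\path\to\extracted", r"C:\path\to\pocketsphinx-data")`, where both paths are hypothetical.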
- Test result: recognition accuracy on the audio was very low
- Code
import speech_recognition as sr


def wav2txt():
    r = sr.Recognizer()
    # Open the audio file and read it in full
    with sr.AudioFile('./test.wav') as source:
        audio = r.record(source)
    try:
        # Chinese: needs the zh-CN model folder prepared earlier
        print('Text: ', r.recognize_sphinx(audio, language='zh-CN'))
        # English (default model):
        # print('Text: ', r.recognize_sphinx(audio))
    except Exception as e:
        print('Recognition failed: ', e)


wav2txt()