
Audio to Text

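Example 1

  • GitHub repository: https://github.com/YaoFANGUK/video-subtitle-generator

  • Result: open an audio file, run the tool, and it recognizes the text

  • Powerful, but not recommended: it is very demanding on performance and system resources

  • Detailed steps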

# Clone the repository locally
git clone https://github.com/YaoFANGUK/video-subtitle-generator.git
# Open a terminal and go to the project root directory
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ pwd
C:\Users\ychen\Downloads\video-subtitle-generator
# Create a virtual environment
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ conda create -n vsgEnv python=3.8
Fetching package metadata .................
Solving package specifications: .
Package plan for installation in environment C:\ProgramData\Anaconda3\envs\vsgEnv:
The following NEW packages will be INSTALLED:
ca-certificates: 2023.12.12-haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi: 3.4.4-hd77b12b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
openssl: 1.1.1w-h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip: 23.3.1-py38haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python: 3.8.18-h6244533_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
setuptools: 68.2.2-py38haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sqlite: 3.41.2-h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vc: 14.2-h21ff451_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vs2015_runtime: 14.27.29016-h5e58377_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wheel: 0.41.2-py38haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
Proceed ([y]/n)? y
ca-certificate 100% |#######################################################################| Time: 0:00:00 848.38 kB/s
libffi-3.4.4-h 100% |#######################################################################| Time: 0:00:00 4.78 MB/s
openssl-1.1.1w 100% |#######################################################################| Time: 0:00:00 9.53 MB/s
python-3.8.18- 100% |#######################################################################| Time: 0:00:01 11.69 MB/s
setuptools-68. 100% |#######################################################################| Time: 0:00:00 11.87 MB/s
wheel-0.41.2-p 100% |#######################################################################| Time: 0:00:00 14.51 MB/s
pip-23.3.1-py3 100% |#######################################################################| Time: 0:00:00 11.78 MB/s
#
# To activate this environment, use:
# > activate vsgEnv
#
# To deactivate an active environment, use:
# > deactivate
#
# * for power-users using bash, you must source
#
# Activate the virtual environment
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ activate vsgEnv
# Install the dependencies
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
(vsgEnv) λ pip install -r requirements.txt
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting appdirs==1.4.4 (from -r requirements.txt (line 1))
Downloading https://mirrors.aliyun.com/pypi/packages/3b/00/2344469e2084fb287c2e0b57b72910309874c3245463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting audioread==2.1.9 (from -r requirements.txt (line 2))
Downloading https://mirrors.aliyun.com/pypi/packages/b3/d1/e324634c5867a668774d6fe233a83228da4ba16521e19059c15df899737d/audioread-2.1.9.tar.gz (377 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 377.5/377.5 kB 1.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
# Run the text recognition
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
(vsgEnv) λ python gui.py
选择识别的语言: 中文(简体)  (recognition language: Chinese, Simplified)
选择识别模式: 标准  (recognition mode: standard)
1536 864
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-4:
Process SpawnPoolWorker-1:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-5:
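
The `pip install -r requirements.txt` step above installs pinned versions such as `appdirs==1.4.4` and `audioread==2.1.9`. As a rough sketch (not part of the project itself), the following checks whether simple `name==version` pins are actually installed in the active environment. The helper names `parse_pins` and `find_mismatches` are made up for this illustration, and lines with extras or environment markers are not handled.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (wanted, installed_or_None)} for pins that are not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Inside the activated `vsgEnv` environment, calling `find_mismatches(parse_pins(open("requirements.txt", encoding="utf-8")))` should return an empty dict if every simple pin was installed as requested.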

Example 2

  • Install the dependency
pip install SpeechRecognition
  • Download SWIG, unzip it, and add its directory to the PATH environment variable

  • Open cmd and verify the installation

C:\Users\ychen>swig -version
SWIG Version 4.0.2
Compiled with i686-w64-mingw32-g++ [i686-w64-mingw32]
Configured options: +pcre
Please see http://www.swig.org for reporting bugs and further information
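
The same check can be made from Python, which is handy when the install is scripted. This is only a sketch: the helper name `tool_version` is invented here, and it assumes the tool prints its version when given the `-version` flag, as `swig` does in the output above.

```python
import shutil
import subprocess


def tool_version(name, flag="-version"):
    """Return the tool's version output, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None  # PATH is not configured, or the tool is not installed
    result = subprocess.run([path, flag], capture_output=True, text=True)
    # Some tools print version information to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()
```

`tool_version("swig")` returning `None` would suggest that the environment variable from the previous step has not taken effect yet (for example, because the terminal was opened before PATH was changed).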
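  • Install pocketsphinx offline

  • Download the wheel (.whl) that matches your local Python version

  • Move the downloaded .whl file into the venv\Scripts directory and run the following command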

C:\ProgramData\Anaconda3\Scripts>pip install pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing c:\programdata\anaconda3\scripts\pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl
Installing collected packages: pocketsphinx
Successfully installed pocketsphinx-0.1.15
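
Choosing the wheel that "matches the local Python version" comes down to the tags at the end of the filename: in `pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl`, `cp36` is the Python tag, `cp36m` the ABI tag, and `win_amd64` the platform tag. The sketch below compares those tags with the running interpreter. The function names are illustrative, it assumes CPython, and the platform comparison is a simplification that is mainly meaningful for Windows wheels like this one.

```python
import sys
import sysconfig


def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def local_tags():
    """Return (python_tag, platform_tag) for this interpreter, assuming CPython."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def wheel_matches(filename, python_tag, platform_tag):
    """True if the wheel's Python and platform tags equal the given ones."""
    wheel_python, _abi, wheel_platform = wheel_tags(filename)
    return wheel_python == python_tag and wheel_platform == platform_tag
```

For example, `wheel_matches("pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl", *local_tags())` is only true on a 64-bit Windows Python 3.6; on any other interpreter a different wheel file is needed.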
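Of the available model downloads, Mandarin is the newer version and Archive is the older one; download either one.
Rename the extracted zh_cn.cd_cont_5000 folder to acoustic-model, rename zh_cn.lm.bin to language-model.lm.bin, and rename zh_cn.dic to pronounciation-dictionary.dict (the dic extension becomes dict, and zh_cn is replaced with pronounciation-dictionary).
Finally, put these three items into a folder named zh-CN.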
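
The renaming steps above can be sketched as a small script. Two things here are assumptions rather than something the post states: the function name `install_zh_cn_model` is made up, and `data_dir` is expected to be the `pocketsphinx-data` folder inside the installed `speech_recognition` package, which is where SpeechRecognition appears to look for language folders. Note that `pronounciation-dictionary.dict` keeps its unusual spelling on purpose, since that is the name given in the steps.

```python
import shutil
from pathlib import Path

# Extracted name -> name expected inside the zh-CN folder (from the steps above).
RENAMES = {
    "zh_cn.cd_cont_5000": "acoustic-model",
    "zh_cn.lm.bin": "language-model.lm.bin",
    "zh_cn.dic": "pronounciation-dictionary.dict",
}


def install_zh_cn_model(extracted_dir, data_dir):
    """Copy the extracted model files into data_dir/zh-CN under their new names."""
    extracted_dir = Path(extracted_dir)
    target = Path(data_dir) / "zh-CN"
    target.mkdir(parents=True, exist_ok=True)
    for old_name, new_name in RENAMES.items():
        src = extracted_dir / old_name
        dst = target / new_name
        if src.is_dir():
            if dst.exists():
                shutil.rmtree(dst)  # replace a previous copy of the folder
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)  # raises FileNotFoundError if src is missing
    return target
```

Copying instead of renaming in place leaves the downloaded files untouched, so the step can be repeated if something goes wrong.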

  • Test: the audio recognition accuracy is very low

  • Code

import speech_recognition as sr


def wav2txt():
    r = sr.Recognizer()
    # Open the audio file and read it in full
    with sr.AudioFile('./test.wav') as source:
        audio = r.record(source)
    try:
        # Chinese (uses the zh-CN model folder prepared earlier)
        print('Text: ', r.recognize_sphinx(audio, language='zh-CN'))
        # print('Text: ', r.recognize_sphinx(audio))  # English
    except Exception as e:
        print('Error:', e)


wav2txt()
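
The code above catches every exception in the same way. A possible variant, sketched here under the assumption that the installed SpeechRecognition version raises `sr.UnknownValueError` when Sphinx produces no result and `sr.RequestError` when pocketsphinx or the language data is missing, separates "nothing recognized" from "setup problem". The function name `transcribe_wav` is made up for this example.

```python
def transcribe_wav(path, language="zh-CN"):
    """Return the recognized text, or None if no speech could be made out."""
    # Imported inside the function so a missing package only fails on use.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_sphinx(audio, language=language)
    except sr.UnknownValueError:
        # Sphinx ran but produced no hypothesis for this audio.
        return None
    except sr.RequestError as exc:
        # pocketsphinx or the language data folder is missing or incomplete.
        raise RuntimeError(f"Sphinx is not set up for {language}: {exc}") from exc
```

With this split, `transcribe_wav('./test.wav')` returning `None` points at the audio or the model quality, while a `RuntimeError` points back at the installation steps.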
posted by DogLeftover