Audio to Text

Example 1

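GitHub repository: https://github.com/YaoFANGUK/video-subtitle-generator
What it does: open an audio file, run the tool, and it recognizes the speech as text.
It is powerful, but not recommended: it is very heavy on CPU and system resources.
Steps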


# Clone the repository locally
git clone https://github.com/YaoFANGUK/video-subtitle-generator.git

# Open a terminal and change to the project root directory
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ pwd
C:\Users\ychen\Downloads\video-subtitle-generator

# Create a virtual environment
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ conda create -n vsgEnv python=3.8
Fetching package metadata .................
Solving package specifications: .

Package plan for installation in environment C:\ProgramData\Anaconda3\envs\vsgEnv:

The following NEW packages will be INSTALLED:

    ca-certificates: 2023.12.12-haa95532_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libffi:          3.4.4-hd77b12b_0       https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    openssl:         1.1.1w-h2bbff1b_0      https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pip:             23.3.1-py38haa95532_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    python:          3.8.18-h6244533_0      https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    setuptools:      68.2.2-py38haa95532_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    sqlite:          3.41.2-h2bbff1b_0      https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    vc:              14.2-h21ff451_1        https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    vs2015_runtime:  14.27.29016-h5e58377_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    wheel:           0.41.2-py38haa95532_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

Proceed ([y]/n)? y

ca-certificate 100% |#######################################################################| Time: 0:00:00 848.38 kB/s
libffi-3.4.4-h 100% |#######################################################################| Time: 0:00:00   4.78 MB/s
openssl-1.1.1w 100% |#######################################################################| Time: 0:00:00   9.53 MB/s
python-3.8.18- 100% |#######################################################################| Time: 0:00:01  11.69 MB/s
setuptools-68. 100% |#######################################################################| Time: 0:00:00  11.87 MB/s
wheel-0.41.2-p 100% |#######################################################################| Time: 0:00:00  14.51 MB/s
pip-23.3.1-py3 100% |#######################################################################| Time: 0:00:00  11.78 MB/s
#
# To activate this environment, use:
# > activate vsgEnv
#
# To deactivate an active environment, use:
# > deactivate
#
# * for power-users using bash, you must source
#

# Activate the virtual environment
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
λ activate vsgEnv

# Install dependencies
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
(vsgEnv) λ pip install -r requirements.txt
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting appdirs==1.4.4 (from -r requirements.txt (line 1))
  Downloading https://mirrors.aliyun.com/pypi/packages/3b/00/2344469e2084fb287c2e0b57b72910309874c3245463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting audioread==2.1.9 (from -r requirements.txt (line 2))
  Downloading https://mirrors.aliyun.com/pypi/packages/b3/d1/e324634c5867a668774d6fe233a83228da4ba16521e19059c15df899737d/audioread-2.1.9.tar.gz (377 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 377.5/377.5 kB 1.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done

# Recognize text
C:\Users\ychen\Downloads\video-subtitle-generator (main -> origin)
(vsgEnv) λ python gui.py
Selected recognition language: Chinese (Simplified)
Selected recognition mode: Standard
1536 864
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-4:
Process SpawnPoolWorker-1:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-5:
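Before running gui.py it can help to confirm that the active interpreter really is the Python 3.8 from vsgEnv and that the pinned packages from requirements.txt installed completely. The sketch below assumes requirements.txt uses simple `name==version` pins, as the log above suggests (appdirs==1.4.4, audioread==2.1.9). The helpers `parse_pins` and `check_pins` are hypothetical, not part of the project, and `importlib.metadata` needs Python 3.8 or newer.

```python
import os
import sys
from importlib import metadata  # standard library from Python 3.8


def parse_pins(lines):
    """Return {name: version} for simple 'name==version' lines of a requirements file."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments and markers
        if "==" not in line or line.startswith("-"):
            continue  # skip blanks, pip options such as -i, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return (name, wanted, installed) for every missing or mismatched package."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            problems.append((name, wanted, installed))
    return problems


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "at", sys.prefix)
    if sys.version_info[:2] != (3, 8):
        print("Warning: vsgEnv was created with python=3.8")
    if os.path.exists("requirements.txt"):
        with open("requirements.txt", encoding="utf-8") as f:
            for name, wanted, installed in check_pins(parse_pins(f)):
                print(f"{name}: want {wanted}, found {installed}")
```

Run it from the project root with vsgEnv activated; an empty mismatch list means pip installed every pinned version.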

Example 2

Install dependencies

pip install SpeechRecognition

Download SWIG, extract it, and add its folder to the PATH environment variable.
Open cmd and verify:

C:\Users\ychen>swig -version

SWIG Version 4.0.2

Compiled with i686-w64-mingw32-g++ [i686-w64-mingw32]

Configured options: +pcre

Please see http://www.swig.org for reporting bugs and further information
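The same check can be scripted, which is handy when a new terminal does not yet see the updated PATH. This is a minimal sketch using only the standard library; `find_swig` and `parse_swig_version` are hypothetical helper names, and the version regex assumes the `SWIG Version x.y.z` line shown in the output above. It avoids `capture_output` so it also runs on Python 3.6, the version the pocketsphinx wheel below targets.

```python
import re
import shutil
import subprocess


def parse_swig_version(output):
    """Extract the version number from `swig -version` output, or None if absent."""
    match = re.search(r"SWIG Version\s+(\S+)", output)
    return match.group(1) if match else None


def find_swig():
    """Return (path, version) if swig is on PATH, otherwise None."""
    path = shutil.which("swig")
    if path is None:
        return None
    # Merge stderr into stdout; universal_newlines keeps this compatible with Python 3.6
    result = subprocess.run([path, "-version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return path, parse_swig_version(result.stdout)


if __name__ == "__main__":
    found = find_swig()
    if found is None:
        print("swig not found on PATH; check the environment variable")
    else:
        print("swig at %s, version %s" % found)
```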

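Install pocketsphinx offline
Download the wheel that matches your local Python version (for example cp36 for Python 3.6, as below).
Move the downloaded .whl file into the venv\Scripts directory (in the log below, C:\ProgramData\Anaconda3\Scripts) and run the following command. pip also accepts a full path to the .whl, so moving it is optional.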

C:\ProgramData\Anaconda3\Scripts>pip install pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing c:\programdata\anaconda3\scripts\pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl
Installing collected packages: pocketsphinx
Successfully installed pocketsphinx-0.1.15
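"Matching your local Python version" means the wheel's tags must fit the running interpreter, otherwise pip rejects it. Wheel file names follow the pattern name-version(-build)-pythontag-abitag-platform.whl, so the tags can be read from the end of the name. A minimal sketch; the helper names are hypothetical, and it only compares the Python tag and the 32/64-bit platform, not every tag pip checks.

```python
import struct
import sys


def current_python_tag():
    """CPython tag of the running interpreter, e.g. 'cp36' or 'cp38'."""
    return "cp%d%d" % sys.version_info[:2]


def wheel_python_tag(filename):
    """Python tag of a wheel file name; counted from the end to allow an optional build tag."""
    return filename[:-len(".whl")].split("-")[-3]


def wheel_platform(filename):
    """Platform tag of a wheel file name, e.g. 'win_amd64' (64-bit) or 'win32' (32-bit)."""
    return filename[:-len(".whl")].split("-")[-1]


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8  # pointer size in bits: 64 or 32
    print("On Windows this interpreter needs a wheel tagged", current_python_tag(),
          "and", "win_amd64" if bits == 64 else "win32")
```

For example, pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl only fits a 64-bit CPython 3.6.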

Download the Chinese language and acoustic models

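The Mandarin package is the newer version and the archived one is older; download either.
Rename the extracted zh_cn.cd_cont_5000 folder to acoustic-model, zh_cn.lm.bin to language-model.lm.bin, and zh_cn.dic to pronounciation-dictionary.dict (the extension changes from .dic to .dict; the misspelling "pronounciation" is the file name SpeechRecognition looks for).
Finally, put these three items into a folder named zh-CN inside the pocketsphinx-data directory of the installed speech_recognition package, next to the bundled en-US folder.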
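The renaming and placement can also be done by a script, which makes the expected layout explicit. This is a sketch under two assumptions: the downloaded archive was extracted into one folder that contains zh_cn.cd_cont_5000, zh_cn.lm.bin and zh_cn.dic, and the target is pocketsphinx-data/zh-CN next to the installed speech_recognition package. `arrange_model`, `missing_files` and `default_lang_dir` are hypothetical helpers. It copies instead of renaming so the extracted originals stay intact.

```python
import os
import shutil

# Extracted CMU Sphinx Mandarin files -> names recognize_sphinx expects inside
# pocketsphinx-data/<language>. "pronounciation" is misspelled on purpose:
# it matches the file name the SpeechRecognition library looks for.
LAYOUT = {
    "zh_cn.cd_cont_5000": "acoustic-model",
    "zh_cn.lm.bin": "language-model.lm.bin",
    "zh_cn.dic": "pronounciation-dictionary.dict",
}


def arrange_model(src_dir, lang_dir):
    """Copy the extracted model from src_dir into lang_dir under the expected names."""
    os.makedirs(lang_dir, exist_ok=True)
    for src_name, dst_name in LAYOUT.items():
        src = os.path.join(src_dir, src_name)
        dst = os.path.join(lang_dir, dst_name)
        if os.path.isdir(src):
            if not os.path.exists(dst):  # copytree fails if the target folder exists
                shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)


def missing_files(lang_dir):
    """Return the expected names that are not yet present in lang_dir."""
    return [name for name in LAYOUT.values()
            if not os.path.exists(os.path.join(lang_dir, name))]


def default_lang_dir(language="zh-CN"):
    """pocketsphinx-data/<language> inside the installed speech_recognition package."""
    import speech_recognition as sr
    package_dir = os.path.dirname(os.path.realpath(sr.__file__))
    return os.path.join(package_dir, "pocketsphinx-data", language)


if __name__ == "__main__":
    try:
        target = default_lang_dir()
    except ImportError:
        print("SpeechRecognition is not installed")
    else:
        print("zh-CN model folder:", target)
        print("missing:", missing_files(target))
```

Calling `arrange_model(r"<extracted folder>", default_lang_dir())` would populate the folder; `missing_files` returning an empty list means the layout is complete.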

Test: recognition accuracy on the audio is very low.
Code


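import speech_recognition as sr


def wav2txt(path='./test.wav'):
    r = sr.Recognizer()
    # Open the audio file (PCM WAV, AIFF or FLAC) and read all of it into memory
    with sr.AudioFile(path) as source:
        audio = r.record(source)
    try:
        # Offline recognition with CMU Sphinx; 'zh-CN' needs the model folder prepared above
        print('Text: ', r.recognize_sphinx(audio, language='zh-CN'))  # Chinese
        # print('Text: ', r.recognize_sphinx(audio))  # English (default en-US)
    except sr.UnknownValueError:
        print('Sphinx could not understand the audio')
    except sr.RequestError as e:
        # e.g. pocketsphinx not installed or the language data folder is missing
        print('Sphinx error:', e)


wav2txt()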
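Since accuracy was poor, it is worth ruling out a malformed input first: sr.AudioFile reads PCM WAV, AIFF/AIFF-C and FLAC, and the standard-library wave module likewise only opens PCM WAV. A minimal sketch that prints the basic properties of test.wav; `describe_wav` is a hypothetical helper, and this only checks the file, it does not improve recognition.

```python
import os
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file; wave.Error is raised for unsupported formats."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / float(rate),
        }


if __name__ == "__main__":
    if os.path.exists("./test.wav"):
        print(describe_wav("./test.wav"))
```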

posted @ 2024-03-02 20:42 DogLeftover
