Whisper Large-v3 Local Installation and Usage Guide

Whisper Large-v3 is a multitask speech recognition and translation model developed by OpenAI. The detailed installation and usage steps follow.

Installing Dependencies

First, make sure you have Python 3.8 or later installed. Then install the required libraries as follows.

Install the Python dependencies

pip install --upgrade pip
pip install --upgrade git+https://github.com/huggingface/transformers.git accelerate datasets[audio]

Install ffmpeg

The Whisper model relies on ffmpeg to process audio files. You can install it with one of the following commands:

Ubuntu or Debian: sudo apt update && sudo apt install ffmpeg

Windows (with Chocolatey): choco install ffmpeg

macOS (with Homebrew): brew install ffmpeg
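
To verify the installation before moving on, here is a quick check from Python; a minimal sketch using only the standard library:

import shutil

# Audio handling for Whisper relies on ffmpeg, so it must be visible on PATH.
if shutil.which("ffmpeg") is None:
    raise RuntimeError("ffmpeg not found on PATH; install it as shown above")
print("ffmpeg found at:", shutil.which("ffmpeg"))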

Downloading and Loading the Model

1. Import the required libraries:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

2. Set the device (CPU or GPU):

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "openai/whisper-large-v3"

3. Load the model and processor:

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)
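
Optionally, on a CUDA GPU the model can be loaded in half precision to roughly halve memory use. This is a sketch using the torch_dtype and low_cpu_mem_usage arguments of from_pretrained (available in recent transformers releases; low_cpu_mem_usage requires accelerate, installed above):

torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)
# Note: with float16 weights, cast the input features to the same dtype
# later, e.g. input_features.to(device, dtype=torch_dtype).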

Processing and Inference

1. Load a dataset:
Use the Hugging Face datasets library to load a sample dataset:

from datasets import load_dataset
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

2. Process the input audio:
Convert the audio sample into input features the model accepts:

input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
input_features = input_features.to(device)

3. Generate predictions:
Use the model to generate the speech recognition result:

gen_kwargs = {"max_new_tokens": 128, "num_beams": 1, "return_timestamps": False}
pred_ids = model.generate(input_features, **gen_kwargs)
pred_text = processor.batch_decode(pred_ids, skip_special_tokens=True)

print(pred_text)
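
Whisper is also a translation model: for non-English audio, transformers' Whisper generate() accepts task and language keyword arguments, where task="translate" produces English text and language pins the source language instead of auto-detecting it. A sketch ("french" is just an example source language, not taken from the sample above):

# Translate non-English speech into English text.
gen_kwargs = {"max_new_tokens": 128, "task": "translate", "language": "french"}
pred_ids = model.generate(input_features, **gen_kwargs)
print(processor.batch_decode(pred_ids, skip_special_tokens=True))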

Complete Example Code

Below is a complete example that puts the steps above together:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from datasets import load_dataset

# Set the device (CPU or GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "openai/whisper-large-v3"

# Load the model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Load the sample dataset
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

# Process the input audio
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
input_features = input_features.to(device)

# Generate predictions
gen_kwargs = {"max_new_tokens": 128, "num_beams": 1, "return_timestamps": False}
pred_ids = model.generate(input_features, **gen_kwargs)
pred_text = processor.batch_decode(pred_ids, skip_special_tokens=True)

# Print the result
print(pred_text)
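
For everyday use, transformers also provides a higher-level pipeline API that accepts an audio file path directly and handles chunked long-form transcription; in recent releases, device accepts a string such as "cuda". A sketch, where "my_recording.wav" is a placeholder for your own file:

from transformers import pipeline

# The ASR pipeline wraps the model and processor; chunk_length_s enables
# chunked decoding for audio longer than Whisper's 30-second window.
asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device=device,
    chunk_length_s=30,
)
result = asr("my_recording.wav")
print(result["text"])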
