Recording with speech_recognition, converting audio files with ffmpeg, and recognizing speech with the Baidu Speech SDK
阿新 · Published 2018-11-01
Project description:
On Windows, use speech_recognition to record audio from the microphone and save it as a 16 kHz WAV file, use ffmpeg to convert the WAV to a PCM file, upload that to Baidu's speech service to get the recognized text back, and add simple voice interaction with pyttsx3.
Required modules:
speech_recognition, pyttsx3, pyaudio, wave, aip, ffmpeg
Module installation:
- speech_recognition: https://pypi.org/project/SpeechRecognition/
- pyttsx3: https://blog.csdn.net/dss_dssssd/article/details/82693742
- pyaudio: https://pypi.org/project/PyAudio/
- aip: https://ai.baidu.com/docs#/ASR-Online-Python-SDK/top
- ffmpeg (on Windows): note that ffmpeg must be added to the system PATH environment variable, not the per-user one.
  https://blog.csdn.net/zhuiqiuk/article/details/72834385
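Since the script later shells out to ffmpeg, it is worth confirming the binary is actually reachable on the PATH before running anything; a minimal check (the helper name is my own, not part of the project):

```python
import shutil

def ffmpeg_available() -> bool:
    # shutil.which searches the PATH the same way the shell would,
    # so a None result means the ffmpeg step below will fail
    return shutil.which("ffmpeg") is not None

print(ffmpeg_available())
```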
The code is as follows:
import speech_recognition as sr
import pyttsx3
import pyaudio
import wave
from aip import AipSpeech
import os
# Read the wav file and play it back
def read_wav():
    CHUNK = 1024
    # the recorded test audio
    wf = wave.open('./2.wav', 'rb')
    # read the first block of frames
    data = wf.readframes(CHUNK)
    p = pyaudio.PyAudio()
    FORMAT = p.get_format_from_width(wf.getsampwidth())
    CHANNELS = wf.getnchannels()
    RATE = wf.getframerate()
    print('FORMAT: {} \nCHANNELS: {} \nRATE: {}'.format(FORMAT, CHANNELS, RATE))
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    frames_per_buffer=CHUNK,
                    output=True)
    # play the stream chunk by chunk
    while len(data) > 0:
        stream.write(data)
        data = wf.readframes(CHUNK)
    # release the audio resources
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf.close()
def wav_to_pcm(wav_file):
    # Derive the output name by swapping the extension, e.g. "2.wav" -> "2.pcm".
    # os.path.splitext handles relative paths such as "./2.wav" correctly,
    # whereas wav_file.split(".")[0] would return an empty string there.
    pcm_file = "%s.pcm" % os.path.splitext(wav_file)[0]
    # Run the same ffmpeg command we would type in a cmd window:
    # mono, 16-bit little-endian PCM at 16 kHz, as Baidu's ASR expects.
    os.system("ffmpeg -y -i %s -acodec pcm_s16le -f s16le -ac 1 -ar 16000 %s" % (wav_file, pcm_file))
    return pcm_file
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()
""" 你的 APPID AK SK """
# 需要根據自己申請的填寫
# APP_ID = '你的 App ID'
# API_KEY = '你的 Api Key'
# SECRET_KEY = '你的 Secret Key'
# 這是測試id,key
APP_ID = '14545668'
API_KEY = 'BLG4GIxozxXia9U8KKtLBl2j'
SECRET_KEY = 'z0ITqlx8OXiveTePBvD7jkSCdGKthZAy'
def speech_interaction():
    # initialize the pyttsx3 engine
    engine = pyttsx3.init()
    # obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        engine.say("門外有客人來訪,需要開門嗎, 請一秒後回答?")
        engine.runAndWait()
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
        engine.say("錄音結束, 識別中")
        engine.runAndWait()
    # save the recording to a wav file, resampled to 16 kHz
    with open("2.wav", "wb") as f:
        f.write(audio.get_wav_data(convert_rate=16000))
    # play back what was just recorded
    read_wav()
    # create the Baidu speech recognition client
    client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
    # convert to pcm format
    pcmFile = wav_to_pcm("./2.wav")
    # upload to Baidu for recognition (dev_pid 1537 selects Mandarin)
    result = client.asr(get_file_content(pcmFile), 'pcm', 16000, {
        'dev_pid': 1537,
    })
    print(result)
    # print(result['err_msg'], result['result'][0])
    try:
        success = result['err_msg'] == 'success.'
        print(success)
        if success:
            text = result['result'][0]
            if "不" in text:
                engine.say("好的,那請您自己去開門")
                engine.runAndWait()
            elif "開" in text or '好' in text:
                engine.say("請您稍等,我去幫您開門,")
                engine.runAndWait()
        else:
            engine.say("語音識別錯誤")
            engine.runAndWait()
        # engine.say(text)
        # engine.runAndWait()
    except Exception:
        engine.say("抱歉, 識別錯誤")
        engine.runAndWait()

# run it
speech_interaction()
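One detail worth calling out: when deriving the .pcm name from the .wav path, os.path.splitext is the safe choice, because a plain split(".") returns an empty base for relative paths such as "./2.wav". A quick check (the helper name is my own):

```python
import os

def pcm_name(wav_file):
    # splitext keeps "./2" intact for "./2.wav" and drops only the extension
    base, _ext = os.path.splitext(wav_file)
    return base + ".pcm"

print(pcm_name("./2.wav"))      # → ./2.pcm
print("./2.wav".split(".")[0])  # → empty string, which is why split is unsafe here
```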
Note:
pyttsx3's engine initialization (pyttsx3.init()) must not be done inside a thread; doing so raises an error.
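One common way to live with this constraint is to create the engine once in the main thread and let worker threads only enqueue text for it. The sketch below substitutes a plain list for the real engine so it runs without pyttsx3 installed; the queue/sentinel pattern is the point:

```python
import queue
import threading

def worker(q):
    # Worker threads only enqueue text; they never touch the TTS engine.
    q.put("錄音結束, 識別中")
    q.put(None)  # sentinel: no more phrases

q = queue.Queue()
threading.Thread(target=worker, args=(q,)).start()

# Main thread: this is where pyttsx3.init() would be called, per the note above.
spoken = []
while True:
    phrase = q.get()
    if phrase is None:
        break
    spoken.append(phrase)  # stand-in for engine.say(phrase); engine.runAndWait()
print(spoken)
```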
Notes:
- If the request returns a timeout error even though the network is fine, try a different App ID and key.
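On the response side, the dict returned by client.asr carries both err_no and err_msg; checking err_no == 0 (Baidu's success code) is somewhat more robust than string-comparing err_msg. A sketch with a hand-made result dict for illustration (the helper name is my own):

```python
def extract_text(result):
    # err_no 0 means success; 'result' then holds the candidate transcripts
    if result.get("err_no") == 0 and result.get("result"):
        return result["result"][0]
    return None

sample = {"err_no": 0, "err_msg": "success.", "result": ["開門"]}
print(extract_text(sample))  # → 開門
```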
The project is on GitHub:
https://github.com/MengRe/speech_commmunication/tree/master