
Recording audio with speech_recognition, converting audio files with ffmpeg, and recognizing speech with the Baidu speech SDK

Project description:

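On Windows, speech_recognition records audio and saves it as a 16 kHz WAV file. ffmpeg then converts the WAV file to a PCM file, which is uploaded to Baidu's speech service; the service returns the recognized text. pyttsx3 adds a simple spoken interaction on top.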
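What the PCM file contains can be made concrete with a little arithmetic. The ffmpeg command in wav_to_pcm produces raw signed 16-bit little-endian samples (pcm_s16le), one channel, at 16,000 samples per second, so one second of audio is 16000 × 2 × 1 = 32,000 bytes. Since the WAV written by get_wav_data(convert_rate=16000) appears to already be 16 kHz, 16-bit mono with the default sr.Microphone settings (an assumption, not something the original states), the ffmpeg step could in that case be replaced by copying the sample frames out of the WAV container. The sketch below does that with the standard wave module; wav_to_pcm_stdlib and pcm_duration_seconds are hypothetical helper names, not part of the original project.

```python
import os
import wave


def wav_to_pcm_stdlib(wav_file):
    """Hypothetical pure-Python stand-in for the ffmpeg call in wav_to_pcm.

    Assumes the WAV is already 16 kHz, 16-bit, mono: WAV stores 16-bit samples
    as signed little-endian integers, so its frames are exactly what ffmpeg's
    pcm_s16le / s16le output would contain. Anything else is rejected, since
    resampling or downmixing would still require ffmpeg.
    """
    with wave.open(wav_file, "rb") as wf:
        params = (wf.getframerate(), wf.getsampwidth(), wf.getnchannels())
        if params != (16000, 2, 1):
            raise ValueError("expected 16 kHz, 16-bit, mono WAV, got %r" % (params,))
        frames = wf.readframes(wf.getnframes())
    # os.path.splitext keeps the directory part: "./2.wav" -> "./2.pcm"
    pcm_file = os.path.splitext(wav_file)[0] + ".pcm"
    with open(pcm_file, "wb") as f:
        f.write(frames)
    return pcm_file


def pcm_duration_seconds(num_bytes, rate=16000, sample_width=2, channels=1):
    # 16000 samples/s * 2 bytes/sample * 1 channel = 32000 bytes per second
    return num_bytes / (rate * sample_width * channels)
```

In speech_interaction one could call wav_to_pcm_stdlib("./2.wav") in place of wav_to_pcm("./2.wav"); ffmpeg remains the safer choice whenever the recording format is not known in advance.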

Required modules:

speech_recognition, pyttsx3, pyaudio, wave, aip, ffmpeg
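Before running the program it can help to confirm that every dependency is present. The sketch below assumes the module names are exactly the ones imported in the code: importlib.util.find_spec looks a module up without importing it, and shutil.which checks that the ffmpeg executable is on PATH. For reference, the pip distributions are SpeechRecognition, pyttsx3, PyAudio and baidu-aip (which provides the aip module), while wave is part of the standard library. missing_dependencies is a hypothetical helper, not part of the original project.

```python
import importlib.util
import shutil

# Module names as they appear in the import statements of the program
REQUIRED_MODULES = ["speech_recognition", "pyttsx3", "pyaudio", "wave", "aip"]
# External command-line tools called from Python
REQUIRED_EXECUTABLES = ("ffmpeg",)


def missing_dependencies(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return the names of modules or executables that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```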

Module installation:

The code is as follows:

```python
import os
import subprocess
import wave

import pyaudio
import pyttsx3
import speech_recognition as sr
from aip import AipSpeech


# Read a WAV file and play it back
def read_wav(path='./2.wav'):
    CHUNK = 1024
    # The test recording
    wf = wave.open(path, 'rb')
    p = pyaudio.PyAudio()
    FORMAT = p.get_format_from_width(wf.getsampwidth())
    CHANNELS = wf.getnchannels()
    RATE = wf.getframerate()
    print('FORMAT: {} \nCHANNELS: {} \nRATE: {}'.format(FORMAT, CHANNELS, RATE))
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    frames_per_buffer=CHUNK, output=True)
    # Play the stream chunk by chunk
    data = wf.readframes(CHUNK)
    while len(data) > 0:
        stream.write(data)
        data = wf.readframes(CHUNK)
    # Release the audio device and the file
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf.close()


def wav_to_pcm(wav_file):
    # e.g. wav_file = "audio.wav" -> pcm_file = "audio.pcm".
    # os.path.splitext replaces wav_file.split(".")[0], which turns "./2.wav"
    # into "" and would name the output file ".pcm"
    pcm_file = os.path.splitext(wav_file)[0] + ".pcm"
    # The same command you would type in a cmd window, run from Python:
    # raw signed 16-bit little-endian samples, 1 channel, 16 kHz
    subprocess.run(["ffmpeg", "-y", "-i", wav_file, "-acodec", "pcm_s16le",
                    "-f", "s16le", "-ac", "1", "-ar", "16000", pcm_file],
                   check=True)
    return pcm_file


def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


""" Your APPID AK SK """
# Fill these in with the values of your own Baidu AI application:
# APP_ID = 'your App ID'
# API_KEY = 'your Api Key'
# SECRET_KEY = 'your Secret Key'
# Test ID and keys published by the author
APP_ID = '14545668'
API_KEY = 'BLG4GIxozxXia9U8KKtLBl2j'
SECRET_KEY = 'z0ITqlx8OXiveTePBvD7jkSCdGKthZAy'


def speech_interaction():
    # Initialize the pyttsx3 engine (must not be done inside a worker thread)
    engine = pyttsx3.init()
    # Record audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        # "A visitor is at the door. Shall I open it? Please answer after one second."
        engine.say("門外有客人來訪,需要開門嗎, 請一秒後回答?")
        engine.runAndWait()
        # Samples about one second of background noise (the default duration)
        # to set the energy threshold, hence "answer after one second"
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    # "Recording finished, recognizing"
    engine.say("錄音結束, 識別中")
    engine.runAndWait()
    # Save the recording as a 16 kHz WAV file
    with open("2.wav", "wb") as f:
        f.write(audio.get_wav_data(convert_rate=16000))
    # Play the recording back
    read_wav()
    # Create the Baidu speech recognition client
    client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
    # Convert to PCM
    pcmFile = wav_to_pcm("./2.wav")
    # Upload to Baidu Cloud for recognition (dev_pid 1537: Mandarin)
    result = client.asr(get_file_content(pcmFile), 'pcm', 16000, {
        'dev_pid': 1537,
    })
    print(result)
    # print(result['err_msg'], result['result'][0])
    try:
        success = result['err_msg'] == 'success.'
        print(success)
        if success:
            text = result['result'][0]
            if "不" in text:
                # "OK, then please open the door yourself"
                engine.say("好的,那請您自己去開門")
                engine.runAndWait()
            # The Mandarin model returns simplified characters, so "开" is
            # checked as well as the traditional "開"
            elif "开" in text or "開" in text or '好' in text:
                # "Please wait, I'll go and open the door for you"
                engine.say("請您稍等,我去幫您開門,")
                engine.runAndWait()
        else:
            # "Speech recognition error"
            engine.say("語音識別錯誤")
            engine.runAndWait()
        # engine.say(text)
        # engine.runAndWait()
    except Exception:
        # "Sorry, recognition error"
        engine.say("抱歉, 識別錯誤")
        engine.runAndWait()


# Run the program
if __name__ == "__main__":
    speech_interaction()
```
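The keyword logic at the end of speech_interaction can be pulled out into a pure function so it can be checked without a microphone, speakers or network access. The sketch below is a hypothetical refactoring (choose_reply and the constant names are not from the original); it mirrors the original checks on err_msg == 'success.' and on result['result'][0], and returns the sentence to speak, or None when the text was recognized but contains no keyword, which is the case where the original code stays silent.

```python
# Replies spoken by the assistant (same strings as in speech_interaction)
DECLINE = "好的,那請您自己去開門"  # "OK, then please open the door yourself"
ACCEPT = "請您稍等,我去幫您開門"    # "Please wait, I'll go and open the door for you"
FAILED = "語音識別錯誤"            # "Speech recognition error"


def choose_reply(result):
    """Map a Baidu ASR response dict to the reply to speak (or None)."""
    if result.get("err_msg") != "success." or not result.get("result"):
        return FAILED
    text = result["result"][0]
    # "不" (no/not) is checked first, so "不好" or "不开" count as a refusal
    if "不" in text:
        return DECLINE
    # Accept on "open" (simplified or traditional) or "好" (OK)
    if any(k in text for k in ("开", "開", "好")):
        return ACCEPT
    # Recognized, but no keyword: the original code says nothing here
    return None
```

Inside speech_interaction, the try block would then reduce to `reply = choose_reply(result)` followed by `engine.say(reply)` and `engine.runAndWait()` when reply is not None.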

Note:

The pyttsx3 engine (created with pyttsx3.init()) must not be initialized inside a thread; doing so raises an error.
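One way to live with this restriction, sketched here as an assumption rather than the author's code: create and drive the engine only on the main thread, and let worker threads hand text over through a queue.Queue. The names speak_from_queue and producer are hypothetical; speak stands for any callable that actually speaks, for example a wrapper around engine.say(...) and engine.runAndWait() on an engine created on the main thread. The demo uses print in its place so it runs without pyttsx3.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the main-thread loop to finish


def speak_from_queue(q, speak):
    """Run on the main thread: take text from the queue and speak it."""
    while True:
        text = q.get()
        if text is _STOP:
            break
        speak(text)


def producer(q, texts):
    # Stands in for a worker thread (e.g. recognition) that only enqueues text
    for t in texts:
        q.put(t)
    q.put(_STOP)


if __name__ == "__main__":
    q = queue.Queue()
    worker = threading.Thread(target=producer, args=(q, ["錄音結束, 識別中"]))
    worker.start()
    # With pyttsx3: engine = pyttsx3.init() here on the main thread, then
    # speak_from_queue(q, lambda t: (engine.say(t), engine.runAndWait()))
    speak_from_queue(q, print)
    worker.join()
```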

Remarks:
  • If a timeout error is returned even though the network connection is fine, try a different APP ID and key.
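Before switching credentials, it may also be worth simply retrying the request. The sketch below is hypothetical: asr_with_retries is not part of baidu-aip, and since it is not confirmed here whether the SDK reports a timeout by raising an exception or by returning an error dict, it treats both an exception and any response whose err_msg is not 'success.' (the same check the main code uses) as a failed attempt.

```python
import time


def asr_with_retries(recognize, attempts=3, delay_s=1.0):
    """Call recognize() up to `attempts` times; return the first successful
    response dict, or the last failure if every attempt fails.

    `recognize` is any zero-argument callable, e.g.
    lambda: client.asr(get_file_content(pcmFile), 'pcm', 16000, {'dev_pid': 1537})
    """
    last = None
    for i in range(attempts):
        try:
            last = recognize()
        except Exception as exc:  # e.g. a network timeout raised by the client
            last = {"err_msg": repr(exc)}
        if last.get("err_msg") == "success.":
            return last
        if i < attempts - 1:
            time.sleep(delay_s)  # brief pause before the next attempt
    return last
```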

The project is on GitHub:
https://github.com/MengRe/speech_commmunication/tree/master