
[Chatbot Series] Chatbots: From the Basics to Applications


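I. Introduction

On Wikipedia, a bot is a computer program or script, together with the account it logs in under, that is mainly used to help editors carry out large volumes of automated, high-speed, or mechanical and tedious editing work.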

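II. Details

1. The simplest kind is a rule-based chatbot.

It works from question-and-answer sentences prepared in advance in a corpus. This is grade-school level: a question gets an answer only if it exactly matches one the bot already knows.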

import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']
# Pick the greeting to reply with
random_greeting = random.choice(greetings)

# Ways of asking "How are you?"
question = ['How are you?', 'How are you doing?']
# "I'm fine" responses
responses = ['Okay', "I'm fine"]
# Pick one at random
random_response = random.choice(responses)

# Run the bot
while True:
    userInput = input(">>> ")
    if userInput in greetings:
        print(random_greeting)
    elif userInput in question:
        print(random_response)
    # Keep going until you say "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

Result:

>>> hi
hey
>>> how are u
I did not understand what you said
>>> how are you
I did not understand what you said
>>> how are you?
I did not understand what you said
>>> How are you?
I'm fine
>>> bye
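The transcript above shows how brittle exact matching is: "how are you?" fails only because of letter case. One common workaround is to normalize the input before the lookup. The sketch below is a minimal illustration of that idea; the helper names `normalize` and `reply`, and the choice to strip all punctuation, are assumptions made for this example and are not part of the original bot.

```python
import string

def normalize(text):
    """Lowercase the text, drop ASCII punctuation, and collapse whitespace."""
    # With three arguments, str.maketrans deletes every character in the third one
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())

# Store the known phrases in normalized form so both sides of the lookup agree
greetings = {normalize(g) for g in ["hola", "hello", "hi", "hey!"]}
questions = {normalize(q) for q in ["How are you?", "How are you doing?"]}

def reply(user_input):
    """Return a canned reply for one line of input (hypothetical helper)."""
    cleaned = normalize(user_input)
    if cleaned in greetings:
        return "hello"
    if cleaned in questions:
        return "I'm fine"
    return "I did not understand what you said"
```

With this change "how are you?" and "HOW ARE YOU" both reach the "I'm fine" branch, while "how are u" is still not understood, because the match is still on whole sentences.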

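2. Upgrade I:

Rules like these are clearly far too crude. We need somewhat better "precise matching", for example using keywords to work out the intent behind a sentence (intents).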

# word_tokenize needs NLTK's Punkt tokenizer data, fetched once with nltk.download()
from nltk import word_tokenize
import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']
# Pick the greeting to reply with
random_greeting = random.choice(greetings)

# Keywords for the "holiday" topic
question = ['break', 'holiday', 'vacation', 'weekend']
# Replies for the holiday topic
responses = ['It was nice! I went to Paris', "Sadly, I just stayed at home"]
# Pick one at random
random_response = random.choice(responses)



# Run the bot
while True:
    userInput = input(">>> ")
    # Split the input into tokens to see which words it contains
    cleaned_input = word_tokenize(userInput)
    # Compare the tokens with each keyword list to decide which topic this is
    if  not set(cleaned_input).isdisjoint(greetings):
        print(random_greeting)
    elif not set(cleaned_input).isdisjoint(question):
        print(random_response)
    # Keep going until you say "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

Result:

>>> hi
hey
>>> how was your holiday?
It was nice! I went to Paris
>>> wow, amazing!
I did not understand what you said
>>> bye
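The keyword idea extends naturally from two hard-coded topics to a table of intents. The following is a rough sketch under stated assumptions: the `INTENTS` table, the `detect_intent` and `respond` helpers, and the regex-based `tokenize` (a crude stand-in for `word_tokenize`, used here so the example runs without NLTK) are all invented for illustration.

```python
import random
import re

# Hypothetical intent table: intent name -> (keywords, candidate replies)
INTENTS = {
    "greeting": ({"hola", "hello", "hi", "hey"},
                 ["hello", "hi", "hey"]),
    "holiday": ({"break", "holiday", "vacation", "weekend"},
                ["It was nice! I went to Paris", "Sadly, I just stayed at home"]),
}

def tokenize(text):
    """Crude stand-in for word_tokenize: lowercase runs of letters and apostrophes."""
    return set(re.findall(r"[a-z']+", text.lower()))

def detect_intent(text):
    """Return the first intent whose keywords overlap the input, or None."""
    tokens = tokenize(text)
    for name, (keywords, _replies) in INTENTS.items():
        if tokens & keywords:
            return name
    return None

def respond(text):
    """Pick a reply for the detected intent, or fall back to a default."""
    intent = detect_intent(text)
    if intent is None:
        return "I did not understand what you said"
    # Choose per message, so the same question can get different answers
    return random.choice(INTENTS[intent][1])
```

Adding a new topic then means adding one entry to the table rather than another `elif` branch. Picking the reply inside `respond` also avoids the listing above's habit of fixing one random reply for the whole session.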

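As you can probably tell, this is still "precise matching" at the level of surface text. The mainstream research direction today is matching at the level of meaning. For example, "I'm so hungry" and "it's mealtime" should both be understood as "time to eat". Working at that level requires embedding methods such as word vectors, which will be covered in a later lesson.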
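To give a feel for what semantic matching means, here is a toy sketch that compares sentences by the cosine similarity of averaged word vectors. Everything in it is an assumption for illustration: the three-dimensional vectors in `TOY_VECTORS` are made up by hand, whereas real embeddings are learned from large corpora and have far more dimensions.

```python
import math

# Hand-made toy "word vectors", invented for this example only
TOY_VECTORS = {
    "hungry":   [0.9, 0.1, 0.0],
    "starving": [0.8, 0.2, 0.1],
    "lunch":    [0.7, 0.3, 0.0],
    "dinner":   [0.7, 0.2, 0.1],
    "holiday":  [0.0, 0.9, 0.3],
    "paris":    [0.1, 0.8, 0.4],
}

def sentence_vector(text):
    """Average the vectors of the known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity(a, b):
    """Similarity of two sentences; 0.0 when either has no known words."""
    va, vb = sentence_vector(a), sentence_vector(b)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)
```

With these toy vectors, "i am starving" scores much closer to "time for lunch" than to "holiday in paris", even though the sentences share no words. Keyword matching cannot capture that.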

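3. Upgrade II:

Being able to chatter is not enough. The bot needs a body of knowledge before it can solve users' problems. We can build that knowledge on various kinds of databases and then find answers by searching them. The simplest example is a graph stored in an ordinary Python dict (an adjacency list), used here to build a "map". With this map we can find a route from one place to another and return it to the user as the answer.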

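# Build a database for the target domain
# Here a plain dict serves as the graph: each city maps to the cities reachable from it
graph = {'Shanghai': ['Suzhou', 'Changzhou'],
         'Suzhou': ['Changzhou', 'Zhenjiang'],
         'Changzhou': ['Zhenjiang'],
         'Zhenjiang': ['Changzhou'],
         'Yancheng': ['Nantong'],
         'Nantong': ['Changzhou']}

# Define how to find a path from A to B (recursive depth-first search)
def find_path(start, end, path=None):
    # Build a new list instead of mutating, so each branch keeps its own path
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        # Skip cities already on the path to avoid going in circles
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath: return newpath
    return None
print(find_path('Shanghai', 'Zhenjiang'))

Result:

['Shanghai', 'Suzhou', 'Changzhou', 'Zhenjiang']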
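Note that `find_path` returns the first route its depth-first search happens to reach, which here passes through both Suzhou and Changzhou even though shorter routes exist. If the bot should answer with a route that has the fewest stops, breadth-first search is one standard alternative. The sketch below assumes the city graph above, written with English names; `shortest_path` and `describe_route` are hypothetical helper names.

```python
from collections import deque

graph = {'Shanghai': ['Suzhou', 'Changzhou'],
         'Suzhou': ['Changzhou', 'Zhenjiang'],
         'Changzhou': ['Zhenjiang'],
         'Zhenjiang': ['Changzhou'],
         'Yancheng': ['Nantong'],
         'Nantong': ['Changzhou']}

def shortest_path(graph, start, end):
    """Breadth-first search: return a path with the fewest hops, or None."""
    if start == end:
        return [start]
    visited = {start}
    queue = deque([[start]])          # each queue entry is a whole path so far
    while queue:
        path = queue.popleft()
        for node in graph.get(path[-1], []):
            if node in visited:
                continue
            if node == end:
                return path + [node]
            visited.add(node)
            queue.append(path + [node])
    return None

def describe_route(start, end):
    """Turn the search result into a sentence the bot can say."""
    path = shortest_path(graph, start, end)
    if path is None:
        return "Sorry, I don't know a route from {} to {}.".format(start, end)
    return " -> ".join(path)
```

For example, `describe_route('Shanghai', 'Zhenjiang')` yields `Shanghai -> Suzhou -> Zhenjiang`, one stop fewer than the depth-first answer. Because the edges are one-directional, asking for the reverse trip produces the apology message.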

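The same knowledge-graph approach can also be built with logic programming, for example Prolog, which AI students of the last century all learned, or PyKE, a Prolog-inspired knowledge engine for Python. These tools can build complex networks of logic from which you can extract information conveniently, without hand-coding every piece of it yourself: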

son_of(bruce, thomas, norma)
son_of(fred_a, thomas, norma)
son_of(tim, thomas, norma)
daughter_of(vicki, thomas, norma)
daughter_of(jill, thomas, norma)
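To show what "extracting information without hand-coding it" looks like, here is a plain-Python stand-in that stores the facts above as tuples and derives new relations from them. It is only a sketch of the idea and does not use PyKE or Prolog; the tuple layout and the helpers `children_of` and `siblings_of` are assumptions made for this example.

```python
# Each fact is (relation, child, father, mother), mirroring the facts above
FACTS = [
    ("son_of", "bruce", "thomas", "norma"),
    ("son_of", "fred_a", "thomas", "norma"),
    ("son_of", "tim", "thomas", "norma"),
    ("daughter_of", "vicki", "thomas", "norma"),
    ("daughter_of", "jill", "thomas", "norma"),
]

def children_of(parent):
    """Everyone who has `parent` as father or mother."""
    return [child for _rel, child, father, mother in FACTS
            if parent in (father, mother)]

def siblings_of(person):
    """Derived rule: siblings are other people with the same pair of parents."""
    parents = {(father, mother) for _rel, child, father, mother in FACTS
               if child == person}
    return [child for _rel, child, father, mother in FACTS
            if child != person and (father, mother) in parents]
```

Nobody wrote down "tim is a sibling of vicki", yet `siblings_of("vicki")` includes `tim` because the rule derives it from the stored facts. A logic-programming engine generalizes this by letting you state such rules declaratively.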

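4. Upgrade III:

Every industry has a front end and a back end, and AI is no exception. The algorithms discussed so far all run on the back end. To be dependable, many projects also need a simple, easy-to-use front end. As an example, we will use Google's APIs to write a small voice assistant in the style of Jarvis, Tony Stark's assistant in Iron Man. Let's start with the simplest version, which only speaks. It uses gTTS (Google Text-to-Speech) to turn text into audio.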

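from gtts import gTTS
import os
# The text means: "Hello, I am your personal assistant, my name is Pepper"
tts = gTTS(text='您好,我是您的私人助手,我叫小辣椒', lang='zh-tw')
tts.save("hello.mp3")
# Play the file with the mpg321 command-line player, which must be installed
os.system("mpg321 hello.mp3")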

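With text-to-speech in place, we can likewise use the Google API to recognize what you say and have Jarvis read its replies aloud:

(Note: this requires a few libraries installed on your machine: SpeechRecognition, PyAudio, and PySpeech)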

import speech_recognition as sr
from time import ctime
import time
import os
from gtts import gTTS
import sys
 
# Speak the AI's reply aloud
def speak(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("audio.mp3")
    os.system("mpg321 audio.mp3")

# Record what you say
def recordAudio():
    # Capture your speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
 
    # Convert the audio to text with the Google API
    data = ""
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
 
    return data

# Built-in conversation skills (rules)
def jarvis():
    
    while True:
        
        data = recordAudio()

        if "how are you" in data:
            speak("I am fine")

        if "what time is it" in data:
            speak(ctime())

        if "where is" in data:
            # Use a separate name so data stays a string for the checks below
            words = data.split(" ")
            location = words[2]
            speak("Hold on Tony, I will show you where " + location + " is.")
            os.system("open -a Safari https://www.google.com/maps/place/" + location + "/&")

        if "bye" in data:
            speak("bye bye")
            break

# Initialize
time.sleep(2)
speak("Hi Tony, what can I do for you?")

# Run it
jarvis()

Result:

Hi Tony, what can I do for you?
You said: how are you
I am fine
You said: what time is it now
Fri Apr  7 18:16:54 2017
You said: where is London
Hold on Tony, I will show you where London is.
You said: ok bye bye
bye bye

Voice is not the only possible front end. Our chatbot can also be integrated into application platforms such as WeChat, Slack, and Facebook Messenger.
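One way to make that kind of integration easier is to keep the conversation rules free of any input or output code, so the same logic can sit behind a microphone, a console, or a messaging platform. The sketch below is an assumption about how this could be structured and is not the original author's design. `respond` and `run` are hypothetical names, and no real messaging API is called.

```python
from time import ctime

def respond(text):
    """Return (reply, done) for one utterance; contains no audio or I/O code."""
    if "how are you" in text:
        return "I am fine", False
    if "what time is it" in text:
        return ctime(), False
    if "where is" in text:
        # Everything after "where is" is treated as the place name
        location = text.split("where is", 1)[1].strip()
        return "Hold on Tony, I will show you where " + location + " is.", False
    if "bye" in text:
        return "bye bye", True
    return "I did not understand what you said", False

def run(read, write):
    """Drive the bot with any pair of input and output callables."""
    write("Hi Tony, what can I do for you?")
    done = False
    while not done:
        reply, done = respond(read())
        write(reply)
```

Calling `run(input, print)` gives a console bot, and passing the recording and speaking functions instead gives the voice version. A chat-platform integration would call `respond` from whatever message handler that platform provides.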
