
STFU: Test Your Voice App Idea in Less Than An Hour

In digital design, there’s a universal truth emerging: most ideas should be tested as cheaply and as simply as possible (think lean or agile). In the case of a voice interface, you wouldn’t want it to end up as annoying as C-3PO, would you? You most certainly wouldn’t want to find that out AFTER you made it. Yes, you, George Lucas (hey, I saw Jar-Jar coming).

Ideally, you’ve got to fail and learn as fast as possible. Ed Catmull at Pixar said:

If you aren’t experiencing failure, then you are making a far worse mistake. You are being driven by the desire to avoid it.

Guerrilla Testing a GUI

At Clearleft in 2007, we created Silverback, because the cost of usability testing at the time was prohibitively expensive. Silverback is a neat hack — a quick way to record the screen, audio, webcam, and clicks on a Mac during usability testing. It’s for those who value the quick and dirty route to knowledge over academic perfection (that comes later). Action is a great route to learning. You shouldn’t need a lab to test something early on; you need a simple solution, a cafe, and members of the public.

Silverback (the software) and Steve (the silverback)

Recently I’ve been thinking about how to do this in the early stages of creating a voice interface. I wanted a method that took me from writing a script for the interface to a conversation between the system and a user as fast as possible. No development effort, no fooling around.

That’s my goal with this article: get you testing an idea for a voice interface, as fast as possible.

Wizard of Oz Testing

Wizard of Oz testing is not new, but it has a renewed importance as we see the rise of voice interfaces. John F. Kelley refined the method in the early 80s: have humans interact with a system that seems real, when in reality it’s controlled by the ‘Wizard’, somewhere behind the scenes. Using the method, you can figure out how people will use and react to the system, and test whether the system has the appropriate responses. If it doesn’t, refine and add as needed.

(I’m sure you can do a better job than Kramer!)

Amazon used this method to create the Amazon Echo. The device was in one room with a user, and the ‘Wizard’ was typing responses back to the user, converted into speech by the Echo prototype. The test participants were none the wiser: they thought they were interacting with a smart device instead of a smart fake.

Gathering what people say

This method allowed them to gather ‘utterances’ (in VUI design, whatever your users say). You want to collect words and synonyms that people use, even though their intent is the same (book me a cab vs. get me a taxi). Your system must be able to cope with the wide variety of things that people say, and ideally use the words they use when speaking, in order to be understood.
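To make the idea concrete, here’s a minimal sketch of how collected utterances might map onto a single intent. The intent name, phrase list, and word-overlap matching are all invented for illustration — a real system (like Alexa’s NLU) does far more sophisticated matching:

```python
# Hypothetical sketch: several utterances, one intent.
# The intent name and phrases are invented for illustration.
INTENT_PHRASES = {
    "bookRide": ["book me a cab", "get me a taxi", "call a cab", "order a taxi"],
}

def match_intent(utterance):
    """Return the intent whose known phrases share the most words with the utterance."""
    words = set(utterance.lower().split())
    best, best_overlap = None, 0
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            overlap = len(words & set(phrase.split()))
            if overlap > best_overlap:
                best, best_overlap = intent, overlap
    return best

# "book me a cab" and "get me a taxi" resolve to the same intent:
assert match_intent("book me a cab") == match_intent("get me a taxi") == "bookRide"
```

The point of gathering utterances in testing is exactly this: filling out those phrase lists with the words real people actually use.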

Designing the flow

The system must also accommodate different paths through the dialogue; this is called the dialogue flow. A user booking a restaurant might say the time they want the table when they first start speaking; another might not. Your system needs to ask the right questions in response to highly variable input. Speech is a blank canvas; most GUIs aren’t.
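One common way to think about this is slot filling: the system only asks for the details the user hasn’t already supplied. Here’s a hypothetical sketch (the slot names and questions are invented for the restaurant example above):

```python
# Hypothetical slot-filling sketch for the restaurant-booking example.
# Slot names and question wording are invented for illustration.
QUESTIONS = {
    "time": "What time would you like the table?",
    "party_size": "For how many people?",
}

def next_question(filled_slots):
    """Return the next question to ask, or None once every slot is filled."""
    for slot, question in QUESTIONS.items():
        if slot not in filled_slots:
            return question
    return None

# A user who opens with "Book a table at 7pm" has already filled 'time',
# so the system skips straight to the party size:
assert next_question({"time": "7pm"}) == "For how many people?"
assert next_question({"time": "7pm", "party_size": 4}) is None
```

Your Wizard of Oz script is essentially a hand-operated version of this: you, the Wizard, decide which question comes next based on what the user has already told you.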

The 1 Hour Test

My kids watch this scene from X-men Apocalypse on repeat. A lot.

You can prototype an application for Amazon’s Echo or Google Home (or anything else) in minutes using this method. I’ve made a little Python script to make this very easy: it lets you press shortcut keys to have your Mac read aloud phrases from a text file.

The Steps

  1. Script the basics of your VUI. 
    Write something that imagines how a user will interact with your system. Try it out a few different ways: how might the flow of conversation work differently when users give more or less information? You want to hit a ‘Goldilocks Zone’ of detail: cater for some of the variation, but don’t try to cover every eventuality at first.
  2. Copy and paste the lines your system will speak into a text file. Remove the user’s lines. Save it as script.txt
  3. Install Say Wizard from GitHub onto your Mac. Copy your script.txt into the same directory. Read the instructions for use.
  4. Run your test! Learn and refine as needed.

(If you want to just get started right away and figure this out for yourself, just do step 3.)

Still here? Let’s break it down in more detail. Here’s a simple example I created and tested in less than an hour. I’m skipping a *lot* of detail about how to design a VUI, craft dialogue flow, moderate tests and analyse them; that’s for another time or source. This is about getting going fast.

1. Scripting the Basics of a VUI

In this scenario, we’ll imagine a skill for the Amazon Echo that’s useful at the office: audio recordings of meetings. (Yes, I’m aware it can’t do this, hey: this is just a tutorial!).

I start out by considering the intentions users might have and the useful information that the system might need to gather from the user. I write a script that covers the same dialogue to start the recording of the room audio, but with some variation around what information the user might offer. Some are just fragments of dialogue where I think things will vary.

The italic titles in brackets are the intents — a consistent name we’ll use in the system for what the user was trying to do, based on their first command.

_______________________

(startRecording)

U: Alexa, record this meeting.
A: Recording started. How long is your meeting?
U: an hour.
A: One hour, OK. Who would you like me to send the recording to?
U: James Box.
A: Thanks. Afterwards I’ll send this recording to them.

(startRecording)

U: Alexa, record this meeting.
A: Recording started. How long is your meeting?
U: don’t know.
A: You can ask me to stop the recording whenever you like. Who would you like me to send the recording to?

(startTimedRecording)

U: Alexa, record this room for 45 minutes.
A: Recording started for 45 minutes. Who would you like me to send the recording to?
U: Me.
A: What’s your full name, me?
U: James Box.
A: Thanks. Afterwards I’ll send this recording to you.

(startTimedRecording)

A: Recording started for 45 minutes. Who would you like me to send the recording to?
U: uhhhh… not sure.
A: No problem. You can find the recording later in the Dropbox folder for shared recordings.

(stopRecording)

U: Alexa, stop the recording.
A: Recording stopped at 53 minutes. Sending it now.

(cancelTheRecording)

U: Alexa, delete the recording.
A: Are you sure you want to cancel the recording?
U: Yes
A: OK, recording deleted. 
____________________

2. Copy and Paste the Lines Your System Will Speak into a Text File (script.txt)

Simple really. Take out what the user says and the accompanying notes. I slightly regroup the responses into a sensible order that I think will occur in testing.

3. Install Say Wizard Onto Your Mac

There are some tools out there for testing voice interfaces, but none of them get you set up this fast, so I decided to create a messy (but elegant!) hack, in the spirit of Silverback. Credit is due here to Abi Jones at Google (we’re running VUI design workshops together): she’s trained Googlers in the past using Apple’s `say` command on the command line. I’ve extended her idea a little bit to make it even easier to create and interact with a fake interface.

  1. Download and unzip this script to your Mac
  2. Edit the text file ‘script.txt’ in the unzipped ‘saywizard’ folder with your phrases.
  3. Double-click startSayWizard.command to run a test (if macOS security settings prevent this, right-click on it, choose Open With > Terminal, and click Open).
  4. Press the relevant key to have your Mac say your phrases. Close the window to quit.