Using the Actions SDK to Develop for the Google Assistant

In today's post I'm going to show how to develop a simple app for the Google Assistant. For developing this app, I will be using the Actions SDK. My next post will use Dialogflow (formerly api.ai) instead. After reading both posts, you will hopefully know enough to decide which approach is better suited for you.

About the app

In my home town, I usually take the bus when taking our youngest to kindergarten before heading off to work. In winter, and also due to some long-lasting street works, one of the two lines near our home is late nearly all the time - but, naturally, unreliably so. So every morning I wonder when to leave for the bus. We do have an app for bus information in Münster, but some limitations of this app make it unsuitable for the task.

Thus I'm going to build an assistant app that tells me when the next buses are due and whether we should hurry up or can idle around a bit longer. This post covers the first version, built with the Actions SDK; the next post will present an improved version built with Dialogflow.

Here's a sketch of what a conversation with this app should look like:

Sample dialog of the app
Sample dialog of the app - Made with botsociety.io

On the right is what I'm going to say. On the left are the assistant's answers. The first one comes from the assistant itself - I changed its color to emphasize the difference. The second and third are responses from the app. After the last response the app finishes - and with it the current assistant action. Thanks to botsociety.io, you can even see this prototype showcased on their website.

What is the Actions SDK

The Actions SDK is one way to develop apps for the Google Assistant. As the SDK's name implies, you create actions, which pair snippets of user speech with your responses to them. Let's go a bit deeper into the base concepts here.

The relevant things you need to know are:

  • Actions
  • Intents
  • Fulfillment

Actions

An action is a wrapper combining an intent and the fulfillment for this intent. For a truly conversational app, where there's a dialog between the user and the assistant, apps have multiple actions. But often it makes sense to create a one-shot app. In this case the app has only one action and answers directly with the response the user is looking for. The sample app would work well as a one-shot app. I only added one more action to make it a better sample for this post.

Intents

We already know intents from Android. And with Actions on Google they are not much different. An intent specifies what the user intends your app to do.

In contrast to Android, though, you are very limited when it comes to intents with the Actions SDK.

Every app has to define an actions.intent.MAIN intent, which is used to start your app if no custom intent is better suited to the invocation phrase. Depending on the kind of app, this intent might directly provide the answer and then stop the app, or it might introduce the app to the user in some way.

For the start of your app you can also define custom intents, together with the user utterances that trigger them. You can even add parameters to those utterances (for example numbers or a date). If the Assistant detects that the user used one of those phrases, it starts your app with this intent.
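To make this more concrete, here is a sketch of what an action package (action.json) with a MAIN intent and one custom intent could look like. The app name, intent names, query patterns, and fulfillment URL are made up for illustration, and the exact schema may differ between SDK versions:

```json
{
  "actions": [
    {
      "description": "Default welcome action",
      "name": "MAIN",
      "fulfillment": { "conversationName": "bus_times" },
      "intent": { "name": "actions.intent.MAIN" }
    },
    {
      "description": "Directly ask for the next bus of a line",
      "name": "NEXT_BUS",
      "fulfillment": { "conversationName": "bus_times" },
      "intent": {
        "name": "com.example.bus.NEXT_BUS",
        "trigger": {
          "queryPatterns": [
            "when does the next bus leave",
            "when does line $SchemaOrg_Number:line leave"
          ]
        }
      }
    }
  ],
  "conversations": {
    "bus_times": {
      "name": "bus_times",
      "url": "https://example.com/assistant/fulfillment"
    }
  }
}
```

Both actions point to the same conversation, so the same fulfillment endpoint handles either entry point.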

If you were to build a more complex app with the Actions SDK, you would use one intent most of the time: actions.intent.TEXT. Meaning: You have one intent that has to deal with nearly all possible user input. And thus your app would have to decide what to do based on the text spoken or typed in by the user.

There are a few more intents available, for example actions.intent.OPTION, which is useful if you present a set of options to the user, of which she might select one. But when dealing with user input, most of the time you will have to handle actions.intent.TEXT.
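To get a feel for what this means in practice, here is a minimal sketch of the kind of routing your fulfillment would have to do on raw actions.intent.TEXT input. The keyword matching, action names, and reply texts are my own inventions, not part of the SDK:

```javascript
// Naive routing of raw user text, as you'd have to do when almost
// everything arrives via actions.intent.TEXT. A real app would need
// proper natural language processing here.
function routeUtterance(rawText) {
  const text = rawText.toLowerCase();
  if (/\b(bye|stop|cancel)\b/.test(text)) {
    return { action: 'quit', speech: 'Okay, see you at the bus stop!' };
  }
  if (text.includes('next bus')) {
    return { action: 'nextBus', speech: 'The next bus leaves in 7 minutes.' };
  }
  // Fallback: we could not make sense of the input.
  return { action: 'fallback', speech: "Sorry, I didn't get that." };
}

console.log(routeUtterance('When is the next bus?').action); // "nextBus"
console.log(routeUtterance('bye').action); // "quit"
```

Even this toy example shows the problem: a handful of regexes and substring checks break down quickly (note how "stop" already collides with "bus stop"), which is exactly why Dialogflow takes over this part in the next post.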

This limitation is why I wrote in my intro post about the Assistant that the Actions SDK is mostly useful when you are proficient with natural language processing or have a simple project with a command-like interface.

Unless you have some language processing super powers it boils down to this:

⇨ Use the Actions SDK for one-shot apps. These are apps that provide the required answer directly after being invoked and then stop. They give you just this one result. A typical example would be setting a timer to ten minutes.

⇨ For all other apps - those that are truly conversational, where there are multiple paths to follow and where you want your user to provide more information during the conversation - I recommend using Dialogflow (api.ai) instead of the Actions SDK. I will cover Dialogflow in my next post.

Fulfillment

Fulfillment is just another word for your backend. Every action must be backed by some backend of yours. This is where you generate the text spoken to the user. Or — on devices that have a graphical user interface — that's where you create visual cues for your user.

Of course the response is based on which intent was triggered and what additional cues the user gave in her question.

When Google calls your backend, it sends a JSON payload that contains plenty of information, as you will see later in this post. Some of the things you get access to:

  • The spoken text as Google has detected it
  • The intent invoked
  • The parameters to the intent
  • The session id
  • Information about the device's capabilities

In this post I'm only going to make use of some of them, but I'm going to cover more aspects in future posts.
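As a rough, hand-written sketch, a request payload carrying the items listed above might look something like this. The field names follow the conversation webhook format as I understand it, and all values are placeholders, so treat this as an illustration rather than a reference:

```json
{
  "user": { "userId": "some-opaque-user-id" },
  "conversation": {
    "conversationId": "1234567890",
    "type": "NEW"
  },
  "inputs": [
    {
      "intent": "actions.intent.MAIN",
      "rawInputs": [
        { "inputType": "VOICE", "query": "talk to my bus app" }
      ],
      "arguments": []
    }
  ],
  "surface": {
    "capabilities": [
      { "name": "actions.capability.AUDIO_OUTPUT" },
      { "name": "actions.capability.SCREEN_OUTPUT" }
    ]
  }
}
```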

The next picture shows where your app - or more precisely, your fulfillment code - fits into the overall picture. The Assistant knows which app to call based on the invocation phrase used by the user. It also provides voice recognition of the user's utterances and text-to-speech transformation of your answers to the user. Your app then takes the recognized text and creates an appropriate response to it.
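The response side can be sketched similarly. The helper below is my own, again hand-written from my understanding of the webhook format rather than taken from any SDK; it only illustrates the key decision every response has to make - whether the conversation continues or ends:

```javascript
// Builds the JSON body the fulfillment sends back to the Assistant.
// If expectUserResponse is true, the mic stays open for a follow-up
// (and we declare we can handle free text next); otherwise the app -
// and with it the action - ends after speaking.
function buildResponse(textToSpeech, expectUserResponse) {
  if (expectUserResponse) {
    return {
      expectUserResponse: true,
      expectedInputs: [{
        inputPrompt: {
          richInitialPrompt: {
            items: [{ simpleResponse: { textToSpeech } }]
          }
        },
        possibleIntents: [{ intent: 'actions.intent.TEXT' }]
      }]
    };
  }
  return {
    expectUserResponse: false,
    finalResponse: {
      richResponse: { items: [{ simpleResponse: { textToSpeech } }] }
    }
  };
}

console.log(JSON.stringify(
  buildResponse('The next bus leaves in 7 minutes.', false), null, 2));
```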

Action SDK workflow
Action SDK workflow - picture © Google, Inc.

Preparations

When you want to use Actions on Google your account must be prepared for that. You have to do three things:

  1. Create a project within the Actions on Google Dev Console
  2. Install the gactions command line tool
  3. Initialize the project using the gactions tool

In the next paragraphs I'm going to cover those steps.

Create a project within the Actions on Google Dev Console

The very first step is to use the Dev Console to create a project. If it's your first project, check the terms of service - since by clicking the "Create Project" button you agree to adhere to them:

Creating a project on the dev console
Creating a project on the Actions on Google Dev Console

Afterwards you have to decide which API to use to build your project:

Actions on Google API / SDK selection
Actions on Google API / SDK selection

As you can see, there are three prominent ways to develop apps: the Actions SDK, or one of the two services, Dialogflow and Converse AI. If you click on "Actions SDK", you will see a tiny popup where you can copy a gactions command. So the next step is to install the gactions command line tool.

Install the gactions command line tool

You can download the gactions tool from Google's site. It's a binary file you can store wherever you see fit. Make sure to make this file executable (if you're using Linux or Mac OS X) and - since you're going to use this tool a lot - I recommend adding it to your PATH.

If you already have this tool installed, you can use the selfupdate option to update it:


gactions selfupdate

That's all that is needed in this step.
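For orientation, the invocations you will need once the tool is installed look roughly like this. The project ID and file name are placeholders, and the exact flags may differ between gactions versions, so check `gactions --help` against your copy:

```shell
# Make the downloaded binary executable (Linux / Mac OS X)
chmod +x gactions

# Push an action package to your project
gactions update --action_package action.json --project my-project-id

# Put the app into the test state so you can try it in the simulator
gactions test --action_package action.json --project my-project-id
```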