Skilling Up With Alexa

For the last couple of weeks, Graham, Marcel, Sinem and I have been experimenting with Amazon’s Echo Dot, an electric hockey puck that uses voice recognition powered by the Amazon Alexa voice assistant. In this blog I’d like to explain how to go about creating your first Alexa skill.

Unboxing

The first thing to do after unboxing is to download the Alexa app from the respective app store and follow the instructions to connect the device to WiFi. Once connected, Alexa should be ready and listening for requests, questions or commands.

One caveat

At the time of writing, if you want to run a custom Alexa skill on your own device, you’ll need to set the device to US English. (That took us some hard googling to find out.) I’ll point you to our Mystery Marcel’s blog here so you can find out more about the difficulties we were facing.

Designing a Voice User Interface

There are some best practices for designing a voice user interface for Alexa, and I’d recommend following them in order to deliver a decent user experience. You can find them all in the documentation, with examples.

I’ll just list two here that I consider crucial:

  • Make It Clear that the User Needs to Respond - after you present options to the user, make sure you ask a question so they know that they are expected to say something

  • Don’t Assume Users Know What to Do - closely related to the point above: clearly present the options to the user so they know how to answer / control your Alexa skill
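To make these two points concrete, here is a minimal sketch of a prompt builder that always lists the options and always ends on a question. The `buildPrompt` helper and the example options are made up for illustration; they are not part of the Alexa tooling or of our Jarvis skill.

```javascript
// Hypothetical helper: lists the available options out loud and
// finishes with a question so the user knows it is their turn to speak.
const buildPrompt = (intro, options) => {
  if (options.length === 0) {
    // Nothing to list, but we still end on a question.
    return `${intro} What would you like?`;
  }

  const spoken =
    options.length > 1
      ? `${options.slice(0, -1).join(', ')} or ${options[options.length - 1]}`
      : options[0];

  return `${intro} I can make ${spoken}. What would you like?`;
};

console.log(buildPrompt('Welcome to Jarvis.', ['pancakes', 'pasta', 'soup']));
```

The resulting string can be used as the spoken text of a response, with the session left open so the user can answer.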

Defining the Voice Interface

Amazon Developer Portal (ADS) is the place where we set up our skill. It’s separate from the AWS console and, as far as we know, there isn’t a way of updating an Alexa skill programmatically. (Which makes us quite sad considering how much we like automated deployment.)

When creating a skill in the developer portal, the first things we need to define are the Name and the Invocation Name of our skill. Both are pretty self-explanatory: the Invocation Name is the word users say to address the skill. In our case, we put “Jarvis” into both fields and proceed to the next step.

The Interaction Model

Alexa’s Interaction model consists of 3 elements:

Intent Schema
What is an intent? Intents are actions that users can perform with your skill, and the intent schema is a simple JSON definition of those intents.

This is the schema for our Jarvis skill:

{
  "intents": [
    {
      "intent": "MakeFood",
      "slots": [
        {
          "name": "food",
          "type": "AMAZON.Food"
        }
      ]
    },
    {
      "intent": "AnswerFood",
      "slots": [
        {
          "name": "food",
          "type": "AMAZON.Food"
        }
      ]
    }
  ]
}

We have two arbitrary intent definitions:

  • MakeFood - this intent is triggered when the user asks Jarvis to make food straight away, for example by saying “Hey Alexa, tell Jarvis to make pancakes.”, as we’ll see in the Utterances definition

  • AnswerFood - this intent represents a simple answer to one of Jarvis’s questions, like “pancakes”, and can only be triggered if the session has already been initialized, as we’ll see later in our code

Slot / (Custom) Slot Types
Think of a slot as an intent’s argument. You can have multiple slots for an intent. For both of the intents above we use Amazon’s built-in slot type called AMAZON.Food, which is basically a predefined list of foods. We could have used a custom slot type, but then we’d have to list the food options manually. I think that Alexa only uses these definitions to decide which intent should be triggered and what to feed into each slot as a value, so it’ll also work with things that are not in the list, like shoes for example.
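Since a slot behaves like an argument, the Lambda code has to read it out of the incoming request and cope with it being absent. Here is a minimal sketch of that, assuming the request carries its slots under `intent.slots`, keyed by slot name. The `getSlotValue` helper and the trimmed `sampleRequest` object are illustrative only; a real request contains more fields.

```javascript
// Hypothetical helper: returns the spoken value of a slot,
// or null when the slot is missing or was left empty.
const getSlotValue = (request, slotName) => {
  const slots = (request.intent && request.intent.slots) || {};
  const slot = slots[slotName];
  return slot && slot.value ? slot.value : null;
};

// Trimmed, hand-written example of an IntentRequest (assumption).
const sampleRequest = {
  type: 'IntentRequest',
  intent: {
    name: 'MakeFood',
    slots: { food: { name: 'food', value: 'pancakes' } },
  },
};

console.log(getSlotValue(sampleRequest, 'food'));
```

Returning null rather than throwing lets the caller fall back to a reprompt when Alexa matched the intent but could not fill the slot.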

Utterances
These are what people say to interact with our skill or, if you prefer, a voice-to-intent mapping.

Again, this is the list of utterances for Jarvis:

MakeFood to cook {food}
MakeFood to make {food}

AnswerFood I'm thinking {food}
AnswerFood I'd like {food}
AnswerFood I want {food}
AnswerFood {food}
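To see why this is a voice-to-intent mapping, here is a small sketch that reads utterance lines in the "IntentName phrase" shape used above and groups the phrases by intent. The `parseUtterances` function is hypothetical and only meant for illustration; Alexa does this matching itself, and with far more flexibility than a literal lookup. The slot placeholder in the sample is written as `{food}` to match the slot name declared in the intent schema.

```javascript
// Hypothetical helper: turns "IntentName phrase" lines into
// an object of the form { IntentName: [phrase, ...] }.
const parseUtterances = (text) =>
  text
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean) // drop empty lines
    .reduce((byIntent, line) => {
      const firstSpace = line.indexOf(' ');
      if (firstSpace === -1) return byIntent; // intent name without a phrase
      const intent = line.slice(0, firstSpace);
      const phrase = line.slice(firstSpace + 1).trim();
      (byIntent[intent] = byIntent[intent] || []).push(phrase);
      return byIntent;
    }, {});

const sampleUtterances = [
  'MakeFood to cook {food}',
  'MakeFood to make {food}',
  '',
  "AnswerFood I'd like {food}",
  'AnswerFood {food}',
].join('\n');

console.log(parseUtterances(sampleUtterances));
```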

Once we finish configuring our Interaction Model, we get to the Configuration step - this is where we connect Alexa to our Lambda endpoint.

Build and Host Code

Log in to the AWS Console and navigate to AWS Lambda. Click the region drop-down and select either US East (N. Virginia) or EU (Ireland), as Lambda functions for Alexa skills must be hosted in one of these two regions.

Our Lambda needs to return a JSON response that looks something like this. This is actually the output we need when Jarvis is invoked without a command, i.e. “Alexa, open Jarvis” or “Alexa, ask Jarvis” and so on.

{
  "version": "1.0",
  "sessionAttributes": {},
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Jarvis can cook food for you, what would you like?"
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "What did you say you would like to eat?"
      }
    },
    "shouldEndSession": false
  }
}

The full source code for our Jarvis skill looks like this. (We still need to compile it with Babel before shipping it to Lambda; here is Graham’s blog with a Hello World example if you need more details on how to do this.)

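// Builds an Alexa response envelope. Pass a reprompt string to have Alexa
// ask again if the user stays silent, and shouldEndSession = false to keep
// the session open for an answer.
const makeResponse = (text, reprompt = false, shouldEndSession = true) => ({
  version: '1.0',
  sessionAttributes: {},
  response: {
    outputSpeech: {
      type: 'PlainText',
      text,
    },
    reprompt: reprompt
      ? {
          outputSpeech: {
            type: 'PlainText',
            text: reprompt,
          },
        }
      : {},
    shouldEndSession,
  },
});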

export const handler = function(event, context, callback) {
  // The session lives on the event itself, not on event.request.
  const { session } = event;
  const { type } = event.request;

  if (type === 'LaunchRequest') {
    context.succeed(
      makeResponse(
        'Jarvis can cook food for you, what would you like?',
        'What did you say you would like to eat?',
        false,
      ),
    );
  } else if (type === 'IntentRequest') {
    const { intent: { name, slots = {} } } = event.request;

    // AnswerFood only makes sense as a reply inside an already open session.
    if (name === 'AnswerFood' && !session.new && slots.food) {
      // make your call to a cooking service here

      context.succeed(
        makeResponse(`${slots.food.value}, that's great. I'm on it sir.`),
      );
    } else if (name === 'MakeFood' && slots.food) {
      // make your call to a cooking service here

      context.succeed(
        makeResponse(`${slots.food.value}, that's great. I'm on it sir.`),
      );
    } else {
      context.succeed(
        makeResponse(
          'I did not understand your request. For now I can only cook, what would you like to eat?',
          'What did you say you would like to eat?',
          false,
        ),
      );
    }
  } else if (type === 'SessionEndedRequest') {
    context.succeed('Good bye');
  }
};
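Before uploading anything, it can help to call the handler locally with a hand-written event. Below is a minimal sketch of how that might look. The `createFakeContext` helper, the trimmed `launchEvent` and the `stubHandler` are all made up for this example: the fake context only mimics the `succeed` and `fail` callbacks used above, and the stub handler stands in for the compiled Jarvis handler so that the snippet runs on its own.

```javascript
// Hypothetical stand-in for the Lambda context object.
// It simply records whatever the handler passes to succeed() or fail().
const createFakeContext = () => {
  const context = {
    result: undefined,
    error: undefined,
    succeed(value) {
      context.result = value;
    },
    fail(err) {
      context.error = err;
    },
  };
  return context;
};

// Trimmed, hand-written event (assumption: real events carry more fields).
const launchEvent = {
  session: { new: true },
  request: { type: 'LaunchRequest' },
};

// Stub used only to keep this sketch self-contained;
// replace it with the real handler when trying this out.
const stubHandler = (event, context) => {
  context.succeed({ receivedType: event.request.type });
};

const fakeContext = createFakeContext();
stubHandler(launchEvent, fakeContext, () => {});
console.log(fakeContext.result);
```

Swapping `stubHandler` for the real handler and `launchEvent` for an IntentRequest-shaped event lets you check each branch without touching a device.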

 

Testing The Skill

Once we have our Lambda live and ready, we’ll go back to Amazon’s developer portal, fill our Lambda’s ARN (its identifier) into the form and hit the “Next” button, which will bring us to the Testing section. Type in one of our defined utterances and click on “Ask Jarvis” - does it work? If so, your skill should also be available on your Echo / Echo Dot and you should be able to test it straight away.

Hope you enjoyed this blog and if you are building anything with Alexa, let us know in the comments below.

Roman Schejbal