Fire + Ice: David Pallmann's Technology Blog: CIA World Factbook on AWS, Part 3: Alexa Voice Interface using Lambda and DynamoDB

In this 3-part series, I'm showing how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services. Previously in Part 1 we created the back-end to collect data and store it in DynamoDB and S3 storage, using a Lambda Function to insert document records. In Part 2 we created a Lambda API for data access and a web site for browsing, searching, and viewing charts. Today in Part 3 we're creating an Alexa Skill so that the world country data can be accessed by voice.

What We're Building

Unless you've been living under a rock, you know that Amazon's cloud-based voice service is named Alexa and can be accessed from... well, anywhere: devices like the Amazon Echo and Dot; on your TV via Amazon FireStick; in a variety of cars; and many other places. Even if you don't own an Alexa-enabled device, you can get to it right now on the web at https://alexa.amazon.com, or from your phone using the Alexa app.

You can access the country data skill with "Alexa, Open World Country Data".

We are going to create an Alexa Skill (voice app) named World Country Data, backed by a Lambda Function that responds to spoken inquiries. The Lambda Function will query the DynamoDB database to retrieve country data.

World Country Data Dialog

The Alexa skill will be able to respond to inquiries like these:

What is the population of country?
Where is country?
How large is country?
What language is spoken in country?
What are the major cities in country?
What countries border country?
What are the natural resources of country?
What are the agricultural products of country?
What are the industries of country?
Give me an overview of country
Brief me on the economy of country
How many mobile phones are in country?

In addition, a user can also inquire about how countries compare and rank:

What country is the largest?
Which countries have the highest exports?
What countries have the most Internet users?
What country has the lowest population?

We also want the ability to play an audio clip:

Play the national anthem of country

Here's what happens architecturally: spoken inquiries from users to Alexa undergo speech recognition, machine learning, and natural language processing. Alexa recognizes an intent and passes it to our Lambda Function. The function looks up the required data in the DynamoDB table, which contains a JSON record for each country. The function returns a speech response, which in some cases may reference an audio clip. For Alexa devices with a display, the response also includes a card, which displays a map or flag image (depending on device size). Audio clips and images are stored in S3.

Architecture of World Country Data Skill

Our project will consist of an Alexa Skill (configuration) and a Lambda Function to go with it (code). We'll start with the Alexa skill. Our starting point for the project was the Color Picker sample in Node.js, which provides basic skeleton code for a skill.

Alexa Skill

To avoid potential confusion, I should mention that as I started work on this leg of the project, it came to my attention that there is already an existing World Factbook skill (not from me). Don't confuse that with my skill which is named World Country Data. The two skills are pretty different in scope, however.

Alexa Skills are created in the Amazon developer portal, developer.amazon.com, not the AWS console. You'll need to register as a developer. Our skill is named World Country Data.

The first area to set up in the skill project is the intents. An intent is something a user is trying to communicate, such as What's the population of {country}, and it contains a list of utterances. Intents are easy and not easy at the same time: it's simple enough to enter some utterances for the intent and try it out on Alexa. What's not so easy is thinking through all the possible ways someone might phrase an inquiry.

Scalar Value Inquiries

One category of intents we'll have is inquiries about scalar values, such as What's the population of {country}?, that return a single value (in this case, a number). Here's our definition for the population intent:

The embedded value {country} is called a slot, which is a type of placeholder. We want to be able to ask these questions about any country. Further down in the intent, we see that the slot country is of type country, meaning a custom list of country names we will create.
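To make the intent and slot relationship concrete, here is a rough sketch of how the population intent and the country slot type might look in the skill's interaction model, written as a JavaScript object and printed as JSON. The sample utterances and the three country values are illustrative assumptions, not the skill's actual lists; only the intent name, slot name, slot type, and invocation name come from this post.

```javascript
// Hypothetical sketch of interaction model entries for the population intent.
// The utterances and country values below are examples only.
const populationIntent = {
    name: 'population',
    slots: [
        { name: 'country', type: 'country' }   // slot name, then the custom slot type
    ],
    samples: [
        "what is the population of {country}",
        "what's the population of {country}",
        "how many people live in {country}"
    ]
};

// The custom slot type lists the values Alexa should expect for {country}.
const countryType = {
    name: 'country',
    values: [
        { name: { value: 'France' } },
        { name: { value: 'Greece' } },
        { name: { value: 'United States' } }
    ]
};

const interactionModel = {
    interactionModel: {
        languageModel: {
            invocationName: 'world country data',
            intents: [populationIntent],
            types: [countryType]
        }
    }
};

console.log(JSON.stringify(interactionModel, null, 2));
```

Every sample utterance carries the {country} placeholder, and the slot's type points at the custom list by name.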

There have been a number of places in this project where it's been necessary to list 260 individual countries. Defining the possible slot values for country is one of them. My fingers are getting a good workout! Technically speaking, the intents would work without listing every possible country here; however, Alexa is much better at understanding the request and selecting the right intent when the expected slot values are defined.

Defining country Slot Values

The response to this intent will be provided by our Lambda Function, which we'll get to later in this post. We create similar intents for area, climate, terrain, literacy rate, and the number of phones. This hardly scratches the surface of the available data; we'll come back and do more someday.

Text Narratives

Some fields in the country JSON contain paragraph text. For example, there's introduction.background, a background preamble on the country; and economy.overview, a brief on a country's economy. Technically, retrieving these items is no different than the scalar values described in the prior section; the experience to the user is quite different, however, as Alexa will read on and on. We handle these kinds of inquiries in the same way, with intents.

Alexa Dialog: Country Overview

Lists

Some of the World Factbook data is in the form of lists (JSON object arrays), such as a country's agricultural products or major urban areas.

Alexa Dialog: Major Urban Areas

Our intents for lists are not much different than our intents for scalar values: the only slot needed is the country name. However, our Lambda Function will need different code to retrieve list data.

Intent major_cities: Utterances

Top Countries

In addition to asking facts about a particular country, a user might want to know which countries are ranked highest or lowest in various categories like area, population, exports, or inflation. In Part 2 we created Lambda functions and database queries to get this information and render column charts. Today we can use the same queries (such as Top 10 exports) and give a response like this: The countries with the largest area are Russia, Antarctica, and Canada.

Alexa Dialog: Leading Countries

As before, our intents need to consider a variety of utterances that convey the same meaning:

Intent area_highest: Utterances

We create intents for largest/smallest area, highest/lowest population, highest/lowest inflation rate, and most Internet users. We'll look at the backing Lambda functions later in this post. Once again, there's so much additional data we could mine.

Playing National Anthems

The CIA World Factbook data includes audio clips for most countries' national anthems. Being able to play this audio in the skill is one of my favorite features.

To support this in our skill took some work, because Alexa imposes some limitations on audio clips:

Audio clips must be hosted at an HTTPS endpoint.
The MP3 must be an MPEG version 2 file.
The audio file cannot be longer than 240 seconds.
The bit rate must be 48 kbps.
The sample rate must be 22050Hz, 24000Hz, or 16000Hz.

These are not great specs for music audio, but we have no choice if we are going to be able to play our clips in our Alexa skill.
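As a quick aid, the limits above can be written down as a small check to run against a clip's metadata before uploading it. This is only a sketch: the metadata field names (url, mpegVersion, durationSeconds, bitRateKbps, sampleRateHz) and the example duration are assumptions, and reading those values from a real MP3 would need a separate tool.

```javascript
// Limits copied from the list above; the metadata field names are hypothetical.
const ALLOWED_SAMPLE_RATES = [16000, 22050, 24000];

// Returns a list of problems; an empty list means the clip meets every limit.
function checkClip(clip) {
    const problems = [];
    if (!clip.url.startsWith('https://')) problems.push('must be hosted at an HTTPS endpoint');
    if (clip.mpegVersion !== 2) problems.push('must be an MPEG version 2 file');
    if (clip.durationSeconds > 240) problems.push('must not be longer than 240 seconds');
    if (clip.bitRateKbps !== 48) problems.push('bit rate must be 48 kbps');
    if (!ALLOWED_SAMPLE_RATES.includes(clip.sampleRateHz)) {
        problems.push('sample rate must be 16000, 22050, or 24000 Hz');
    }
    return problems;
}

// Example metadata for a converted clip (the duration is made up for illustration).
const convertedClip = {
    url: 'https://s3.amazonaws.com/factbookaudio/Austria.mp3',
    mpegVersion: 2,
    durationSeconds: 85,
    bitRateKbps: 48,
    sampleRateHz: 16000
};

console.log(checkClip(convertedClip)); // []
```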

After obtaining the audio clips for each country, it was necessary to transform them to meet the above specifications. Following the AWS guidance on audio conversion, the following steps were performed for each national anthem MP3 file, using a free audio tool named Audacity.

Open the national anthem MP3.
Change the Audacity project's sampling rate to 16000.
Export the audio to MP3, setting a bit rate of 48kbps.

Converting Audio files to 48 kbps with Audacity

After converting all the country national anthem files, they were uploaded to S3 which is where the Alexa Skill will play them from. We'll see how when we look at the backing Lambda Function.

Here's a sample national anthem clip for Austria.

Lambda Function

Our Alexa Skill depends on a Lambda Function written in Node.js. Starting with the Color Picker sample gives us a basic voice app, which we will now turn into World Country Data. Although this is just one Lambda function, it calls into several sub-functions we'll need to review.

onIntent

The onIntent function responds to an intent by calling an appropriate handler function based on the intent name:

Scalar value requests and text briefing requests are handled by the lookup function.
List value requests are handled by the lookupList function.
Top countries in a category requests are handled by the lookupTop function.

/**
 * Called when the user specifies an intent for this skill.
 */
function onIntent(intentRequest, session, callback) {
    console.log(`onIntent requestId=${intentRequest.requestId}, sessionId=${session.sessionId}`);

    const intent = intentRequest.intent;
    const intentName = intentRequest.intent.name;

    if (intentName === 'about')                 { lookup('introduction.background', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'area')             { lookup('geography.area.total.value', '{country} is {value} square kilometers in size.{end}', intent, session, callback); }  
    else if (intentName === 'climate')          { lookup('geography.climate', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'count_mobile_phones') { lookup('communications.telephones.mobile_cellular.total_subscriptions', 'There are {value} mobile phones in {country}.{end}', intent, session, callback); } 
    else if (intentName === 'count_land_phones') { lookup('communications.telephones.fixed_lines.total_subscriptions', 'There are {value} land lines in {country}.{end}', intent, session, callback); } 
    else if (intentName === 'economy')          { lookup('economy.overview', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'terrain')          { lookup('geography.terrain', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'where')            { lookup('geography.location', '{value}{end}', intent, session, callback); }  
    else if (intentName === 'population')       { lookup('people.population.total', 'The population of {country} is {value}.{end}', intent, session, callback); } 
    else if (intentName === 'urban_population') { lookup('people.urbanization.urban_population.value', 'The urban population of {country} is {value} percent.{end}', intent, session, callback); } 
    else if (intentName === 'literacy_rate')    { lookup('people.literacy.total_population.value', 'The literacy rate of {country} is {value} percent.{end}', intent, session, callback); } 

    else if (intentName === 'play_national_anthem') { lookup(null, '{audio}{end}', intent, session, callback); } 
    
    else if (intentName === 'agricultual_products') { lookupList('economy.agriculture_products.products', null, '{country} has these agricultural products: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'bordered_by')      { lookupList('geography.land_boundaries.border_countries', 'country', '{country} is bordered by: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'industries')       { lookupList('economy.industries.industries', null, '{country} has these industries: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'languages')        { lookupList('people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'major_cities')     { lookupList('people.major_urban_areas.places', 'place', 'In {country}, the major urban areas are: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'resources')        { lookupList('geography.natural_resources.resources', null, '{country} has these natural resources: {list}.{end}', intent, session, callback); } 

    else if (intentName === 'area_highest')     { lookupTop('report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'area_lowest')      { lookupTop('report-area-lowest', 'The countries with the smallest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'exports_highest')  { lookupTop('report-exports-highest', 'The countries with the highest exports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'exports_lowest')   { lookupTop('report-exports-lowest', 'The countries with the lowest exports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'imports_highest')  { lookupTop('report-imports-highest', 'The countries with the highest imports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'imports_lowest')   { lookupTop('report-imports-lowest', 'The countries with the lowest imports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'internet_users_most') { lookupTop('report-internet-users-highest', 'The countries with the most Internet users are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 

    else if (intentName === 'population_highest') { lookupTop('report-population-highest', 'The countries with the highest population are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'population_lowest') { lookupTop('report-population-lowest', 'The countries with the lowest population are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 

    else if (intentName === 'AMAZON.HelpIntent') { help(intent, session, callback); } 
    else if (intentName === 'AMAZON.StopIntent' || intentName === 'AMAZON.CancelIntent') { handleSessionEndRequest(callback); }
    else if (intentName === 'AMAZON.FallbackIntent') { huh(intent, session, callback); }

    else { throw new Error('Invalid intent: ' + intentName); }
}

onIntent function
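The long if/else chain works, but every new intent means another branch. One possible alternative, shown here only as a sketch and not as the skill's actual code, is to keep the intent settings in lookup tables and route by name. Only a few intents are listed for brevity, and the names SCALAR_INTENTS, LIST_INTENTS, TOP_INTENTS, and routeIntent are hypothetical.

```javascript
// Hypothetical table-driven routing; each entry holds the leading arguments
// that the matching handler (lookup, lookupList, or lookupTop) expects.
const SCALAR_INTENTS = {
    area:       ['geography.area.total.value', '{country} is {value} square kilometers in size.{end}'],
    population: ['people.population.total', 'The population of {country} is {value}.{end}']
};
const LIST_INTENTS = {
    languages: ['people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}'],
    resources: ['geography.natural_resources.resources', null, '{country} has these natural resources: {list}.{end}']
};
const TOP_INTENTS = {
    area_highest: ['report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}']
};

// Own-property check so names like "toString" are not treated as intents.
const has = (table, key) => Object.prototype.hasOwnProperty.call(table, key);

// Returns the handler name and its leading arguments, or null for an unknown intent.
function routeIntent(intentName) {
    if (has(SCALAR_INTENTS, intentName)) return { handler: 'lookup', args: SCALAR_INTENTS[intentName] };
    if (has(LIST_INTENTS, intentName)) return { handler: 'lookupList', args: LIST_INTENTS[intentName] };
    if (has(TOP_INTENTS, intentName)) return { handler: 'lookupTop', args: TOP_INTENTS[intentName] };
    return null;
}

console.log(routeIntent('population').handler); // lookup
```

Inside onIntent, the returned args would be followed by intent, session, and callback when calling the chosen handler; a null result corresponds to the "Invalid intent" error case.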

Now let's take a look at our three primary handlers: lookup, lookupList, and lookupTop.

lookup

lookup is called to get a scalar value, such as the people.population.total value in the JSON below.

JSON snippet - Total Population for Greece

We can call the lookup function with the following path to retrieve the population total:

lookup('people.population.total', 'The population of {country} is {value}.{end}', intent, session, callback);

The lookup function is passed a JSON document path; a response template; and intent, session, and callback variables. The intent can be inspected, the session can be used to store or retrieve session state, and the callback is invoked to return a response. Here's the code to lookup:

// --------------- lookup - look up a value - e.g. What is the {property} of {country}? | What is the population of France?
// path: a dotted path to the data, as in people.population.total

function lookup(path, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';
    FlagImageUrl = null;
    MapImageUrl = null;
    let cardTitle = 'World Country Data';
    
    let country = normalizeCountryName(intent.slots.country.value);
    
    if (!inCountryList(country)) { // also sets FlagImageUrl & MapImageUrl if country name recognized
        console.log('lookup: country not recognized: ' + country);
        cardTitle = 'World Country Data - Country Not Recognized';
        speechOutput = 'Sorry, I don\'t recognize that country name. ' + Tag;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        cardTitle = 'World Country Data - ' + country;
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':name': country,
            ':source': 'Factbook'
          },
          KeyConditionExpression: '#name = :name and #source = :source'
        };
        
        docClient.query(params, function(err, data) {
    
            if (err) { 
                console.log('lookup: Query Error - ' + err.toString());
                speechOutput = 'Sorry, an error occurred looking up the data.';
                shouldEndSession = true;
            } else { 
                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        response = replace(response, '{country}', country);
                        response = replace(response, '{value}', eval('data.Items[0].' + path));
                        response = replace(response, '{end}', Tag);
                        response = replace(response, '{audio}', '<audio src="https://s3.amazonaws.com/factbookaudio/' + replace(country, ' ', '+') + '.mp3"/>');
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookup: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up.";
                        shouldEndSession = true;
                    }
                }
            }

            // Respond on both the error and success paths; previously a query error
            // never reached the callback, leaving the request without a response.
            callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
        });
    }
}

lookup function

The above code calls normalizeCountryName to normalize the country name (line 14). Many countries have alternate or historical names a user might refer to, and we need to arrive at a standard name that will match the JSON record in DynamoDB and the filenames in S3. For example, a user can request data on America, the U.S., or the U.S.A. and the country name will be normalized to United States.
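The normalizeCountryName function itself isn't listed in this post. As an illustration only, a minimal version could be an alias table keyed by lowercased spoken names. The table contents below cover just the examples mentioned above, and the names COUNTRY_ALIASES and normalizeCountryNameSketch are hypothetical.

```javascript
// Hypothetical alias table; the real skill's list is not shown in this post.
const COUNTRY_ALIASES = {
    'america': 'United States',
    'the u.s.': 'United States',
    'the u.s.a.': 'United States'
};

// Maps a spoken country name to the standard name used for records and filenames.
function normalizeCountryNameSketch(spoken) {
    if (!spoken) return '';                      // slot may be empty if nothing was heard
    const trimmed = spoken.trim();
    const key = trimmed.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(COUNTRY_ALIASES, key)) {
        return COUNTRY_ALIASES[key];
    }
    return trimmed;                              // no alias: keep the name as heard
}

console.log(normalizeCountryNameSketch('the U.S.A.')); // United States
```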

Next (lines 16-21) we check whether the country name is in our list of country names. If it isn't, we won't be able to retrieve the requested data and we respond Sorry, I don't recognize that country name. The response speech is routed back by calling buildSpeechletResponse, which we'll review later.

If we do have a recognized country name, it's time to retrieve the requested data. A DynamoDB client is instantiated (lines 24-26), query parameters are set up to retrieve the JSON country record (lines 28-39), and the JSON document is retrieved (lines 41-71). The code uses JavaScript's eval function to get to the data, which isn't a great practice; we'll be looking to rewrite that code in the near future and not use eval.
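One way the eval call might be replaced is with a small helper that walks the dotted path one key at a time. This is a sketch of that idea, not the rewrite itself; getByPath is a hypothetical name, and the sample record contains only the single field discussed above, using the Greece population figure quoted in this post.

```javascript
// Walks a dotted path such as 'people.population.total' through a record.
// Returns undefined if the path is empty or any step along the way is missing.
function getByPath(record, path) {
    if (!path) return undefined;
    return path.split('.').reduce(
        (node, key) => (node !== null && node !== undefined ? node[key] : undefined),
        record
    );
}

// Minimal stand-in for a country record; the real records hold far more fields.
const sampleRecord = { name: 'Greece', people: { population: { total: 10761523 } } };

console.log(getByPath(sampleRecord, 'people.population.total')); // 10761523
console.log(getByPath(sampleRecord, 'economy.overview'));        // undefined
```

Unlike eval, this never executes the path as code, and a missing branch yields undefined instead of throwing.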

We've been passing spoken responses and referring to them as templates; this is something I created, not an Alexa feature. To return the spoken response, the template response text passed in has values replaced:

{country} is replaced with the normalized country name;
{value} is replaced with the desired value by evaluating the property path that was passed in to the function.
{audio} is replaced with markup for the national anthem sound clip.
{end} is replaced with a 1-second pause and "What else would you like to know?" prompt.
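The replacements listed above can be sketched as a single template-filling helper. The replace helper and the Tag value used by the skill aren't listed in this post, so replaceAll, fillTemplate, and the TAG string below are assumptions based on the description of {end}.

```javascript
// Assumed closing tag: a 1-second pause followed by the follow-up prompt.
const TAG = '<break time="1s"/>What else would you like to know?';

// Replaces every occurrence of token in text with value.
function replaceAll(text, token, value) {
    return text.split(token).join(String(value));
}

// Fills {name} placeholders in a response template from a values object.
function fillTemplate(template, values) {
    let out = template;
    for (const [name, value] of Object.entries(values)) {
        out = replaceAll(out, '{' + name + '}', value);
    }
    return out;
}

const speech = fillTemplate(
    'The population of {country} is {value}.{end}',
    { country: 'Greece', value: 10761523, end: TAG }
);
console.log(speech);
// The population of Greece is 10761523.<break time="1s"/>What else would you like to know?
```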

The response is passed back to Alexa by calling buildSpeechletResponse (line 69).

The response to What is the population of Greece? is The population of Greece is 10 million 761 thousand 523.

lookupList

lookupList is very similar to lookup, but has different code for retrieving the value. Instead of a single property, an array must be iterated through with an inner property extracted to speak. Here's an example of the JSON array for languages in the JSON record for Spain. In this case, the path to the array is people.languages.language but the property to be extracted from each array element is name.

We can call lookupList with the following parameters to extract a list of languages:

lookupList('people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}', intent, session, callback);

Here's the code to lookupList:

// --------------- lookupList - look up a list - e.g. What languages are spoken in {country}? | What ethnic groups live in France?

// path: a dotted path to array data, as in people.languages.language
// property: property of a list item to vocalize - ex: name

function lookupList(path, property, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';
    FlagImageUrl = null;
    MapImageUrl = null;
    let cardTitle = 'World Country Data';
    
    let country = normalizeCountryName(intent.slots.country.value);
    
    if (!inCountryList(country)) { // also sets FlagImageUrl & MapImageUrl if country name recognized
        console.log('lookupList: country not recognized: ' + country);
        cardTitle = 'World Country Data - Country Not Recognized';
        speechOutput = 'Sorry, I don\'t recognize that country name. ' + Tag;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        cardTitle = 'World Country Data - ' + country;
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':name': country,
            ':source': 'Factbook'
          },
          KeyConditionExpression: '#name = :name and #source = :source'
        };
        
        docClient.query(params, function(err, data) {
    
            if (err) { 
                console.log('lookupList: Query Error - ' + err.toString());
                speechOutput = 'Sorry, an error occurred looking up the data.';
                shouldEndSession = true;
            } else { 

                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        var listText = '';
                        var listData =  eval('data.Items[0].' + path);
                        if (listData != '') {
                            for (var i = 0; i < listData.length; i++) {
                                if (i > 0) {
                                    listText += ", ";
                                }
                                if (property===null) {
                                    listText += listData[i];
                                }
                                else {
                                    listText += listData[i][property];
                                }
                            }
                        }
                        
                        response = replace(response, '{country}', country);
                        response = replace(response, '{list}', listText);
                        response = replace(response, '{end}', Tag);
                        response = replace(response, '{audio}', '<audio src="https://s3.amazonaws.com/factbookaudio/' + replace(country, ' ', '+') + '.mp3"/>');
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookupList: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up. ";
                        shouldEndSession = true;
                    }
                }
            }

            // Respond on both the error and success paths; previously a query error
            // never reached the callback, leaving the request without a response.
            callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
        });
    }
}

lookupList function

lookupList is nearly identical to lookup. It also normalizes and validates the country name, and retrieves the JSON country record from DynamoDB. The big difference is lines 56-70 where the code iterates through the array at path and extracts variable property from each array element to form a list to speak. The response template can specify {list} as a placeholder for the list.
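The list-building loop could also be written more compactly with map and join. The following is a sketch under the assumption that list entries are either plain strings (property is null) or objects carrying the property to speak; speakableList is a hypothetical name and the sample arrays are illustrative, not actual Factbook records.

```javascript
// Builds the comma-separated text that replaces {list} in a response template.
// items: array read from the country record; property: key to speak, or null for plain strings.
function speakableList(items, property) {
    if (!Array.isArray(items)) return '';        // covers missing or empty-string data
    return items
        .map(item => (property === null ? item : item[property]))
        .join(', ');
}

// Illustrative data only.
const languages = [{ name: 'Castilian Spanish' }, { name: 'Catalan' }, { name: 'Galician' }];
const resources = ['coal', 'lignite'];

console.log(speakableList(languages, 'name')); // Castilian Spanish, Catalan, Galician
console.log(speakableList(resources, null));   // coal, lignite
```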

The spoken response to What languages are spoken in Spain? is In Spain, these languages are spoken: Castilian Spanish, Catalan, Galician, Basque, Aranese along with Catalan, speakers.

lookupTop

When a user asks a country ranking question like "which countries are largest?", the lookupTop function handles the request. lookupTop runs a query to get a Top 10 country list, such as the 10 countries with largest area. These are the same queries we used to get column charts in the web site in Part 2.

We can call lookupTop with the following parameters to hear the top countries in a category:

lookupTop('report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback);

Here's the code to lookupTop. The report name parameter is used to set the index, projection expression, and sort direction for the DynamoDB query. The response template may specify {country1}, {country2}, and {country3} for the names of the top 3 countries in the result.

// --------- lookupTop - look up top (leading) countries for a report - e.g. which country has the highest exports?: which countries are biggest?
// report: report name, such as 'report-exports-highest'
// response: speech to output, which may include embedded placeholders {country1}, {country2}, {country3}

function lookupTop(report, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';

    var index = null;
    var projectionExpression = null;
    var scanIndexForward = true;
    
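    FlagImageUrl = null;   // reset the shared image URLs that buildSpeechletResponse reads
    MapImageUrl = null;    // (declaring them with var here would only reset local copies)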
    const cardTitle = 'World Country Data';
    
    switch(report) {
        case 'report-area-highest':
            index = 'rank-area-index';
            projectionExpression = "#name, global_rank_area, global_value_area";
            scanIndexForward = true;
            break;
        case 'report-area-lowest':
            index = 'rank-area-index';
            projectionExpression = "#name, global_rank_area, global_value_area";
            scanIndexForward = false;
            break;
        case 'report-exports-highest':
            index = 'rank-exports-index';
            projectionExpression = "#name, global_rank_exports, global_value_exports";
            scanIndexForward = true;
            break;
        case 'report-exports-lowest':
            index = 'rank-exports-index';
            projectionExpression = "#name, global_rank_exports, global_value_exports";
            scanIndexForward = false;
            break;
        case 'report-imports-highest':
            index = 'rank-imports-index';
            projectionExpression = "#name, global_rank_imports, global_value_imports";
            scanIndexForward = true;
            break;
        case 'report-imports-lowest':
            index = 'rank-imports-index';
            projectionExpression = "#name, global_rank_imports, global_value_imports";
            scanIndexForward = false;
            break;
        case 'report-internet-users-highest':
            index = 'rank-internet-users-index';
            projectionExpression = "#name, global_rank_internet_users, global_value_internet_users";
            scanIndexForward = true;
            break;
        case 'report-population-highest':
            index = 'rank-population-index';
            projectionExpression = "#name, global_rank_population, global_value_population";
            scanIndexForward = true;
            break;
        case 'report-population-lowest':
            index = 'rank-population-index';
            projectionExpression = "#name, global_rank_population, global_value_population";
            scanIndexForward = false;
            break;
        default:
            report = null;
            break;
    }
    
    if (report===null) {
        speechOutput = "Sorry, I could not find that data. ";
        shouldEndSession = true;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          IndexName: index,
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':source': 'Factbook',
          },
          KeyConditionExpression: '#source = :source',
          ProjectionType : "ALL",
          ProjectionExpression: projectionExpression,
          Limit: 10,
          ScanIndexForward: scanIndexForward
        };
        
        docClient.query(params, function(err, data) {
    
            if(err) { 
                console.log('lookupTop: Error - ' + err.toString());
                 speechOutput = 'Sorry, an error occurred looking that up.';
                 shouldEndSession = true;
            } else { 
                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        response = replace(response, '{country1}', data.Items[0].name);
                        response = replace(response, '{country2}', data.Items[1].name);
                        response = replace(response, '{country3}', data.Items[2].name);
                        response = replace(response, '{end}', Tag);
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookupTop: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up.";
                    }
                }
            }

            // Respond on both the error and success paths; previously a query error
            // never reached the callback, leaving the request without a response.
            callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
      });
    }
}

listTop Function

The response to "Which countries are largest?" is "The countries with the largest area are Russia, Antarctica, and Canada."
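
The placeholder substitution in the callback above can be tried in isolation. What follows is only a minimal sketch, not the skill's actual code: fillTopTemplate is a hypothetical helper, the replace helper from the listing is not shown in this section so split/join stands in for it, and the template string is assumed from the sample response. Unlike the listing, it checks for fewer than three results up front rather than relying on try/catch.

```javascript
// Hypothetical helper: fills a ranking template with the first three country names.
function fillTopTemplate(template, items, tag) {
    // Guard explicitly instead of letting items[2].name throw.
    if (!Array.isArray(items) || items.length < 3) {
        return 'Sorry, I found no data. ' + tag;
    }
    const values = {
        '{country1}': items[0].name,
        '{country2}': items[1].name,
        '{country3}': items[2].name,
        '{end}': tag
    };
    let response = template;
    for (const [placeholder, value] of Object.entries(values)) {
        // split/join replaces every occurrence without any regex escaping.
        response = response.split(placeholder).join(value);
    }
    return response;
}

// Sample input shaped like data.Items; the template text is an assumption.
const sampleItems = [{ name: 'Russia' }, { name: 'Antarctica' }, { name: 'Canada' }];
const sampleTemplate =
    'The countries with the largest area are {country1}, {country2}, and {country3}.{end}';
console.log(fillTopTemplate(sampleTemplate, sampleItems, ''));
```

Checking the item count first lets the "no data" message cover short result sets as well as empty ones.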

buildSpeechletResponse

We've made several references to a buildSpeechletResponse function, which assembles the response to send back to Alexa. Here's the code for buildSpeechletResponse:

// --------------- Helpers that build all of the responses -----------------------

function buildSpeechletResponse(title, output, repromptText, shouldEndSession) {
    
    var outputSpeech = null;
    
    var cardText = replace(output, '<break time="1s"/>', '\r\n\r\n');
    var pos = cardText.indexOf('<audio');
    if (pos != -1) cardText = 'Playing National Anthem\r\n\r\nWhat else would you like to know?';
    
    if (output.indexOf('<') != -1) {
        outputSpeech = {   // output contains markup (audio, breaks) - output SSML
            type: 'SSML',
            ssml: '<speak>' + output + '</speak>',
        };   
    }
    else {
        outputSpeech = {  // output is just text
            type: 'PlainText',
            text: output
        };
    }

    return {
        outputSpeech: outputSpeech,
        card: {
            type: 'Standard',
            title: `${title}`,
            text: cardText,
            content: `SessionSpeechlet - ${output}`,
            image: {
                "smallImageUrl": FlagImageUrl,
                "largeImageUrl": MapImageUrl
            }
        },
        reprompt: {
            outputSpeech: {
                type: 'PlainText',
                text: repromptText,
            },
        },
        shouldEndSession,
    };
}

buildSpeechletResponse Function

The above code assembles a response that includes an outputSpeech object. In the original sample used as a starting point for this project, outputSpeech was just text with a type of 'PlainText'. However, our responses sometimes include <break time="1s"/> markup to pause for a second, and <audio ...> tags to play national anthems. For this reason, whenever the output contains markup, our response is of type SSML and the output is wrapped in <speak> tags.
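
To see the two response shapes side by side, here is a small sketch that pulls the speech and card-text decisions out of buildSpeechletResponse. The helper names toOutputSpeech and toCardText are hypothetical (they are not part of the skill), and split/join stands in for the replace helper, which is not shown in this section.

```javascript
// Hypothetical helper mirroring the SSML-versus-plain-text choice in buildSpeechletResponse.
function toOutputSpeech(output) {
    if (output.indexOf('<') !== -1) {
        // Markup present (breaks, audio): wrap in <speak> and send as SSML.
        return { type: 'SSML', ssml: '<speak>' + output + '</speak>' };
    }
    // No markup: send as plain text.
    return { type: 'PlainText', text: output };
}

// Hypothetical helper: turns one-second breaks into blank lines for the card text.
function toCardText(output) {
    return output.split('<break time="1s"/>').join('\r\n\r\n');
}

const plainAnswer = 'The countries with the largest area are Russia, Antarctica, and Canada.';
const markedUpAnswer = plainAnswer + '<break time="1s"/>What else would you like to know?';

console.log(toOutputSpeech(plainAnswer));
console.log(toOutputSpeech(markedUpAnswer));
console.log(toCardText(markedUpAnswer));
```

One caveat with testing for '<': SSML is XML, so a plain-text answer that happened to contain a literal '<' or '&' would need escaping before being wrapped in <speak> tags.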

Certification

Having put the work into creating this skill, I decided to submit it for certification so anyone could use it. This was my first time going through the certification process, and I was pleased to find it a smooth process.

The first step was to polish the voice app as much as I could. Alexa voice apps are easy to get started with, but getting them to a good production-ready state is another thing; it requires thinking through all the different ways someone might express an intent, and a lot of testing. In particular, it requires getting input from multiple people, since you won't think of everything yourself. After convincing my family to assist me with testing, I felt I was ready for my first submission.

The Amazon developer console takes you through a Distribution area for describing your app and answering some questions about it; you can then run a validation and automated test to pick up some low-hanging fruit about areas that need attention. When you're past all that you can submit for certification, then sit back and await feedback email. I submitted my first attempt on a Sunday evening and when I went to my computer Monday morning feedback was waiting for me. There were just two very reasonable issues to address, explained in a helpful way.

One issue had to do with my responses. I'd designed the app to stay open until you expressly tell it to exit; the reasoning being that if you're getting country facts, you probably want to ask a series of questions. The feedback said I could only do that if my responses prompted the user for something more. That was easy to address: now when a question is answered, there's a one-second pause followed by "What else would you like to know?"
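
The fix for this first issue amounts to a few lines. The sketch below shows one way it could look; withFollowUp is a hypothetical helper, and reusing the prompt text as the reprompt is an assumption rather than something the certification feedback spelled out.

```javascript
const FOLLOW_UP = 'What else would you like to know?';

// Hypothetical helper: appends a one-second pause and a follow-up prompt,
// and keeps the session open so the user can ask another question.
function withFollowUp(answer) {
    return {
        speechOutput: answer + '<break time="1s"/>' + FOLLOW_UP,
        repromptText: FOLLOW_UP,   // assumption: repeat the prompt if the user stays silent
        shouldEndSession: false
    };
}

const followUpResult = withFollowUp(
    'The countries with the largest area are Russia, Antarctica, and Canada.'
);
console.log(followUpResult.speechOutput);
```

Because the appended text contains a <break> tag, buildSpeechletResponse would send such a response as SSML rather than plain text.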

The second issue had to do with cards, which are what Alexa displays when used from a device with a screen (such as my TV with Amazon FireStick). The default sample app I used as a starting point output very technical titles like 'SessionSpeechlet'. I overhauled the card output, and now a card is displayed that includes a friendly-worded title; a map of the country being described (or its flag on small displays); and the text of the response. Here, for example, is the card displayed in response to "Where is French Polynesia?"

Card Response to "Where is French Polynesia?"

I resubmitted Monday mid-morning, and passed certification overnight. All in all, a satisfying certification process. Here's what the skill listing looks like on Amazon.com:

In Conclusion

In this series, I showed how public-domain data from the CIA World Factbook can be hosted on Amazon Web Services. After bringing that data into DynamoDB and creating a Lambda Function serverless API, and then creating a web site, we tackled an Alexa Skill in this final part of the series. You can access the skill with "Alexa, Open World Country Data".

Our skill was able to answer questions about specific values for a country, such as its population or area; as well as lists such as the languages spoken in a country. Country ranking inquiries can also be made, such as which countries have the lowest exports or highest inflation. Although we covered many data points, there's a great deal more that can be mined out of this data source.

Tuesday, February 26, 2019
