Tuesday, February 26, 2019

CIA World Factbook on AWS, Part 3: Alexa Voice Interface using Lambda and DynamoDB

In this 3-part series, I'm showing how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services. Previously in Part 1 we created the back-end to collect data and store it in DynamoDB and S3 storage, using a Lambda Function to insert document records. In Part 2 we created a Lambda API for data access and a web site for browsing, searching, and viewing charts. Today in Part 3 we're creating an Alexa Skill so that the world country data can be accessed by voice.


What We're Building

Unless you've been living under a rock, you know that Amazon's cloud-based voice service is named Alexa and can be accessed from... well, anywhere: devices like the Amazon Echo and Dot; on your TV via Amazon FireStick; in a variety of cars; and many other places. Even if you don't own an Alexa-enabled device, you can get to it right now on the web at https://alexa.amazon.com, or from your phone using the Alexa app.

You can access the country data skill with "Alexa, Open World Country Data".

We are going to create an Alexa Skill (voice app) named World Country Data, backed by a Lambda Function that responds to spoken inquiries. The Lambda Function will query the DynamoDB database to retrieve country data.

World Country Data Dialog

The Alexa skill will be able to respond to inquiries like these:

  • What is the population of country?
  • Where is country?
  • How large is country?
  • What language is spoken in country?
  • What are the major cities in country?
  • What countries border country?
  • What are the natural resources of country?
  • What are the agricultural products of country?
  • What are the industries of country?
  • Give me an overview of country
  • Brief me on the economy of country
  • How many mobile phones are in country?
In addition, a user can also inquire about how countries compare and rank:
  • What country is the largest?
  • Which countries have the highest exports?
  • What countries have the most Internet users?
  • What country has the lowest population?
We also want the ability to play an audio clip:
  • Play the national anthem of country
Here's what happens architecturally: spoken inquiries from users to Alexa undergo speech recognition, machine learning, and natural language processing. Alexa recognizes an intent and passes it to our Lambda Function. The function looks up the required data in the DynamoDB table, which contains a JSON record for each country. The function returns a speech response, which in some cases may reference an audio clip. For Alexa devices with a display, the response also includes a card, which displays a map or flag image (depending on device size). Audio clips and images are stored in S3.

Architecture of World Country Data Skill

Our project will consist of an Alexa Skill (configuration) and a Lambda Function to go with it (code). We'll start with the Alexa skill. Our starting point for the project was the Color Picker sample in Node.js, which provides basic skeleton code for a skill.

Alexa Skill

To avoid potential confusion, I should mention that as I started work on this leg of the project, it came to my attention that there is already an existing World Factbook skill (not from me). Don't confuse that with my skill which is named World Country Data. The two skills are pretty different in scope, however.

Alexa Skills are created in the Amazon developer portal, developer.amazon.com, not the AWS console. You'll need to register as a developer. Our skill is named World Country Data.

The first area to set up in the skill project is the intents. An intent is something a user is trying to communicate, such as What's the population of {country}, and it contains a list of utterances. Intents are easy and not easy at the same time: it's simple enough to enter some utterances for the intent and try it out on Alexa. What's not so easy is thinking through all the possible ways someone might phrase an inquiry.

Scalar Value Inquiries

One category of intents we'll have is inquiries about scalar values, such as What's the population of {country}?, that return a single value (in this case, a number). Here's our definition for the population intent:


The embedded value {country} is called a slot, which is a type of placeholder. We want to be able to ask these questions about any country. Further down in the intent, we see that the slot country is of type country, meaning a custom list of country names we will create.
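To make the slot concrete, here is a rough sketch of the part of an intent request that our Lambda Function cares about. The sample request object is hand-written for illustration and trimmed to the fields used in this post, and getCountrySlot is a hypothetical helper, not part of the skill's code.

```javascript
// Hand-written, trimmed example of an IntentRequest for the population intent.
const sampleIntentRequest = {
    type: 'IntentRequest',
    requestId: 'example-request-id', // placeholder value
    intent: {
        name: 'population',
        slots: {
            country: { name: 'country', value: 'Greece' }
        }
    }
};

// Hypothetical helper: return the spoken country slot value, or null if absent.
function getCountrySlot(intent) {
    const slot = intent && intent.slots && intent.slots.country;
    return slot && slot.value ? slot.value : null;
}

console.log(getCountrySlot(sampleIntentRequest.intent)); // Greece
```

Built-in intents such as AMAZON.HelpIntent carry no country slot, which is why the helper returns null rather than assuming the slot is present.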


There have been a number of places in this project where it's been necessary to list 260 individual countries. Defining the possible slot values for country is one of them. My fingers are getting a good workout! Technically speaking, the intents would work without listing every possible country here; however, Alexa is much better at understanding the request and selecting the right intent when the expected slot values are defined.

Defining country Slot Values
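Rather than typing every value by hand, the slot values could be generated. The following is only a sketch: it assumes the custom slot type layout used by the skill's JSON editor (a name plus a list of values, each with optional synonyms), and buildCountrySlotType and the short country list are made up for illustration.

```javascript
// Hypothetical input: in practice this would be the full list of country names.
const countryNames = ['France', 'Greece', 'United States'];

// Optional synonyms per country (illustrative values only).
const synonyms = { 'United States': ['America', 'U.S.', 'U.S.A.'] };

// Build a custom slot type entry: one value object per country name.
function buildCountrySlotType(names, synonymMap) {
    return {
        name: 'country',
        values: names.map((name) => ({
            name: { value: name, synonyms: synonymMap[name] || [] }
        }))
    };
}

console.log(JSON.stringify(buildCountrySlotType(countryNames, synonyms), null, 2));
```

The printed JSON could then be pasted into the types section of the interaction model, assuming that layout matches what your console shows.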

The response to this intent will be provided by our Lambda Function, which we'll get to later in this post. We create similar intents for area, climate, terrain, literacy rate, and the number of phones. This hardly scratches the surface of the available data; we'll come back and do more someday.

Text Narratives

Some fields in the country JSON contain paragraph text. For example, there's introduction.background, a background preamble on the country; and economy.overview, a brief on a country's economy. Technically, retrieving these items is no different than the scalar values described in the prior section; the experience to the user is quite different, however, as Alexa will read on and on. We handle these kinds of inquiries in the same way, with intents.

Alexa Dialog: Country Overview

Lists

Some of the World Factbook data is in the form of lists (JSON object arrays), such as a country's agricultural products or major urban areas.

Alexa Dialog: Major Urban Areas

Our intents for lists are not much different than our intents for scalar values: the only slot needed is the country name. However, our Lambda Function will need different code to retrieve list data.

Intent major_cities: Utterances

Top Countries

In addition to asking facts about a particular country, a user might want to know which countries are ranked highest or lowest in various categories like area, population, exports, or inflation. In Part 2 we created Lambda functions and database queries to get this information and render column charts. Today we can use the same queries (such as Top 10 exports) and give a response like this: The countries with the largest area are Russia, Antarctica, and Canada.

Alexa Dialog: Leading Countries

As before, our intents need to consider a variety of utterances that convey the same meaning:

Intent area_highest: Utterances

We create intents for largest/smallest area, highest/lowest population, highest/lowest inflation rate, and most Internet users. We'll look at the backing Lambda functions later in this post. Once again, there's so much additional data we could mine. 

Playing National Anthems

The CIA World Factbook data includes audio clips for most countries' national anthems. Being able to play this audio in the skill is one of my favorite features.

Supporting this in our skill took some work, because Alexa imposes some limitations on audio clips:
  • Audio clips must be hosted at an HTTPS endpoint.
  • The MP3 must be an MPEG version 2 file.
  • The audio file cannot be longer than 240 seconds.
  • The bit rate must be 48 kbps.
  • The sample rate must be 22050Hz, 24000Hz, or 16000Hz.
These are not great specs for music audio, but we have no choice if we are going to be able to play our clips in our Alexa skill. 
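As a sanity check before uploading, the limits listed above could be encoded in a small function. This is a sketch only: checkAudioClip and the field names on the clip object are invented for illustration, and reading the actual bit rate, sample rate, and duration from an MP3 file is left to whatever audio tool you use.

```javascript
// Limits as listed above in this post.
const ALLOWED_SAMPLE_RATES_HZ = [16000, 22050, 24000];
const REQUIRED_BIT_RATE_KBPS = 48;
const MAX_DURATION_SECONDS = 240;

// Hypothetical helper: returns a list of problems (an empty list means the clip looks OK).
// The clip object's field names are made up for this sketch.
function checkAudioClip(clip) {
    const problems = [];
    if (!clip.url.startsWith('https://')) { problems.push('not served over HTTPS'); }
    if (clip.bitRateKbps !== REQUIRED_BIT_RATE_KBPS) { problems.push('bit rate is not 48 kbps'); }
    if (!ALLOWED_SAMPLE_RATES_HZ.includes(clip.sampleRateHz)) { problems.push('unsupported sample rate'); }
    if (clip.durationSeconds > MAX_DURATION_SECONDS) { problems.push('longer than 240 seconds'); }
    return problems;
}

// Example with made-up metadata for a converted clip.
console.log(checkAudioClip({
    url: 'https://example.com/anthem.mp3',
    bitRateKbps: 48,
    sampleRateHz: 16000,
    durationSeconds: 75
}));
```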

After obtaining the audio clips for each country, it was necessary to transform them to meet the above specifications. Following the AWS guidance on audio conversion, the following steps were performed for each national anthem MP3 file, using a free audio tool named Audacity.
  1. Open the national anthem MP3.
  2. Change the Audacity project's sampling rate to 16000.
  3. Export the audio to MP3, setting a bit rate of 48kbps.
Converting Audio files to 48 kbps with Audacity

After converting all the country national anthem files, they were uploaded to S3 which is where the Alexa Skill will play them from. We'll see how when we look at the backing Lambda Function.

Here's a sample national anthem clip for Austria.

Lambda Function

Our Alexa Skill depends on a Lambda Function written in Node.js. Starting with the Color Picker sample gives us a basic voice app, which we will now turn into World Country Data. Although this is just one Lambda function, it calls into several sub-functions we'll need to review.

onIntent

The onIntent function responds to an intent by calling an appropriate handler function based on the intent name:

  • Scalar value requests and text briefing requests are handled by the lookup function.
  • List value requests are handled by the lookupList function.
  • Top countries in a category requests are handled by the lookupTop function.
/**
 * Called when the user specifies an intent for this skill.
 */
function onIntent(intentRequest, session, callback) {
    console.log(`onIntent requestId=${intentRequest.requestId}, sessionId=${session.sessionId}`);

    const intent = intentRequest.intent;
    const intentName = intentRequest.intent.name;

    if (intentName === 'about')                 { lookup('introduction.background', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'area')             { lookup('geography.area.total.value', '{country} is {value} square kilometers in size.{end}', intent, session, callback); }  
    else if (intentName === 'climate')          { lookup('geography.climate', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'count_mobile_phones') { lookup('communications.telephones.mobile_cellular.total_subscriptions', 'There are {value} mobile phones in {country}.{end}', intent, session, callback); } 
    else if (intentName === 'count_land_phones') { lookup('communications.telephones.fixed_lines.total_subscriptions', 'There are {value} land lines in {country}.{end}', intent, session, callback); } 
    else if (intentName === 'economy')          { lookup('economy.overview', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'terrain')          { lookup('geography.terrain', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'where')            { lookup('geography.location', '{value}{end}', intent, session, callback); }  
    else if (intentName === 'population')       { lookup('people.population.total', 'The population of {country} is {value}.{end}', intent, session, callback); } 
    else if (intentName === 'urban_population') { lookup('people.urbanization.urban_population.value', 'The urban population of {country} is {value} percent.{end}', intent, session, callback); } 
    else if (intentName === 'literacy_rate')    { lookup('people.literacy.total_population.value', 'The literacy rate of {country} is {value} percent.{end}', intent, session, callback); } 

    else if (intentName === 'play_national_anthem') { lookup(null, '{audio}{end}', intent, session, callback); } 
    
    else if (intentName === 'agricultual_products') { lookupList('economy.agriculture_products.products', null, '{country} has these agricultural products: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'bordered_by')      { lookupList('geography.land_boundaries.border_countries', 'country', '{country} is bordered by: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'industries')       { lookupList('economy.industries.industries', null, '{country} has these industries: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'languages')        { lookupList('people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'major_cities')     { lookupList('people.major_urban_areas.places', 'place', 'In {country}, the major urban areas are: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'resources')        { lookupList('geography.natural_resources.resources', null, '{country} has these natural resources: {list}.{end}', intent, session, callback); } 

    else if (intentName === 'area_highest')     { lookupTop('report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'area_lowest')      { lookupTop('report-area-lowest', 'The countries with the smallest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'exports_highest')  { lookupTop('report-exports-highest', 'The countries with the highest exports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'exports_lowest')   { lookupTop('report-exports-lowest', 'The countries with the lowest exports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'imports_highest')  { lookupTop('report-imports-highest', 'The countries with the highest imports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'imports_lowest')   { lookupTop('report-imports-lowest', 'The countries with the lowest imports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'internet_users_most') { lookupTop('report-internet-users-highest', 'The countries with the most Internet users are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 

    else if (intentName === 'population_highest') { lookupTop('report-population-highest', 'The countries with the highest population are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'population_lowest') { lookupTop('report-population-lowest', 'The countries with the lowest population are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 

    else if (intentName === 'AMAZON.HelpIntent') { help(intent, session, callback); } 
    else if (intentName === 'AMAZON.StopIntent' || intentName === 'AMAZON.CancelIntent') { handleSessionEndRequest(callback); }
    else if (intentName === 'AMAZON.FallbackIntent') { huh(intent, session, callback); }

    else { throw new Error('Invalid intent: ' + intentName); }
}
onIntent function
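The long if/else chain works, but the same routing could be expressed as a table. Here is a minimal sketch of that idea, showing only three intents. buildRoutes and dispatch are hypothetical names, and the handlers are passed in as parameters so the sketch stands on its own.

```javascript
// Each entry maps an intent name to a handler followed by its leading arguments.
// Only a few intents are shown here.
function buildRoutes(lookup, lookupList, lookupTop) {
    return {
        population: [lookup, 'people.population.total', 'The population of {country} is {value}.{end}'],
        languages: [lookupList, 'people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}'],
        area_highest: [lookupTop, 'report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}']
    };
}

// Find the route for the intent and call its handler with the shared trailing arguments.
function dispatch(routes, intent, session, callback) {
    if (!Object.prototype.hasOwnProperty.call(routes, intent.name)) {
        throw new Error('Invalid intent: ' + intent.name);
    }
    const [handler, ...args] = routes[intent.name];
    handler(...args, intent, session, callback);
}
```

With a table like this, adding an intent becomes a one-line change, and the list of supported intents can be read at a glance.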

Now let's take a look at our three primary handlers: lookup, lookupList, and lookupTop.

lookup


lookup is called to get a scalar value, such as the people.population.total value in the JSON below.

JSON snippet - Total Population for Greece

We can call the lookup function with the following path to retrieve the population total:

lookup('people.population.total', 'The population of {country} is {value}.{end}', intent, session, callback);

The lookup function is passed a JSON document path; a response template; and intent, session, and callback variables. The intent can be inspected, the session can be used to store or retrieve session state, and the callback is invoked to return a response. Here's the code to lookup:
// --------------- lookup - look up a value - e.g. What is the {property} of {country}? | What is the population of France?
// path: a dotted path to the data, as in people.population.total

function lookup(path, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';
    FlagImageUrl = null;
    MapImageUrl = null;
    let cardTitle = 'World Country Data';
    
    let country = normalizeCountryName(intent.slots.country.value);
    
    if (!inCountryList(country)) { // also sets FlagImageUrl & MapImageUrl if country name recognized
        console.log('lookup: country not recognized: ' + country);
        cardTitle = 'World Country Data - Country Not Recognized';
        speechOutput = 'Sorry, I don\'t recognize that country name. ' + Tag;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        cardTitle = 'World Country Data - ' + country;
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':name': country,
            ':source': 'Factbook'
          },
          KeyConditionExpression: '#name = :name and #source = :source'
        };
        
        docClient.query(params, function(err, data) {
    
            if (err) { 
                console.log('lookup: Query Error - ' + err.toString());
                speechOutput = 'Sorry, an error occurred looking up the data.';
                shouldEndSession = true;
            } else { 
                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        response = replace(response, '{country}', country);
                        response = replace(response, '{value}', eval('data.Items[0].' + path));
                        response = replace(response, '{end}', Tag);
                        response = replace(response, '{audio}', '<audio src="https://s3.amazonaws.com/factbookaudio/' + replace(country, ' ', '+') + '.mp3"/>');
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookup: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up.";
                        shouldEndSession = true;
                    }
                }
            }
            // Respond on every path, including a query error; otherwise the
            // request would never be answered. Because repromptText is set,
            // Alexa can reprompt if the user does not respond.
            callback(sessionAttributes,
                buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
        });
    }
}
lookup function

The above code calls normalizeCountryName to normalize the country name (line 14). Many countries have alternate or historical names a user might refer to, and we need to arrive at a standard name that will match the JSON record in DynamoDB and the filenames in S3. For example, a user can request data on America, the U.S., or the U.S.A. and the country name will be normalized to United States.
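The implementation of normalizeCountryName isn't shown in this post. One simple way to do this kind of normalization is an alias table, sketched below. The function name normalizeCountryNameSketch and the alias entries are illustrative assumptions based on the United States example above, not the skill's actual code.

```javascript
// Illustrative alias table; the real skill's list is not shown in this post.
const COUNTRY_ALIASES = {
    'america': 'United States',
    'u.s.': 'United States',
    'u.s.a.': 'United States'
};

// Map a spoken name to a standard name; unknown names pass through trimmed.
function normalizeCountryNameSketch(spokenName) {
    if (!spokenName) { return null; }
    const trimmed = spokenName.trim();
    const key = trimmed.toLowerCase();
    return Object.prototype.hasOwnProperty.call(COUNTRY_ALIASES, key)
        ? COUNTRY_ALIASES[key]
        : trimmed;
}

console.log(normalizeCountryNameSketch('America')); // United States
```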

Next (lines 16-21) we check whether the country name is in our list of country names. If it isn't, we won't be able to retrieve the requested data and we respond Sorry, I don't recognize that country name. The response speech is routed back by calling buildSpeechletResponse, which we'll review later.

If we do have a recognized country name, it's time to retrieve the requested data. A DynamoDB client is instantiated (lines 24-26), query parameters are set up to retrieve the JSON country record (lines 28-39), and the JSON document is retrieved (lines 41-71). The code uses JavaScript's eval function to get to the data, which isn't a great practice; we'll be looking to rewrite that code in the near future and not use eval.
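One way to drop eval is to walk the dotted path one key at a time. This is a sketch of that approach rather than the code currently in the skill. getByPath is a hypothetical helper, and the record below is trimmed to a single field for illustration.

```javascript
// Resolve a dotted path such as 'people.population.total' against an object.
// Returns undefined if the path is empty or any step along the way is missing.
function getByPath(obj, path) {
    if (!path) { return undefined; }
    return path.split('.').reduce(
        (node, key) => (node !== null && node !== undefined ? node[key] : undefined),
        obj
    );
}

// Trimmed stand-in for a country record returned from DynamoDB.
const record = { people: { population: { total: 10761523 } } };
console.log(getByPath(record, 'people.population.total')); // 10761523
```

Unlike eval, this never executes the path as code, and a missing intermediate property yields undefined instead of throwing.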

We've been passing spoken responses and referring to them as templates; this is something I created, not an Alexa feature. To return the spoken response, the template response text passed in has values replaced:

  • {country} is replaced with the normalized country name;
  • {value} is replaced with the desired value by evaluating the property path that was passed in to the function.
  • {audio} is replaced with markup for the national anthem sound clip. 
  • {end} is replaced with a 1-second pause and "What else would you like to know?" prompt.
The response is passed back to Alexa by calling buildSpeechletResponse (line 69).
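The template substitution described above can be sketched in a few lines. The replace helper and the Tag value are not shown in this post, so both are assumptions here: replaceAllText is a guess at what replace does, and TAG is inferred from the description of {end}.

```javascript
// Assumed value of Tag, inferred from the description of {end} above.
const TAG = '<break time="1s"/>What else would you like to know?';

// Replace every occurrence of a placeholder (a guess at what the replace helper does).
function replaceAllText(text, placeholder, value) {
    return text.split(placeholder).join(String(value));
}

// Fill each {key} placeholder in the template from the values object.
function fillTemplate(template, values) {
    return Object.keys(values).reduce(
        (text, key) => replaceAllText(text, '{' + key + '}', values[key]),
        template
    );
}

const speech = fillTemplate(
    'The population of {country} is {value}.{end}',
    { country: 'Greece', value: 10761523, end: TAG }
);
console.log(speech);
```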

The response to What is the population of Greece? is The population of Greece is 10 million 761 thousand 523.

lookupList


lookupList is very similar to lookup, but has different code for retrieving the value. Instead of a single property, an array must be iterated through with an inner property extracted to speak. Here's an example of the JSON array for languages in the JSON record for Spain. In this case, the path to the array is people.languages.language but the property to be extracted from each array element is name.


We can call lookupList with the following parameters to extract a list of languages:

lookupList('people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}', intent, session, callback);

Here's the code to lookupList:
// --------------- lookupList - look up a list - e.g. What languages are spoken in {country}? | What ethnic groups live in France?

// path: a dotted path to array data, as in people.languages.language
// property: property of a list item to vocalize - ex: name

function lookupList(path, property, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';
    FlagImageUrl = null;
    MapImageUrl = null;
    let cardTitle = 'World Country Data';
    
    let country = normalizeCountryName(intent.slots.country.value);
    
    if (!inCountryList(country)) { // also sets FlagImageUrl & MapImageUrl if country name recognized
        console.log('lookupList: country not recognized: ' + country);
        cardTitle = 'World Country Data - Country Not Recognized';
        speechOutput = 'Sorry, I don\'t recognize that country name. ' + Tag;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        cardTitle = 'World Country Data - ' + country;
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':name': country,
            ':source': 'Factbook'
          },
          KeyConditionExpression: '#name = :name and #source = :source'
        };
        
        docClient.query(params, function(err, data) {
    
            if (err) { 
                console.log('lookupList: Query Error - ' + err.toString());
                speechOutput = 'Sorry, an error occurred looking up the data.';
                shouldEndSession = true;
            } else { 

                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        var listText = '';
                        var listData =  eval('data.Items[0].' + path);
                        if (listData != '') {
                            for (var i = 0; i < listData.length; i++) {
                                if (i > 0) {
                                    listText += ", ";
                                }
                                if (property===null) {
                                    listText += listData[i];
                                }
                                else {
                                    listText += listData[i][property];
                                }
                            }
                        }
                        
                        response = replace(response, '{country}', country);
                        response = replace(response, '{list}', listText);
                        response = replace(response, '{end}', Tag);
                        response = replace(response, '{audio}', '<audio src="https://s3.amazonaws.com/factbookaudio/' + replace(country, ' ', '+') + '.mp3"/>');
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookupList: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up. ";
                        shouldEndSession = true;
                    }
                }
            }
            // Respond on every path, including a query error; otherwise the
            // request would never be answered. Because repromptText is set,
            // Alexa can reprompt if the user does not respond.
            callback(sessionAttributes,
                buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
        });
    }
}
lookupList function

lookupList is nearly identical to lookup. It also normalizes and validates the country name, and retrieves the JSON country record from DynamoDB. The big difference is lines 56-70 where the code iterates through the array at path and extracts variable property from each array element to form a list to speak. The response template can specify {list} as a placeholder for the list.
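The list-building loop can also be written with map and join. This is a sketch of the same idea, not a drop-in replacement: formatList is a hypothetical name, and the sample arrays are trimmed, illustrative data.

```javascript
// Build the spoken list: take each item (or one property of each item) and join with commas.
function formatList(listData, property) {
    if (!Array.isArray(listData)) { return ''; }
    return listData
        .map((item) => (property === null ? item : item[property]))
        .join(', ');
}

// Array of objects: speak the 'name' property of each element.
const languages = [{ name: 'Castilian Spanish' }, { name: 'Catalan' }];
console.log(formatList(languages, 'name')); // Castilian Spanish, Catalan

// Array of plain strings: pass null for the property.
console.log(formatList(['wheat', 'corn'], null)); // wheat, corn
```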

The spoken response to What languages are spoken in Spain? is In Spain, these languages are spoken: Castilian Spanish, Catalan, Galician, Basque, Aranese along with Catalan, speakers.

lookupTop


When a user asks a country ranking question like "which countries are largest?", the lookupTop function handles the request. lookupTop runs a query to get a Top 10 country list, such as the 10 countries with largest area. These are the same queries we used to get column charts in the web site in Part 2.

We can call lookupTop with the following parameters to hear the top countries in a category:

lookupTop('report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback);

Here's the code to lookupTop. The report name parameter is used to set the index, projection expression, and sort direction for the DynamoDB query. The response template may specify {country1}, {country2}, and {country3} for the names of the top 3 countries in the result.
// --------- lookupTop - look up top (leading) countries for a report - e.g. which country has the highest exports?: which countries are biggest?
// report: report name, such as 'report-exports-highest'
// response: speech to output, which may include embedded placeholders {country1}, {country2}, {country3}

function lookupTop(report, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';

    var index = null;
    var projectionExpression = null;
    var scanIndexForward = true;
    
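    FlagImageUrl = null; // reset the shared image URLs that buildSpeechletResponse reads,
    MapImageUrl = null;  // as lookup and lookupList do (a local var would leave stale values)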
    const cardTitle = 'World Country Data';
    
    switch(report) {
        case 'report-area-highest':
            index = 'rank-area-index';
            projectionExpression = "#name, global_rank_area, global_value_area";
            scanIndexForward = true;
            break;
        case 'report-area-lowest':
            index = 'rank-area-index';
            projectionExpression = "#name, global_rank_area, global_value_area";
            scanIndexForward = false;
            break;
        case 'report-exports-highest':
            index = 'rank-exports-index';
            projectionExpression = "#name, global_rank_exports, global_value_exports";
            scanIndexForward = true;
            break;
        case 'report-exports-lowest':
            index = 'rank-exports-index';
            projectionExpression = "#name, global_rank_exports, global_value_exports";
            scanIndexForward = false;
            break;
        case 'report-imports-highest':
            index = 'rank-imports-index';
            projectionExpression = "#name, global_rank_imports, global_value_imports";
            scanIndexForward = true;
            break;
        case 'report-imports-lowest':
            index = 'rank-imports-index';
            projectionExpression = "#name, global_rank_imports, global_value_imports";
            scanIndexForward = false;
            break;
        case 'report-internet-users-highest':
            index = 'rank-internet-users-index';
            projectionExpression = "#name, global_rank_internet_users, global_value_internet_users";
            scanIndexForward = true;
            break;
        case 'report-population-highest':
            index = 'rank-population-index';
            projectionExpression = "#name, global_rank_population, global_value_population";
            scanIndexForward = true;
            break;
        case 'report-population-lowest':
            index = 'rank-population-index';
            projectionExpression = "#name, global_rank_population, global_value_population";
            scanIndexForward = false;
            break;
        default:
            report = null;
            break;
    }
    
    if (report===null) {
        speechOutput = "Sorry, I could not find that data. ";
        shouldEndSession = true;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          IndexName: index,
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':source': 'Factbook',
          },
          KeyConditionExpression: '#source = :source',
          ProjectionType : "ALL",
          ProjectionExpression: projectionExpression,
          Limit: 10,
          ScanIndexForward: scanIndexForward
        };
        
        docClient.query(params, function(err, data) {
    
            if(err) { 
                console.log('lookupTop: Error - ' + err.toString());
                 speechOutput = 'Sorry, an error occurred looking that up.';
                 shouldEndSession = true;
            } else { 
                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        response = replace(response, '{country1}', data.Items[0].name);
                        response = replace(response, '{country2}', data.Items[1].name);
                        response = replace(response, '{country3}', data.Items[2].name);
                        response = replace(response, '{end}', Tag);
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookupTop: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up.";
                    }
                }
            }
            // Respond on every path, including a query error; otherwise the
            // request would never be answered. Because repromptText is set,
            // Alexa can reprompt if the user does not respond.
            callback(sessionAttributes,
                buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
      });
    }
}
listTop Function

The response to "Which countries are largest?" is "The countries with the largest area are Russia, Antarctica, and Canada."
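The replace helper that listTop calls is defined elsewhere in the skill and isn't shown in this section. As a rough sketch of the template-filling step, assuming a hypothetical stand-in for that helper and a made-up response template, it might look like this:

```javascript
// Hypothetical stand-in for the skill's replace helper (not shown in this section):
// replaces every occurrence of `find` in `text` with `value`.
function replace(text, find, value) {
    return text.split(find).join(value);
}

// Made-up response template; the real templates live elsewhere in the skill.
var template = 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}';

// Fill the template from the first three query results.
function fillTopThree(template, items, tag) {
    var response = template;
    for (var i = 0; i < 3; i++) {
        // Guard against short result sets instead of relying on an exception
        var name = items[i] ? items[i].name : 'unknown';
        response = replace(response, '{country' + (i + 1) + '}', name);
    }
    return replace(response, '{end}', tag);
}

var items = [{ name: 'Russia' }, { name: 'Antarctica' }, { name: 'Canada' }];
console.log(fillTopThree(template, items, ''));
// The countries with the largest area are Russia, Antarctica, and Canada.
```

Unlike the listing above, this version substitutes a placeholder word when fewer than three items come back rather than falling into the catch block; either approach keeps the skill from crashing.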


buildSpeechletResponse


We've made several references to a buildSpeechletResponse function, which assembles the response to send back to Alexa. Here's the code to buildSpeechletResponse:

// --------------- Helpers that build all of the responses -----------------------

function buildSpeechletResponse(title, output, repromptText, shouldEndSession) {
    
    var outputSpeech = null;
    
    var cardText = replace(output, '<break time="1s"/>', '\r\n\r\n');
    var pos = cardText.indexOf('<audio');
    if (pos != -1) cardText = 'Playing National Anthem\r\n\r\nWhat else would you like to know?';
    
    if (output.indexOf('<') != -1) {
        outputSpeech = {   // output contains markup (audio, breaks) - output SSML
            type: 'SSML',
            ssml: '<speak>' + output + '</speak>',
        };   
    }
    else {
        outputSpeech = {  // output is just text
            type: 'PlainText',
            text: output
        };
    }

    return {
        outputSpeech: outputSpeech,
        card: {
            type: 'Standard',
            title: `${title}`,
            text: cardText,
            content: `SessionSpeechlet - ${output}`,
            image: {
                "smallImageUrl": FlagImageUrl,
                "largeImageUrl": MapImageUrl
            }
        },
        reprompt: {
            outputSpeech: {
                type: 'PlainText',
                text: repromptText,
            },
        },
        shouldEndSession,
    };
}
buildSpeechletResponse Function

The above code assembles a response that includes an outputSpeech object. In the original sample used as a starting point for this project, outputSpeech was just text with a type of 'PlainText'. However, our responses sometimes include <break time="1s"/> markup to pause for a second, and <audio> tags to play national anthems. For these reasons, any response that contains markup is sent with a type of 'SSML' and wrapped in <speak> tags; all other responses remain plain text.

Certification

Having put the work into creating this skill, I decided to submit it for certification so anyone could use it. This was my first time going through the certification process, and I was pleased to find it a smooth process.

The first step was to polish the voice app as much as I could. Alexa voice apps are easy to get started with, but getting them to a good production-ready state is another thing; it requires thinking through all the different ways someone might express an intent, and a lot of testing. In particular, it requires getting input from multiple people since you won't think of everything yourself. After convincing my family to assist me with testing, I felt I was ready for my first submission.

The Amazon developer console takes you through a Distribution area for describing your app and answering some questions about it; you can then run a validation and automated test to pick up some low-hanging fruit about areas that need attention. When you're past all that you can submit for certification, then sit back and await feedback email. I submitted my first attempt on a Sunday evening and when I went to my computer Monday morning feedback was waiting for me. There were just two very reasonable issues to address, explained in a helpful way.

One issue had to do with my responses. I'd designed the app to stay open until you expressly tell it to exit; the reasoning being that if you're getting country facts, you probably want to ask a series of questions. The feedback said I could only do that if my responses prompted the user for something more. That was easy to address: now when a question is answered, there's a one-second pause followed by "What else would you like to know?"
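Here's a minimal sketch of that change. The withFollowUp helper name and the shape of its return value are assumptions for illustration; the skill's actual code may be organized differently.

```javascript
// Prompt wording taken from the post; the helper itself is hypothetical.
var FOLLOW_UP = 'What else would you like to know?';

// Append a one-second pause and a follow-up prompt to an answer.
// Because the result contains markup, buildSpeechletResponse will send it as SSML.
function withFollowUp(answer) {
    return {
        speechOutput: answer + ' <break time="1s"/> ' + FOLLOW_UP,
        repromptText: FOLLOW_UP,     // spoken again if the user stays silent
        shouldEndSession: false      // keep the session open for the next question
    };
}

var reply = withFollowUp('The countries with the largest area are Russia, Antarctica, and Canada.');
console.log(reply.speechOutput);
```

Using the same sentence for the reprompt keeps the wording consistent if the user doesn't answer right away.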

The second issue had to do with cards, which are what Alexa displays when the skill is used from a device with a screen (such as my TV with Amazon FireStick). The default sample app I used as a starting point output very technical titles like 'SessionSpeechlet'. I overhauled the card output, and now a card is displayed that includes a friendly-worded title; a map of the country being described (or its flag on small displays); and the text of the response. Here, for example, is the card displayed in response to "Where is French Polynesia?":

Card Response to "Where is French Polynesia?"

I resubmitted Monday mid-morning, and passed certification overnight. All in all, a satisfying certification process. Here's what the skill listing looks like on Amazon.com:


In Conclusion

In this series, I showed how public-domain data from the CIA World Factbook can be hosted on Amazon Web Services. After bringing that data into DynamoDB and creating a Lambda Function serverless API, and then creating a web site, we tackled an Alexa Skill in this final part of the series. You can access the skill with "Alexa, Open World Country Data".

Our skill was able to answer questions about specific values for a country, such as its population or area; as well as lists such as the languages spoken in a country. Country ranking inquiries can also be made, such as which countries have the lowest exports or highest inflation. Although we covered many data points, there's a great deal more that can be mined out of this data source.


Friday, February 22, 2019

CIA World Factbook Data on AWS, Part 2: Front-end API & Web Site using Lambda Functions and DynamoDB

In this 3-part series, I'm going to show how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services.


Previously in Part 1 we noted the CIA World Factbook data is public domain. We created the back-end to collect data and store it in DynamoDB and S3 storage, using a Lambda Function to insert document records. We also created some rudimentary Lambda Functions for accessing country records.

Today in Part 2 we will create the front-end, which will include a fuller API and a web site—powered by Lambda Functions and DynamoDB. With that, we'll finally be able to access and use all that data we collected. You can access the web site at http://world-factbook.aws.davidpallmann.com.

  

What We're Building

Today we'll be creating three things to form our front-end:
  1. API / Lambda Functions. We're going to use an API of Lambda Functions to query the DynamoDB. We'll need functions to look up country data, perform searches, and retrieve chart data.
  2. DynamoDB Secondary Indexes. We'll need to add additional indices to our DynamoDB database in order to support chart data retrieval.
  3. Web Site. We'll create a web site that allows browsing, searching, and viewing charts of world country data via the API. This site will work on both desktop and mobile devices.
Our web site can do 3 things, and we'll address each one in a section of this post:
  • Country View: select a country to view its record details (geography, people, economy, etc.)
  • Search: enter a search term and get a list of matching countries.
  • Charts: select a chart and view a column chart.

Web Site Foundation

In a prior series, I hosted this data on Microsoft Azure and created a statically-hosted web site. We're going to host the same web site for the AWS edition, but I've refactored the web site code so that it more easily supports either cloud platform with common source code. 

I've also decided to change the background map image (also public domain) and color theme for the AWS site. The original Azure site had an Amber-Sienna color theme; the AWS site will have a blue color scheme. Here's how the two web sites compare:

Azure Edition of Web Site

AWS Edition of Web Site

Here's how I've structured the site, as a small number of files that can be hosted in inexpensive cloud storage. The majority of the code, markup, and style rules are common; only a handful of files are deployment-specific: cloud.js, favicon.ico, logo.png, theme.css, and the world-map.jpg background image.

File  Description  Common or Unique
cloud.js  Platform-specific functions and properties  Unique
favicon.ico  Browser icon  Unique
index.html  Web page  Common
logo.png  Logo  Unique
site.css  Style rules  Common
site.js  JavaScript  Common
theme.css  Color theme  Unique
world-map.jpg  Background image  Unique

Since I've previously blogged about the site, I'm not going to do a detailed walk-through of the code. However, I will highlight some critical parts of it. The full code can be accessed on GitHub (see link at end of post).

When you access the web site, you may experience a short delay initially. That's because Lambda Functions power the site, and if a function has been inactive you'll wait a few seconds while it is started up again (a cold start). There are ways to keep the functions warm, but as this is just a demonstration I opted for the lowest-cost deployment.
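For reference, one common way to keep a function warm is to have a scheduled CloudWatch Events rule invoke it every few minutes and return early on those pings. This site does not do that; the sketch below is an illustration only, and the handler body and response text are assumptions. Scheduled events arrive with a source of aws.events, which is what the check relies on.

```javascript
// Returns true when the invocation came from a scheduled CloudWatch Events rule.
function isWarmUpPing(event) {
    return !!event && event.source === 'aws.events';
}

// In a real function this would be assigned to exports.handler.
function handler(event, context, callback) {
    if (isWarmUpPing(event)) {
        // Nothing to do: the invocation itself keeps a container around
        callback(null, { statusCode: 200, body: 'warm' });
        return;
    }
    // ... normal request handling would go here ...
    callback(null, { statusCode: 200, body: 'handled request' });
}
```

The trade-off is cost: each ping is a billed invocation, and it only keeps a limited number of containers ready.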

Country View

The user can select a country from the drop-down list of 260 countries at top right.


When a country is selected, the Lambda Function country is called, which returns a complete JSON country record. We wrote this function and studied the country JSON in Part 1; here's an updated view of the code.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                            "Access-Control-Allow-Origin" : "*",
                            "Access-Control-Allow-Credentials" : true
                    };

    var countryName = null;
    
    if (event && event.queryStringParameters && event.queryStringParameters.name) countryName = event.queryStringParameters.name;
    
    if (!countryName) {
        // Bad request: report it and stop, so the query below never runs without a name
        callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: name' });
        return;
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':name': countryName,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#name = :name and #source = :source',
    };
    
    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:')
        console.log(err.toString());
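        // Pass null as the error so the 500 response (with CORS headers) reaches the caller;
        // a template literal needs backticks for ${err} to be substituted
        callback(null, { statusCode: 500, headers: corsHeaders, body: `Error: ${err}` });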
    } else { 
        if (!data || data.Items.length===0) {
            callback(null, { statusCode: 400, headers: corsHeaders, body: 'Country not found: ' + countryName });
        }
        else {
            callback(null, {
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items[0])
                });
        }
    }
  });
};
country Lambda Function

To be able to call this function from our web site without Cross-Origin Resource Sharing (CORS) errors, we also had to go to the API Gateway configuration for our country-API and Enable CORS.

Enabling CORS in API Gateway

After retrieving the country JSON from the country function, the site then populates the accordion content sections (Introduction, Geography, People, Government, Economy, Energy, Communications, Transportation, Military and Security, and Transnational Issues). Although we're getting and displaying many fields, it's only a fraction of what's in the data; over time, we'll try to expand it.

Although the country JSON document structure is consistent, any particular element we want to access may or may not be present. Accordingly, our JavaScript code to create content sections has to carefully check whether elements exist. Below is a code fragment showing how the People section of content is assembled. We did not use a framework like Angular or React to do this (although we may at some point); even with one, we would still have needed the same logic and checks in our HTML template.
// Load content: People

var people = '';
if (data.people) {
    if (data.people.population && data.people.population.total) {
        people += '<div class="item"><b>Population</b><br/>' + numberWithCommas(data.people.population.total) + '</div>';
    }
    if (data.people.population && data.people.population.global_rank) {
        people += '<div class="item"><b>Global Rank</b><br/>' + data.people.population.global_rank + '</div>';
    }
    if (data.people.nationality && data.people.nationality.adjective) {
        people += '<div class="item"><b>Nationality</b><br/>' + data.people.nationality.adjective + '</div>';
    }
    if (data.people.ethnic_groups && data.people.ethnic_groups.ethnicity) {
        var ethnic_groups = data.people.ethnic_groups;
        people += '<div class="item"><b>Ethnic Groups</b><br/>'
        for (var i = 0; i < ethnic_groups.ethnicity.length; i++) {
            var pct = ethnic_groups.ethnicity[i].percent;
            var elem = ethnic_groups.ethnicity[i].name;
            var note = ethnic_groups.ethnicity[i].note;
            if (pct)
                elem += " (" + pct + '%)';
            if (note)
                elem += " note: " + note;
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.languages && data.people.languages.language) {
        var languages = data.people.languages;
        people += '<div class="item"><b>Languages</b><br/>'
        for (var i = 0; i < languages.language.length; i++) {
            var pct = languages.language[i].percent;
            var elem = languages.language[i].name;
            if (pct)
                elem += " (" + pct + '%)';
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.religions && data.people.religions.religion) {
        var religions = data.people.religions;
        people += '<div class="item"><b>Religions</b><br/>'
        for (var i = 0; i < religions.religion.length; i++) {
            var pct = religions.religion[i].percent;
            var elem = religions.religion[i].name;
            if (pct)
                elem += " (" + pct + '%)';
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.life_expectancy_at_birth && data.people.life_expectancy_at_birth.total_population && data.people.life_expectancy_at_birth.total_population.value && data.people.life_expectancy_at_birth.total_population.units) {
        people += '<div class="item"><b>Life Expectancy at Birth</b><br/>' + data.people.life_expectancy_at_birth.total_population.value + ' ' + data.people.life_expectancy_at_birth.total_population.units + '</div>';
    }
    if (data.people.population_growth_rate && data.people.population_growth_rate.growth_rate && data.people.population_growth_rate.date) {
        people += '<div class="item"><b>Population Growth Rate</b><br/>' + data.people.population_growth_rate.growth_rate + ' (' + data.people.population_growth_rate.date + ')</div>';
    }
    if (data.people.birth_rate && data.people.birth_rate.births_per_1000_population && data.people.birth_rate.date) {
        people += '<div class="item"><b>Birth Rate</b><br/>' + data.people.birth_rate.births_per_1000_population + ' births per thousand (' + data.people.birth_rate.date + ')</div>';
    }
    if (data.people.death_rate && data.people.death_rate.deaths_per_1000_population && data.people.death_rate.date) {
        people += '<div class="item"><b>Death Rate</b><br/>' + data.people.death_rate.deaths_per_1000_population + ' deaths per thousand (' + data.people.death_rate.date + ')</div>';
    }
    if (data.people.demographic_profile) {
        people += '<div class="item"><b>Demographic Profile</b><br/>' + data.people.demographic_profile + '</div>';
    }
}
$('#content-people').html(people);
JavaScript to Extract People Content
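The repeated existence checks could be condensed with a small path helper. This is only a sketch: getPath and item are hypothetical helpers that are not part of the site's code, and the record below is made up.

```javascript
// Walk a dotted path such as 'people.population.total';
// returns undefined as soon as any step along the way is missing.
function getPath(obj, path) {
    return path.split('.').reduce(function (node, key) {
        return (node !== null && node !== undefined) ? node[key] : undefined;
    }, obj);
}

// Build one content item, or an empty string when the value is absent.
function item(label, value) {
    if (value === undefined || value === null || value === '') return '';
    return '<div class="item"><b>' + label + '</b><br/>' + value + '</div>';
}

var data = { people: { population: { total: 1000 } } };   // made-up record
var people = item('Population', getPath(data, 'people.population.total')) +
             item('Nationality', getPath(data, 'people.nationality.adjective'));
console.log(people);
// <div class="item"><b>Population</b><br/>1000</div>
```

One behavioral difference to be aware of: the fragment above skips any falsy value, including 0, while this sketch only skips missing or empty values.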

If no accordion sections were already open when the content loads, the Introduction section expands. The user can then review the data. In addition to the JSON country record, there are image files for the country flag and a map; these are retrieved from S3 storage.


Country View in Web Site

Search

Searching the data is tough: we don't have a full-text search capability at present. Fortunately, a text search would likely target either the country name or one of the large text briefs in the data, such as introduction.background or government.overview. Accordingly, our first implementation of search will use a Lambda Function that looks for a match in select fields of the country record.

Below is our search Lambda Function, which accepts a term and returns an array of matching country names and keys.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                        "Access-Control-Allow-Origin" : "*", // Required for CORS support to work
                        "Access-Control-Allow-Credentials" : true // Required for cookies, authorization headers with HTTPS 
                    };

    var testing = false;

    var term = null;

    if (testing) {
        term = 'island';
    }
    else {
        try {
            term = event.queryStringParameters.term;
        }
        catch(e) { }
    }
    
    if (term===null || term===undefined || term==='') {
        callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: term' });
        return;
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#key': 'key',
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':term': term,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#source = :source',
      FilterExpression: 'contains(#key, :term) or contains(introduction.background, :term) or contains(geography.climate, :term) or contains(geography.terrain, :term) or contains(people.demographic_profile, :term) or contains(economy.overview, :term) or contains(geography.map_reference, :term) or contains(government.government_type, :term) or contains(transnational_issues.disputes[0], :term)',
      ProjectionExpression: '#name, #key'
    };

    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:')
        console.log(err.toString());
        callback(null, { statusCode: 500, headers: corsHeaders, body: `Error: ${err}` }); // null error so the 500 response is delivered
    } else { 
        callback(null, {
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items)
                });
    }
  });
};
search Lambda Function

Our function will be HTTP-triggered via AWS API Gateway and we'll write it in Node.js. The function code initializes a DynamoDB document client (lines 3-5), extracts the expected term query string parameter (lines 14-29), sets up query parameters (lines 31-45), and executes the query (lines 47-59).

The query parameter of interest is FilterExpression, which is a long series of contains(field, :term) sequences connected with OR operators. The DynamoDB contains operator is case-sensitive, putting a burden on the user. This "poor man's search" will be adequate for the time being, but we'll want to come back and improve on this at a later time. Ideally, a user should be able to search the entire country record with a full-text, case-insensitive search.
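Since the expression is so repetitive, it could be generated from a list of fields. Here's a sketch of that idea; buildFilterExpression and SEARCH_FIELDS are hypothetical names, and the function as deployed simply uses the hard-coded string shown above.

```javascript
// The fields searched by the FilterExpression in the search function
var SEARCH_FIELDS = [
    '#key',
    'introduction.background',
    'geography.climate',
    'geography.terrain',
    'people.demographic_profile',
    'economy.overview',
    'geography.map_reference',
    'government.government_type',
    'transnational_issues.disputes[0]'
];

// Join one contains(field, :term) clause per field with "or".
function buildFilterExpression(fields) {
    return fields.map(function (field) {
        return 'contains(' + field + ', :term)';
    }).join(' or ');
}

console.log(buildFilterExpression(SEARCH_FIELDS));
```

Adding another searchable field then becomes a one-line change to the list.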

Here's an example of what search returns when given the term "island":
[
{
    key: "american_samoa",
    name: "American Samoa"
},
{
    key: "anguilla",
    name: "Anguilla"
},
{
    key: "antarctica",
    name: "Antarctica"
},
{
    key: "antigua_and_barbuda",
    name: "Antigua And Barbuda"
},
{
    key: "aruba",
    name: "Aruba"
},
{
    key: "ashmore_and_cartier_islands",
    name: "Ashmore And Cartier Islands"
},
{
    key: "bahamas_the",
    name: "Bahamas, The"
},
{
    key: "barbados",
    name: "Barbados"
},
{
    key: "bermuda",
    name: "Bermuda"
},
{
    key: "bouvet_island",
    name: "Bouvet Island"
},
{
    key: "british_indian_ocean_territory",
    name: "British Indian Ocean Territory"
},
{
    key: "british_virgin_islands",
    name: "British Virgin Islands"
}
]
search Function Results
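One thing to keep in mind with a filtered query: a single DynamoDB Query call reads at most 1 MB of data before the filter is applied, and signals that more remains by returning a LastEvaluatedKey. Whether the search query reaches that limit depends on the size of the stored records, which isn't stated here. The sketch below follows the pages until they run out; queryAll is a hypothetical helper, written against any client that exposes a DocumentClient-style query(params, callback).

```javascript
// Run a query and keep requesting pages until LastEvaluatedKey is no longer returned.
function queryAll(docClient, params, callback, items) {
    items = items || [];
    docClient.query(params, function (err, data) {
        if (err) { callback(err); return; }
        items = items.concat(data.Items || []);
        if (data.LastEvaluatedKey) {
            // Resume where the previous page stopped, without mutating the caller's params
            var next = Object.assign({}, params, { ExclusiveStartKey: data.LastEvaluatedKey });
            queryAll(docClient, next, callback, items);
        } else {
            callback(null, items);
        }
    });
}

// Usage inside the handler would look something like:
// queryAll(docClient, params, function (err, items) { /* build the response from items */ });
```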

Our web site's JavaScript code for search is below. An Ajax call is made to the Lambda Function, and the results are iterated to come up with a search results list of country names and flags.
// Perform a search.

function search() {

    var term = $('#search-text').val();
    if (!term) return;

    $("body").css("cursor", "progress");
    $('#loading').css('visibility', 'visible');

    $('#country').val('');
    $('#chart-select').val('');

    $('h2').removeClass('optional');

    $('#country-view').css('visibility', 'collapse');
    inCountryView = false;

    var url = cloud.searchUrl(term);

    $.ajax({
        type: 'GET',
        url: url,
        accepts: "json",
    }).done(function (response) {

        var results = cloud.resultToJson(response);

        var html = '<table id="results-table" style="color: white; font-size: 20px">';
        var count = 0;
        var countryKey = null;
        if (results) {
            for (var i = 0; i < results.length; i++) {
                countryKey = CountryKey(results[i].name);
                html += '<tr style="cursor: pointer; height: 24px; border-bottom: solid 1px white" onclick="selectCountry(' + "'" + results[i].name + "'" + ');">';
                if (haveFlag(results[i].name)) {
                    var flagImageUrl = cloud.flagImageUrl(countryKey);
                    html += '<td style="text-align: right"><img class="content-image-thumbnail" src="' + flagImageUrl + '"></td>';
                }
                else {
                    html += '<td> </td>';
                }
                html += '<td>  </td><td style="vertical-align: middle">' + results[i].name + '</td></tr>';
                count++;
            }
        }
        if (count == 0) {
            html += '<tr><td>No matches</td></tr>';
        }
        html += '</table>';

        $('#results-list').html(html);
        $('#country-flag').css('visibility', 'collapse');
        $('#chart-view').css('visibility', 'collapse');
        $('#results-view').css('visibility', 'visible');

        $('#loading').css('visibility', 'collapse');
        $("body").css("cursor", "default");
    });
}
JavaScript search code

Putting it all together, a search on the web site looks like this. With the search results displayed, the user may click on any country in the list; if they do, a Country View takes place just as if they had selected the country from the top right drop-down.

Search in Web Site

Charts

Lastly, our web site offers chart views. The user can select a chart from the list and see a chart showing comparative country data. The site currently provides these charts:
  • Area - Largest
  • Area - Smallest
  • Exports - Highest
  • Exports - Lowest
  • Imports - Highest
  • Imports - Lowest
  • Inflation - Highest
  • Inflation - Lowest
  • Internet Users - Most
  • Population - Highest
  • Population - Lowest
Each chart provides a list of 10 countries rendered as a column chart using Google Charts.

Although the number of documents we have in DynamoDB is small (260), we nevertheless want to follow good practices that would also work well with data at large scale. In the country JSON, there are properties deep in the document that list a country's global rank for area, exports, Internet users, etc. All we need to do, then, is sort by the particular global rank we're interested in and take the top 10 results.

Adding a Secondary Index to DynamoDB

In DynamoDB, you can't specify an order in your query other than to use the sort order of an index (ascending or descending). So, if we wanted to list countries by order of area global rank, we'd want to order by the area global rank and plot the area in square km in our chart. Here's where they are in the country JSON:


What we need to do, conceptually, is create a Secondary Index with a sort key of geography.area.global_rank that also includes the country name (name) and actual area (geography.area.total.value). Unfortunately, you can only include top-level properties in a Secondary Index so we can't do it exactly that way...

What we can do is promote these properties—and their brethren for the other charts—to be top-most properties. We do that by modifying the load-country Lambda Function we created in Part 1 to surface these new top-level properties. Here's the code we inserted:
if (data != null) 
{
// add 3 fields to the document

data.key = key; // countryKey(data.name);
data.timestamp = 'Monday, February 11, 2019 4:09:28 PM';
data.source = 'Factbook';

// promote fields to top that we need to index on

if (data.geography && data.geography.area && data.geography.area.global_rank) {
    data.global_rank_area = data.geography.area.global_rank; }
if (data.people && data.people.population &&  data.people.population.global_rank) {
    data.global_rank_population = data.people.population.global_rank; }
if (data.economy && data.economy.imports && data.economy.imports.total_value && data.economy.imports.total_value.global_rank)
    data.global_rank_imports = data.economy.imports.total_value.global_rank;
if (data.economy && data.economy.exports && data.economy.exports.total_value && data.economy.exports.total_value.global_rank)
    data.global_rank_exports = data.economy.exports.total_value.global_rank;
if (data.economy && data.economy.inflation_rate && data.economy.inflation_rate.global_rank)
    data.global_rank_inflation_rate = data.economy.inflation_rate.global_rank;
if (data.communications && data.communications.internet && data.communications.internet.users && data.communications.internet.users.global_rank)
    data.global_rank_internet_users = data.communications.internet.users.global_rank;
if (data.geography && data.geography.area && data.geography.area.total && data.geography.area.total.value)
    data.global_value_area = data.geography.area.total.value;
if (data.people && data.people.population &&  data.people.population.total)
    data.global_value_population = data.people.population.total;
if (data.economy && data.economy.imports && data.economy.imports.total_value && data.economy.imports.total_value.annual_values && data.economy.imports.total_value.annual_values[0] && data.economy.imports.total_value.annual_values[0].value)
    data.global_value_imports = data.economy.imports.total_value.annual_values[0].value;
if (data.economy && data.economy.exports && data.economy.exports.total_value && data.economy.exports.total_value.annual_values && data.economy.exports.total_value.annual_values[0] && data.economy.exports.total_value.annual_values[0].value)
    data.global_value_exports = data.economy.exports.total_value.annual_values[0].value;
if (data.economy && data.economy.inflation_rate && data.economy.inflation_rate.annual_values && data.economy.inflation_rate.annual_values[0] && data.economy.inflation_rate.annual_values[0].value)
    data.global_value_inflation_rate = data.economy.inflation_rate.annual_values[0].value;
if (data.communications && data.communications.internet && data.communications.internet.users && data.communications.internet.users.total)
    data.global_value_internet_users = data.communications.internet.users.total

// insert country record

var params = {
    TableName: 'factbook',
    Item: data
    };

console.log("Adding new item...");
docClient.put(params, function(err, data2) {
Code Added to load-country Function
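The promotion step follows the same pattern for every property, so it could also be written as a table of target names and source paths. This is a sketch under that assumption; lookup, promote, and the PROMOTIONS table are hypothetical and show only a few of the mappings, and the sample record is made up.

```javascript
// Read a nested value by dotted path; array positions are written as plain
// numbers (annual_values.0.value). Returns undefined if any step is missing.
function lookup(obj, path) {
    var parts = path.split('.');
    var node = obj;
    for (var i = 0; i < parts.length; i++) {
        if (node === null || node === undefined) return undefined;
        node = node[parts[i]];
    }
    return node;
}

// Target top-level property -> where the value lives in the country JSON
var PROMOTIONS = {
    global_rank_area: 'geography.area.global_rank',
    global_value_area: 'geography.area.total.value',
    global_rank_population: 'people.population.global_rank',
    global_value_imports: 'economy.imports.total_value.annual_values.0.value'
};

// Copy each nested value that exists to its top-level property.
function promote(data, promotions) {
    Object.keys(promotions).forEach(function (target) {
        var value = lookup(data, promotions[target]);
        if (value !== undefined && value !== null) data[target] = value;
    });
    return data;
}

var sample = { geography: { area: { global_rank: 1, total: { value: 100 } } } };   // made-up record
console.log(promote(sample, PROMOTIONS).global_rank_area);
```

As with the hand-written checks, properties that are absent from a record are simply not promoted.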

With this update, and after re-loading all the country records into DynamoDB, we now have the top-level properties we need:

Updated Country JSON with Top-level Global Rank/Value Properties

Now we are able to create secondary indices on our DynamoDB database. Here's how we create the index rank-area-index. Let's take note of a few things. The partition key is source (which is always "Factbook"), same as our primary index. The sort key is global_rank_area, a number. This will make it easy to get the top N or bottom N countries by global area rank. The attributes for the index include name, global_rank_area, and global_value_area; we need the value in order to plot anything meaningful in our chart.

Creating Secondary Index
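I created the index in the console, but the same thing can be expressed as parameters to DynamoDB's UpdateTable call. The sketch below only builds the parameter object. It assumes the index is a global secondary index (an index added to an existing table has to be one), and the capacity numbers are placeholders; buildCreateIndexParams is a hypothetical helper.

```javascript
// Build UpdateTable parameters that add a global secondary index
// with partition key "source" and a numeric sort key.
function buildCreateIndexParams(tableName, indexName, sortKey, includedAttributes) {
    return {
        TableName: tableName,
        AttributeDefinitions: [
            { AttributeName: 'source', AttributeType: 'S' },
            { AttributeName: sortKey, AttributeType: 'N' }
        ],
        GlobalSecondaryIndexUpdates: [{
            Create: {
                IndexName: indexName,
                KeySchema: [
                    { AttributeName: 'source', KeyType: 'HASH' },   // partition key
                    { AttributeName: sortKey, KeyType: 'RANGE' }    // sort key
                ],
                Projection: {
                    ProjectionType: 'INCLUDE',
                    NonKeyAttributes: includedAttributes
                },
                // Placeholder capacity; omit for an on-demand table
                ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
            }
        }]
    };
}

// name is part of the table's primary key, so it is projected automatically;
// only the value attribute needs to be listed.
var indexParams = buildCreateIndexParams('factbook', 'rank-area-index', 'global_rank_area', ['global_value_area']);
console.log(JSON.stringify(indexParams, null, 2));
// To apply it: new AWS.DynamoDB().updateTable(indexParams, callback);
```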

That was a bit of work; but with our index created, we can now create a Lambda Function to query by area rank. Here's report-area-highest, which returns the names, rank, and area for the top 10 countries with largest area. Notice that the query parameters specify an IndexName of rank-area-index and a ScanIndexForward (sort order) value of true. We also specify a Limit of 10, which will give us just 10 records back. For the sister report-area-lowest function, ScanIndexForward will be set to false.
// List top 10 countries with largest area

const AWS = require('aws-sdk');
AWS.config.update({region: 'us-east-1'});
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 

var corsHeaders = { "Access-Control-Allow-Origin" : "*", "Access-Control-Allow-Credentials" : true };

exports.handler = function(event, context, callback) {

    var params = {
      TableName: 'factbook',
      IndexName: 'rank-area-index',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':source': 'Factbook',
      },
      KeyConditionExpression: '#source = :source',
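      // ProjectionType is a setting of the index itself, not a query parameter;
      // ProjectionExpression selects which attributes the query returns
      ProjectionExpression: "#name, global_rank_area, global_value_area",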
      Limit: 10,
      ScanIndexForward: true
    };
    
    docClient.query(params, function(err, data) {

        if(err) { 
            console.log('03 err:')
            console.log(err.toString());
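            // Pass null as the error so the 500 response (with CORS headers) reaches the caller;
            // a template literal needs backticks for ${err} to be substituted
            callback(null, { statusCode: 500, headers: corsHeaders, body: `Error: ${err}` });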
        } else { 
            callback(null, {
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items)
                });
        }
      });
};
report-area-highest Lambda Function
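Since the report functions differ only in the index name, the attribute names, and the sort direction, the query parameters could come from one shared builder. This is a sketch of that idea; buildReportParams is a hypothetical helper, and each report is actually deployed as its own function with hard-coded parameters.

```javascript
// Build Query parameters for a "top 10" report on one of the rank indexes.
// topRanked = true returns rank 1, 2, 3...; false starts from the last-ranked country.
function buildReportParams(indexName, rankAttribute, valueAttribute, topRanked) {
    return {
        TableName: 'factbook',
        IndexName: indexName,
        ExpressionAttributeNames: { '#name': 'name', '#source': 'source' },
        ExpressionAttributeValues: { ':source': 'Factbook' },
        KeyConditionExpression: '#source = :source',
        ProjectionExpression: '#name, ' + rankAttribute + ', ' + valueAttribute,
        Limit: 10,
        ScanIndexForward: topRanked   // ascending sort-key order when true
    };
}

var largest = buildReportParams('rank-area-index', 'global_rank_area', 'global_value_area', true);
var smallest = buildReportParams('rank-area-index', 'global_rank_area', 'global_value_area', false);
console.log(largest.ScanIndexForward, smallest.ScanIndexForward);
// true false
```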

Here's the output when report-area-highest is run. It's just what we want: the 10 countries with the largest area, including the name, rank, and area value for each country.
[
{
    global_rank_area: 1,
    name: "Russia",
    global_value_area: 17098242
},
{
    global_rank_area: 2,
    name: "Antarctica",
    global_value_area: 14000000
},
{
    global_rank_area: 3,
    name: "Canada",
    global_value_area: 9984670
},
{
    global_rank_area: 4,
    name: "United States",
    global_value_area: 9833517
},
{
    global_rank_area: 5,
    name: "China",
    global_value_area: 9596960
},
{
    global_rank_area: 6,
    name: "Brazil",
    global_value_area: 8515770
},
{
    global_rank_area: 7,
    name: "Australia",
    global_value_area: 7741220
},
{
    global_rank_area: 8,
    name: "India",
    global_value_area: 3287263
},
{
    global_rank_area: 9,
    name: "Argentina",
    global_value_area: 2780400
},
{
    global_rank_area: 10,
    name: "Kazakhstan",
    global_value_area: 2724900
}
]
report-area-highest output

When the JavaScript code in the web site plots this with Google Charts, here's what the end result is:

Chart in Web Site
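The site's charting code isn't reproduced in this post, but the data preparation amounts to turning the function's output into a header row plus one row per country. Here's a sketch, with toChartRows as a hypothetical helper and the axis label as an assumption:

```javascript
// Convert report items into the row format that Google Charts'
// arrayToDataTable expects: a header row followed by [label, value] rows.
function toChartRows(items, valueAttribute, valueLabel) {
    var rows = [['Country', valueLabel]];
    items.forEach(function (item) {
        rows.push([item.name, item[valueAttribute]]);
    });
    return rows;
}

var sampleItems = [
    { global_rank_area: 1, name: 'Russia', global_value_area: 17098242 },
    { global_rank_area: 2, name: 'Antarctica', global_value_area: 14000000 }
];
console.log(toChartRows(sampleItems, 'global_value_area', 'Area (sq km)'));

// In the browser, once the Google Charts loader has finished:
// var table = google.visualization.arrayToDataTable(rows);
// new google.visualization.ColumnChart(element).draw(table, options);
```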

Each of the other charts was implemented exactly the same way: promote the appropriate properties to the top of the JSON, add a Secondary Index to DynamoDB, and write a simple Lambda Function to query using the index. The first one was a bit of work; but once the correct pattern was identified, all the others followed in rapid succession.

In Conclusion

In this series I showed how to retrieve world country data from the CIA World Factbook and store it in AWS, along with an API and web site for accessing the data. DynamoDB was our primary repository along with S3 storage, and it has performed well. Lambda Functions were integral to both the back end (loading country records) and the front end (country view, search, charts) and were written in Node.js (JavaScript).

The resulting web site can be accessed at http://world-factbook.aws.davidpallmann.com. To create it, a web site from a prior project on another cloud platform was refactored so that it would work with AWS or Azure with mostly common code.

To do charting, we needed to create secondary indices which in turn required us to promote some of our JSON values to top-level properties. Once that was done, it was a breeze to create the necessary Lambda Functions. Combined with Google Charts, we quickly had charts up and running.

What we've covered in Parts 1 and 2 took two days of development and two days of blog-writing.
Cloud-native services like Lambda and DynamoDB make for rapid development.

In Part 3, we'll be creating an Alexa Skill so that our data can be accessed by voice.

Web Site Source Code on GitHub

Next: CIA World Factbook on AWS, Part 3: Alexa Voice Interface using Lambda and DynamoDB