Tuesday, February 26, 2019

CIA World Factbook on AWS, Part 3: Alexa Voice Interface using Lambda and DynamoDB

In this 3-part series, I'm showing how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services. Previously in Part 1 we created the back-end to collect data and store it in DynamoDB and S3 storage, using a Lambda Function to insert document records. In Part 2 we created a Lambda API for data access and a web site for browsing, searching, and viewing charts. Today in Part 3 we're creating an Alexa Skill so that the world country data can be accessed by voice.


What We're Building

Unless you've been living under a rock, you know that Amazon's cloud-based voice service is named Alexa and can be accessed from... well, anywhere: devices like the Amazon Echo and Dot, your TV via Amazon FireStick, a variety of cars, and many other places. Even if you don't own an Alexa-enabled device, you can get to it right now on the web at https://alexa.amazon.com or from your phone using the Alexa app.

You can access the country data skill with "Alexa, Open World Country Data".

We are going to create an Alexa Skill (voice app) named World Country Data, backed by a Lambda Function that responds to spoken inquiries. The Lambda Function will query the DynamoDB database to retrieve country data.

World Country Data Dialog

The Alexa skill will be able to respond to inquiries like these:

  • What is the population of country?
  • Where is country?
  • How large is country?
  • What language is spoken in country?
  • What are the major cities in country?
  • What countries border country?
  • What are the natural resources of country?
  • What are the agricultural products of country?
  • What are the industries of country?
  • Give me an overview of country
  • Brief me on the economy of country
  • How many mobile phones are in country?
In addition, a user can inquire about how countries compare and rank:
  • What country is the largest?
  • Which countries have the highest exports?
  • What countries have the most Internet users?
  • What country has the lowest population?
We also want the ability to play an audio clip:
  • Play the national anthem of country
Here's what happens architecturally: spoken inquiries from users to Alexa undergo speech recognition, machine learning, and natural language processing. Alexa recognizes an intent and passes it to our Lambda Function. The function looks up the required data in the DynamoDB table, which contains a JSON record for each country. The function returns a speech response, which in some cases may reference an audio clip. For Alexa devices with a display, the response also includes a card, which displays a map or flag image (depending on device size). Audio clips and images are stored in S3.

Architecture of World Country Data Skill

Our project will consist of an Alexa Skill (configuration) and a Lambda Function to go with it (code). We'll start with the Alexa skill. Our starting point for the project was the Color Picker sample in Node.js, which provides basic skeleton code for a skill.

Alexa Skill

To avoid potential confusion, I should mention that as I started work on this leg of the project, it came to my attention that there is already an existing World Factbook skill (not from me). Don't confuse that with my skill which is named World Country Data. The two skills are pretty different in scope, however.

Alexa Skills are created in the Amazon developer portal, developer.amazon.com, not the AWS console. You'll need to register as a developer. Our skill is named World Country Data.

The first thing to set up in the skill project is the intents. An intent is something a user is trying to communicate, such as What's the population of {country}, and it contains a list of utterances. Intents are easy and not easy at the same time: it's simple enough to enter some utterances for the intent and try it out on Alexa. What's not so easy is thinking through all the possible ways someone might phrase an inquiry.

Scalar Value Inquiries

One category of intents we'll have is inquiries about scalar values, such as What's the population of {country}?, that return a single value (in this case, a number). Here's our definition for the population intent:


The embedded value {country} is called a slot, which is a type of placeholder. We want to be able to ask these questions about any country. Further down in the intent, we see that the slot country is of type country, meaning a custom list of country names we will create.


There have been a number of places in this project where it's been necessary to list 260 individual countries. Defining the possible slot values for country is one of them. My fingers are getting a good workout! Technically speaking, the intents would work without listing every possible country here; however, Alexa is much better at understanding the request and selecting the right intent when the expected slot values are defined.

Defining country Slot Values
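
Behind the console screens, an intent and its slot type come down to a small piece of JSON in the skill's interaction model. Here's a rough sketch of what the population intent might look like, written as a JavaScript object. The sample utterances, the three country values, and the synonyms are assumptions made for illustration; the real skill lists every country and its own set of utterances.

```javascript
// Illustrative fragment of an Alexa interaction model for the population intent.
// The utterances, slot values, and synonyms below are assumptions for this example.
const interactionModel = {
    interactionModel: {
        languageModel: {
            invocationName: 'world country data',
            intents: [
                {
                    name: 'population',
                    // The {country} slot is filled from the custom "country" slot type.
                    slots: [{ name: 'country', type: 'country' }],
                    samples: [
                        'what is the population of {country}',
                        'how many people live in {country}',
                        'tell me the population of {country}'
                    ]
                }
            ],
            types: [
                {
                    name: 'country',
                    values: [
                        { name: { value: 'France' } },
                        { name: { value: 'Greece' } },
                        { name: { value: 'United States', synonyms: ['America', 'USA'] } }
                    ]
                }
            ]
        }
    }
};

// Print the fragment as JSON, the form in which an interaction model is stored.
console.log(JSON.stringify(interactionModel, null, 2));
```

Each additional intent (area, climate, and so on) would be one more entry in the intents array, reusing the same country slot type.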

The response to this intent will be provided by our Lambda Function, which we'll get to later in this post. We create similar intents for area, climate, terrain, literacy rate, and the number of phones. This hardly scratches the surface of the available data; we'll come back and do more someday.

Text Narratives

Some fields in the country JSON contain paragraph text. For example, there's introduction.background, a background preamble on the country; and economy.overview, a brief on a country's economy. Technically, retrieving these items is no different than the scalar values described in the prior section; the experience to the user is quite different, however, as Alexa will read on and on. We handle these kinds of inquiries in the same way, with intents.

Alexa Dialog: Country Overview

Lists

Some of the World Factbook data is in the form of lists (JSON object arrays), such as a country's agricultural products or major urban areas.

Alexa Dialog: Major Urban Areas

Our intents for lists are not much different than our intents for scalar values: the only slot needed is the country name. However, our Lambda Function will need different code to retrieve list data.

Intent major_cities: Utterances

Top Countries

In addition to asking facts about a particular country, a user might want to know which countries are ranked highest or lowest in various categories like area, population, exports, or inflation. In Part 2 we created Lambda functions and database queries to get this information and render column charts. Today we can use the same queries (such as Top 10 exports) and give a response like this: The countries with the largest area are Russia, Antarctica, and Canada.

Alexa Dialog: Leading Countries

As before, our intents need to consider a variety of utterances that convey the same meaning:

Intent area_highest: Utterances

We create intents for largest/smallest area, highest/lowest population, highest/lowest inflation rate, and most Internet users. We'll look at the backing Lambda functions later in this post. Once again, there's so much additional data we could mine. 

Playing National Anthems

The CIA World Factbook data includes audio clips for most countries' national anthems. Being able to play this audio in the skill is one of my favorite features.

Supporting this in our skill took some work, because Alexa imposes some limitations on audio clips:
  • Audio clips must be hosted at an HTTPS endpoint.
  • The MP3 must be an MPEG version 2 file.
  • The audio file cannot be longer than 240 seconds.
  • The bit rate must be 48 kbps.
  • The sample rate must be 22050Hz, 24000Hz, or 16000Hz.
These are not great specs for music audio, but we have no choice if we are going to be able to play our clips in our Alexa skill. 

After obtaining the audio clips for each country, it was necessary to transform them to meet the above specifications. Following the AWS guidance on audio conversion, the following steps were performed for each national anthem MP3 file, using a free audio tool named Audacity.
  1. Open the national anthem MP3.
  2. Change the Audacity project's sampling rate to 16000.
  3. Export the audio to MP3, setting a bit rate of 48kbps.
Converting Audio files to 48 kbps with Audacity
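
The same settings can also be applied from a script rather than by hand. The sketch below is an alternative to the Audacity steps above, not what was used for this project. It assumes the ffmpeg command-line tool is installed and on the PATH; the function names and folder names are hypothetical.

```javascript
const { spawnSync } = require('child_process');
const path = require('path');

// Build the ffmpeg arguments for one file: MP3 via libmp3lame, 48 kbps, 16000 Hz.
function ffmpegArgs(inputFile, outputFile) {
    return [
        '-y',                      // overwrite the output file if it already exists
        '-i', inputFile,           // source MP3
        '-ac', '2',                // two audio channels
        '-codec:a', 'libmp3lame',  // encode as MP3
        '-b:a', '48k',             // bit rate of 48 kbps
        '-ar', '16000',            // sample rate of 16000 Hz
        outputFile
    ];
}

// Convert one anthem into outputDir (a hypothetical folder for converted clips).
// Returns true when ffmpeg exits successfully.
function convertAnthem(inputFile, outputDir) {
    const outputFile = path.join(outputDir, path.basename(inputFile));
    const result = spawnSync('ffmpeg', ffmpegArgs(inputFile, outputFile), { stdio: 'inherit' });
    return result.status === 0;
}
```

A call such as convertAnthem('anthems/Austria.mp3', 'converted') would then handle one file, and a loop over a folder listing would handle the rest.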

After converting all the country national anthem files, they were uploaded to S3 which is where the Alexa Skill will play them from. We'll see how when we look at the backing Lambda Function.

Here's a sample national anthem clip for Austria.

Lambda Function

Our Alexa Skill depends on a Lambda Function written in Node.js. Starting with the Color Picker sample gives us a basic voice app, which we will now turn into World Country Data. Although this is just one Lambda function, it calls into several sub-functions we'll need to review.
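
Before looking at the sub-functions, it may help to picture how a request reaches them. The skill's actual entry point isn't listed in this post, so the following is only a sketch of the general shape: a request arrives with a type, gets routed to a matching function, and the result is wrapped in a response envelope. The route function and the handlers object are stand-ins invented for this example.

```javascript
// Wrap a speechlet response in the envelope Alexa expects.
function buildResponse(sessionAttributes, speechletResponse) {
    return { version: '1.0', sessionAttributes, response: speechletResponse };
}

// Route one incoming request by type. The handlers object is a stand-in for
// the skill's own onLaunch / onIntent / onSessionEnded functions.
function route(event, handlers, callback) {
    // Handlers report their result through done(), which adds the envelope.
    const done = (sessionAttributes, speechletResponse) =>
        callback(null, buildResponse(sessionAttributes, speechletResponse));

    switch (event.request.type) {
        case 'LaunchRequest':
            handlers.onLaunch(event.request, event.session, done);
            break;
        case 'IntentRequest':
            handlers.onIntent(event.request, event.session, done);
            break;
        case 'SessionEndedRequest':
            handlers.onSessionEnded(event.request, event.session);
            callback();
            break;
        default:
            callback(new Error('Unexpected request type: ' + event.request.type));
    }
}
```

In a Lambda Function, something like route would be called from exports.handler with the incoming event and the Lambda callback.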

onIntent

The onIntent function responds to an intent by calling an appropriate handler function based on the intent name:

  • Scalar value requests and text briefing requests are handled by the lookup function.
  • List value requests are handled by the lookupList function.
  • Top countries in a category requests are handled by the lookupTop function.
/**
 * Called when the user specifies an intent for this skill.
 */
function onIntent(intentRequest, session, callback) {
    console.log(`onIntent requestId=${intentRequest.requestId}, sessionId=${session.sessionId}`);

    const intent = intentRequest.intent;
    const intentName = intentRequest.intent.name;

    if (intentName === 'about')                 { lookup('introduction.background', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'area')             { lookup('geography.area.total.value', '{country} is {value} square kilometers in size.{end}', intent, session, callback); }  
    else if (intentName === 'climate')          { lookup('geography.climate', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'count_mobile_phones') { lookup('communications.telephones.mobile_cellular.total_subscriptions', 'There are {value} mobile phones in {country}.{end}', intent, session, callback); } 
    else if (intentName === 'count_land_phones') { lookup('communications.telephones.fixed_lines.total_subscriptions', 'There are {value} land lines in {country}.{end}', intent, session, callback); } 
    else if (intentName === 'economy')          { lookup('economy.overview', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'terrain')          { lookup('geography.terrain', '{value}{end}', intent, session, callback); } 
    else if (intentName === 'where')            { lookup('geography.location', '{value}{end}', intent, session, callback); }  
    else if (intentName === 'population')       { lookup('people.population.total', 'The population of {country} is {value}.{end}', intent, session, callback); } 
    else if (intentName === 'urban_population') { lookup('people.urbanization.urban_population.value', 'The urban population of {country} is {value} percent.{end}', intent, session, callback); } 
    else if (intentName === 'literacy_rate')    { lookup('people.literacy.total_population.value', 'The literacy rate of {country} is {value} percent.{end}', intent, session, callback); } 

    else if (intentName === 'play_national_anthem') { lookup(null, '{audio}{end}', intent, session, callback); } 
    
    else if (intentName === 'agricultual_products') { lookupList('economy.agriculture_products.products', null, '{country} has these agricultural products: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'bordered_by')      { lookupList('geography.land_boundaries.border_countries', 'country', '{country} is bordered by: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'industries')       { lookupList('economy.industries.industries', null, '{country} has these industries: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'languages')        { lookupList('people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'major_cities')     { lookupList('people.major_urban_areas.places', 'place', 'In {country}, the major urban areas are: {list}.{end}', intent, session, callback); } 
    else if (intentName === 'resources')        { lookupList('geography.natural_resources.resources', null, '{country} has these natural resources: {list}.{end}', intent, session, callback); } 

    else if (intentName === 'area_highest')     { lookupTop('report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'area_lowest')      { lookupTop('report-area-lowest', 'The countries with the smallest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'exports_highest')  { lookupTop('report-exports-highest', 'The countries with the highest exports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'exports_lowest')   { lookupTop('report-exports-lowest', 'The countries with the lowest exports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'imports_highest')  { lookupTop('report-imports-highest', 'The countries with the highest imports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'imports_lowest')   { lookupTop('report-imports-lowest', 'The countries with the lowest imports are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'internet_users_most') { lookupTop('report-internet-users-highest', 'The countries with the most Internet users are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 

    else if (intentName === 'population_highest') { lookupTop('report-population-highest', 'The countries with the highest population are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 
    else if (intentName === 'population_lowest') { lookupTop('report-population-lowest', 'The countries with the lowest population are {country1}, {country2}, and {country3}.{end}', intent, session, callback); } 

    else if (intentName === 'AMAZON.HelpIntent') { help(intent, session, callback); } 
    else if (intentName === 'AMAZON.StopIntent' || intentName === 'AMAZON.CancelIntent') { handleSessionEndRequest(callback); }
    else if (intentName === 'AMAZON.FallbackIntent') { huh(intent, session, callback); }

    else { throw new Error('Invalid intent: ' + intentName); }
}
onIntent function
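
As the number of intents grows, the if/else chain gets long. One possible alternative, shown here only as a sketch, is to describe each intent as data and dispatch from a table. INTENT_TABLE and dispatch are hypothetical names, and only three of the intents are included.

```javascript
// A few of the intents from onIntent, expressed as data rather than if/else branches.
const INTENT_TABLE = {
    population: {
        kind: 'lookup',
        args: ['people.population.total', 'The population of {country} is {value}.{end}']
    },
    languages: {
        kind: 'lookupList',
        args: ['people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}']
    },
    area_highest: {
        kind: 'lookupTop',
        args: ['report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}']
    }
};

// Find the table entry for an intent and call the matching handler.
// handlers is an object such as { lookup, lookupList, lookupTop }.
function dispatch(intent, session, callback, handlers) {
    const known = Object.prototype.hasOwnProperty.call(INTENT_TABLE, intent.name);
    if (!known) { throw new Error('Invalid intent: ' + intent.name); }
    const entry = INTENT_TABLE[intent.name];
    handlers[entry.kind](...entry.args, intent, session, callback);
}
```

Adding an intent then means adding one table entry; the built-in AMAZON intents would still need their own handling.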

Now let's take a look at our three primary handlers: lookup, lookupList, and lookupTop.

lookup


lookup is called to get a scalar value, such as the people.population.total value in the JSON below.

JSON snippet - Total Population for Greece

We can call the lookup function with the following path to retrieve the population total:

lookup('people.population.total', 'The population of {country} is {value}.{end}', intent, session, callback);

The lookup function is passed a JSON document path; a response template; and intent, session, and callback variables. The intent can be inspected, the session can be used to store or retrieve session state, and the callback is invoked to return a response. Here's the code to lookup:
// --------------- lookup - look up a value - e.g. What is the {property} of {country}? | What is the population of France?
// path: a dotted path to the data, as in people.population.total

function lookup(path, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';
    FlagImageUrl = null;
    MapImageUrl = null;
    let cardTitle = 'World Country Data';
    
    let country = normalizeCountryName(intent.slots.country.value);
    
    if (!inCountryList(country)) { // also sets FlagImageUrl & MapImageUrl if country name recognized
        console.log('lookup: country not recognized: ' + country);
        cardTitle = 'World Country Data - Country Not Recognized';
        speechOutput = 'Sorry, I don\'t recognize that country name. ' + Tag;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        cardTitle = 'World Country Data - ' + country;
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':name': country,
            ':source': 'Factbook'
          },
          KeyConditionExpression: '#name = :name and #source = :source'
        };
        
        docClient.query(params, function(err, data) {
    
            if (err) { 
                console.log('lookup: Query Error - ' + err.toString());
                speechOutput = 'Sorry, an error occurred looking up the data.';
                shouldEndSession = true;
            } else { 
                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        response = replace(response, '{country}', country);
                        response = replace(response, '{value}', eval('data.Items[0].' + path));
                        response = replace(response, '{end}', Tag);
                        response = replace(response, '{audio}', '<audio src="https://s3.amazonaws.com/factbookaudio/' + replace(country, ' ', '+') + '.mp3"/>');
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookup: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up.";
                        shouldEndSession = true;
                    }
                }
            }   // end of the error/success branches
            // Respond on every path, including query errors, so that Alexa always gets a reply.
            // repromptText is what Alexa says if the user does not respond while the session
            // is still open.
            callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));

        });
    }
}
lookup function

The above code calls normalizeCountryName to normalize the country name (line 14). Many countries have alternate or historical names a user might refer to, and we need to arrive at a standard name that will match the JSON record in DynamoDB and the filenames in S3. For example, a user can request data on America, the U.S., or the U.S.A. and the country name will be normalized to United States.
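
The body of normalizeCountryName isn't listed in this post. Here is a minimal sketch of the idea, assuming a simple alias table; the table contents and the rule of stripping periods before matching are assumptions, not the skill's actual code.

```javascript
// Hypothetical alias table: keys are lowercased spoken variants with periods removed.
const COUNTRY_ALIASES = {
    'america': 'United States',
    'us': 'United States',
    'the us': 'United States',
    'usa': 'United States',
    'the usa': 'United States',
    'united states of america': 'United States'
};

// Map a spoken country name to the standard name used in DynamoDB and S3.
// Names without an alias are returned unchanged.
function normalizeCountryName(spoken) {
    if (!spoken) { return ''; }   // the slot may be empty if no country was heard
    const key = spoken.toLowerCase().replace(/\./g, '').replace(/\s+/g, ' ').trim();
    const hasAlias = Object.prototype.hasOwnProperty.call(COUNTRY_ALIASES, key);
    return hasAlias ? COUNTRY_ALIASES[key] : spoken;
}

console.log(normalizeCountryName('the U.S.'));  // United States
console.log(normalizeCountryName('Greece'));    // Greece
```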

Next (lines 16-21) we check whether the country name is in our list of country names. If it isn't, we won't be able to retrieve the requested data and we respond Sorry, I don't recognize that country name. The response speech is routed back by calling buildSpeechletResponse, which we'll review later.

If we do have a recognized country name, it's time to retrieve the requested data. A DynamoDB client is instantiated (lines 24-26), query parameters are set up to retrieve the JSON country record (lines 28-39), and the JSON document is retrieved (lines 41-71). The code uses JavaScript's eval function to get to the data, which isn't a great practice; we'll be looking to rewrite that code in the near future and not use eval.
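
As a sketch of what an eval-free version could look like, the dotted path can be split and walked one property at a time. getByPath is a hypothetical helper, and the sample record below is trimmed to a single field whose shape is assumed from the path.

```javascript
// Walk a dotted path such as 'people.population.total' through an object,
// returning undefined if the path is empty or any step along the way is missing.
function getByPath(obj, path) {
    if (!path) { return undefined; }   // e.g. the anthem intent passes a null path
    return path.split('.').reduce(
        (node, key) => (node === null || node === undefined ? undefined : node[key]),
        obj
    );
}

// Sample record with one field (all other fields omitted).
const greece = { name: 'Greece', people: { population: { total: 10761523 } } };
console.log(getByPath(greece, 'people.population.total')); // 10761523
console.log(getByPath(greece, 'economy.overview'));        // undefined
```

Besides avoiding eval, this returns undefined for a missing field instead of throwing, so the caller can decide what to say when a country has no value for a property.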

We've been passing spoken responses and referring to them as templates; this is something I created, not an Alexa feature. To return the spoken response, the template response text passed in has values replaced:

  • {country} is replaced with the normalized country name;
  • {value} is replaced with the desired value by evaluating the property path that was passed in to the function.
  • {audio} is replaced with markup for the national anthem sound clip. 
  • {end} is replaced with a 1-second pause and "What else would you like to know?" prompt.
The response is passed back to Alexa by calling buildSpeechletResponse (line 69).
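
To make the template idea concrete, here is a small standalone sketch of placeholder substitution. fillTemplate is a hypothetical helper rather than the skill's replace function, and the exact value of Tag is assumed from the description above.

```javascript
// Spoken after every answer: a 1-second pause, then a prompt to keep going.
// The exact string is an assumption based on the description of {end}.
const Tag = '<break time="1s"/>What else would you like to know?';

// Replace every occurrence of each {placeholder} for which a value is supplied.
function fillTemplate(template, values) {
    return Object.keys(values).reduce(
        (text, name) => text.split('{' + name + '}').join(String(values[name])),
        template
    );
}

const speech = fillTemplate('The population of {country} is {value}.{end}', {
    country: 'Greece',
    value: 10761523,
    end: Tag
});
console.log(speech);
// The population of Greece is 10761523.<break time="1s"/>What else would you like to know?
```

Placeholders without a supplied value are left untouched, which is why one template style can serve lookup, lookupList, and lookupTop.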

The response to What is the population of Greece? is The population of Greece is 10 million 761 thousand 523.

lookupList


lookupList is very similar to lookup, but has different code for retrieving the value. Instead of a single property, an array must be iterated through with an inner property extracted to speak. Here's an example of the JSON array for languages in the JSON record for Spain. In this case, the path to the array is people.languages.language but the property to be extracted from each array element is name.


We can call lookupList with the following parameters to extract a list of languages:

lookupList('people.languages.language', 'name', 'In {country}, these languages are spoken: {list}.{end}', intent, session, callback);

Here's the code to lookupList:
// --------------- lookupList - look up a list - e.g. What languages are spoken in {country}? | What ethnic groups live in France?

// path: a dotted path to array data, as in people.languages.language
// property: property of a list item to vocalize - ex: name

function lookupList(path, property, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';
    FlagImageUrl = null;
    MapImageUrl = null;
    let cardTitle = 'World Country Data';
    
    let country = normalizeCountryName(intent.slots.country.value);
    
    if (!inCountryList(country)) { // also sets FlagImageUrl & MapImageUrl if country name recognized
        console.log('lookupList: country not recognized: ' + country);
        cardTitle = 'World Country Data - Country Not Recognized';
        speechOutput = 'Sorry, I don\'t recognize that country name. ' + Tag;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        cardTitle = 'World Country Data - ' + country;
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':name': country,
            ':source': 'Factbook'
          },
          KeyConditionExpression: '#name = :name and #source = :source'
        };
        
        docClient.query(params, function(err, data) {
    
            if (err) { 
                console.log('lookupList: Query Error - ' + err.toString());
                speechOutput = 'Sorry, an error occurred looking up the data.';
                shouldEndSession = true;
            } else { 

                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        var listText = '';
                        var listData =  eval('data.Items[0].' + path);
                        if (listData != '') {
                            for (var i = 0; i < listData.length; i++) {
                                if (i > 0) {
                                    listText += ", ";
                                }
                                if (property===null) {
                                    listText += listData[i];
                                }
                                else {
                                    listText += listData[i][property];
                                }
                            }
                        }
                        
                        response = replace(response, '{country}', country);
                        response = replace(response, '{list}', listText);
                        response = replace(response, '{end}', Tag);
                        response = replace(response, '{audio}', '<audio src="https://s3.amazonaws.com/factbookaudio/' + replace(country, ' ', '+') + '.mp3"/>');
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookupList: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up. ";
                        shouldEndSession = true;
                    }
                }
            }   // end of the error/success branches
            // Respond on every path, including query errors, so that Alexa always gets a reply.
            // repromptText is what Alexa says if the user does not respond while the session
            // is still open.
            callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));

        });
    }
}
lookupList function

lookupList is nearly identical to lookup. It also normalizes and validates the country name, and retrieves the JSON country record from DynamoDB. The big difference is lines 56-70 where the code iterates through the array at path and extracts variable property from each array element to form a list to speak. The response template can specify {list} as a placeholder for the list.

The spoken response to What languages are spoken in Spain? is In Spain, these languages are spoken: Castilian Spanish, Catalan, Galician, Basque, Aranese along with Catalan, speakers.
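
The list-building loop can also be written more compactly with map and join. This is a sketch of the same idea, not the skill's code; listToSpeech is a hypothetical name and the sample data is made up in the shape described above.

```javascript
// Turn an array from the country record into a comma-separated phrase for speech.
// property names the field to read from each element; null means the elements
// are plain strings.
function listToSpeech(items, property) {
    if (!Array.isArray(items)) { return ''; }
    return items
        .map(item => (property === null ? item : item[property]))
        .join(', ');
}

// Made-up data shaped like people.languages.language.
const languages = [{ name: 'Castilian Spanish' }, { name: 'Catalan' }, { name: 'Galician' }];
console.log(listToSpeech(languages, 'name'));          // Castilian Spanish, Catalan, Galician
console.log(listToSpeech(['wheat', 'olives'], null));  // wheat, olives
```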

lookupTop


When a user asks a country ranking question like "which countries are largest?", the lookupTop function handles the request. lookupTop runs a query to get a Top 10 country list, such as the 10 countries with the largest area. These are the same queries we used to get column charts in the web site in Part 2.

We can call lookupTop with the following parameters to hear the top countries in a category:

lookupTop('report-area-highest', 'The countries with the largest area are {country1}, {country2}, and {country3}.{end}', intent, session, callback);

Here's the code to lookupTop. The report name parameter is used to set the index, projection expression, and sort direction for the DynamoDB query. The response template may specify {country1}, {country2}, and {country3} for the names of the top 3 countries in the result.
// --------- lookupTop - look up top (leading) countries for a report - e.g. which country has the highest exports?: which countries are biggest?
// report: report name, such as 'report-exports-highest'
// response: speech to output, which may include embedded placeholders {country1}, {country2}, {country3}

function lookupTop(report, response, intent, session, callback) {

    const repromptText = 'Please ask me a country fact, such as "What is the population of France"?';
    const sessionAttributes = {};
    let shouldEndSession = false;
    let speechOutput = '';

    var index = null;
    var projectionExpression = null;
    var scanIndexForward = true;
    
    FlagImageUrl = null;   // clear the shared image URLs (as lookup does) so no earlier country's images appear on the card
    MapImageUrl = null;
    const cardTitle = 'World Country Data';
    
    switch(report) {
        case 'report-area-highest':
            index = 'rank-area-index';
            projectionExpression = "#name, global_rank_area, global_value_area";
            scanIndexForward = true;
            break;
        case 'report-area-lowest':
            index = 'rank-area-index';
            projectionExpression = "#name, global_rank_area, global_value_area";
            scanIndexForward = false;
            break;
        case 'report-exports-highest':
            index = 'rank-exports-index';
            projectionExpression = "#name, global_rank_exports, global_value_exports";
            scanIndexForward = true;
            break;
        case 'report-exports-lowest':
            index = 'rank-exports-index';
            projectionExpression = "#name, global_rank_exports, global_value_exports";
            scanIndexForward = false;
            break;
        case 'report-imports-highest':
            index = 'rank-imports-index';
            projectionExpression = "#name, global_rank_imports, global_value_imports";
            scanIndexForward = true;
            break;
        case 'report-imports-lowest':
            index = 'rank-imports-index';
            projectionExpression = "#name, global_rank_imports, global_value_imports";
            scanIndexForward = false;
            break;
        case 'report-internet-users-highest':
            index = 'rank-internet-users-index';
            projectionExpression = "#name, global_rank_internet_users, global_value_internet_users";
            scanIndexForward = true;
            break;
        case 'report-population-highest':
            index = 'rank-population-index';
            projectionExpression = "#name, global_rank_population, global_value_population";
            scanIndexForward = true;
            break;
        case 'report-population-lowest':
            index = 'rank-population-index';
            projectionExpression = "#name, global_rank_population, global_value_population";
            scanIndexForward = false;
            break;
        default:
            report = null;
            break;
    }
    
    if (report===null) {
        speechOutput = "Sorry, I could not find that data. ";
        shouldEndSession = true;
        callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));
    }
    else {
        const AWS = require('aws-sdk');
        AWS.config.update({region: 'us-east-1'});
        const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
        var params = {
          TableName: 'factbook',
          IndexName: index,
          ExpressionAttributeNames: {
             '#name': 'name',
             '#source': 'source'
          },
          ExpressionAttributeValues: {
            ':source': 'Factbook',
          },
          KeyConditionExpression: '#source = :source',
          ProjectionType : "ALL",
          ProjectionExpression: projectionExpression,
          Limit: 10,
          ScanIndexForward: scanIndexForward
        };
        
        docClient.query(params, function(err, data) {
    
            if(err) { 
                console.log('lookupTop: Error - ' + err.toString());
                 speechOutput = 'Sorry, an error occurred looking that up.';
                 shouldEndSession = true;
            } else { 
                if (!data || data.Items.length===0) {
                    speechOutput = 'Sorry, I found no data. ' + Tag;
                }
                else {
                    try {
                        response = replace(response, '{country1}', data.Items[0].name);
                        response = replace(response, '{country2}', data.Items[1].name);
                        response = replace(response, '{country3}', data.Items[2].name);
                        response = replace(response, '{end}', Tag);
                        speechOutput = response;
                    }
                    catch(e) {
                        console.log('lookupTop: Exception - ' + e.toString());
                        speechOutput = "Sorry, I had a problem looking that up.";
                    }
                }
            }   // end of the error/success branches
            // Respond on every path, including query errors, so that Alexa always gets a reply.
            // repromptText is what Alexa says if the user does not respond while the session
            // is still open.
            callback(sessionAttributes, buildSpeechletResponse(cardTitle, speechOutput, repromptText, shouldEndSession));

      });
    }
}
listTop Function

The response to "Which countries are largest?" is "The countries with the largest area are Russia, Antarctica, and Canada."
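The placeholder substitution can be tried in isolation. Here's a minimal sketch, not the skill's actual code: the helper name fillTopThree and the template wording are assumptions made for illustration, and it uses split/join in place of the skill's own replace helper.

```javascript
// Hypothetical helper: fill {country1}..{country3} and {end} in a response template.
function fillTopThree(template, items, tag) {
    if (!items || items.length < 3) {
        // Fewer than three rows would leave placeholders unfilled, so bail out early.
        return 'Sorry, I found no data. ' + tag;
    }
    var text = template;
    for (var i = 0; i < 3; i++) {
        // split/join replaces every occurrence without any regular-expression escaping.
        text = text.split('{country' + (i + 1) + '}').join(items[i].name);
    }
    return text.split('{end}').join(tag);
}

// Assumed template wording, modeled on the spoken response quoted above.
var sampleTemplate = 'The countries with the largest area are {country1}, {country2}, and {country3}. {end}';
var sampleItems = [{ name: 'Russia' }, { name: 'Antarctica' }, { name: 'Canada' }];
console.log(fillTopThree(sampleTemplate, sampleItems, 'What else would you like to know?'));
```

Checking the item count up front avoids relying on an exception when a query returns fewer than three rows.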


buildSpeechletResponse


We've made several references to a buildSpeechletResponse function, which assembles the response to send back to Alexa. Here's the code to buildSpeechletResponse:

// --------------- Helpers that build all of the responses -----------------------

function buildSpeechletResponse(title, output, repromptText, shouldEndSession) {
    
    var outputSpeech = null;
    
    var cardText = replace(output, '<break time="1s"/>', '\r\n\r\n');
    var pos = cardText.indexOf('<audio');
    if (pos != -1) cardText = 'Playing National Anthem\r\n\r\nWhat else would you like to know?';
    
    if (output.indexOf('<') != -1) {
        outputSpeech = {   // output contains markup (audio, breaks) - output SSML
            type: 'SSML',
            ssml: '<speak>' + output + '</speak>',
        };   
    }
    else {
        outputSpeech = {  // output is just text
            type: 'PlainText',
            text: output
        };
    }

    return {
        outputSpeech: outputSpeech,
        card: {
            type: 'Standard',
            title: `${title}`,
            text: cardText,
            content: `SessionSpeechlet - ${output}`,
            image: {
                "smallImageUrl": FlagImageUrl,
                "largeImageUrl": MapImageUrl
            }
        },
        reprompt: {
            outputSpeech: {
                type: 'PlainText',
                text: repromptText,
            },
        },
        shouldEndSession,
    };
}
buildSpeechletResponse Function

The above code assembles a response that includes an outputSpeech object. In the original sample used as a starting point for this project, outputSpeech was just text with a type of 'PlainText'. However, our responses sometimes include <break time="1s"/> markup to pause for a second, and <audio> tags to play national anthems. For this reason, our response is of type SSML whenever the output contains markup.

Certification

Having put the work into creating this skill, I decided to submit it for certification so anyone could use it. This was my first time going through the certification process, and I was pleased to find it a smooth process.

The first step was to polish the voice app as much as I could. Alexa voice apps are easy to get started with, but getting them to a good production-ready state is another thing; it requires thinking through all the different ways someone might express an intent, and a lot of testing. In particular, it requires getting input from multiple people since you won't think of everything yourself. After convincing my family to assist me with testing, I felt I was ready for my first submission.

The Amazon developer console takes you through a Distribution area for describing your app and answering some questions about it; you can then run a validation and automated test to pick up some low-hanging fruit about areas that need attention. When you're past all that you can submit for certification, then sit back and await feedback email. I submitted my first attempt on a Sunday evening and when I went to my computer Monday morning feedback was waiting for me. There were just two very reasonable issues to address, explained in a helpful way.

One issue had to do with my responses. I'd designed the app to stay open until you expressly tell it to exit; the reasoning being that if you're getting country facts, you probably want to ask a series of questions. The feedback said I could only do that if my responses prompted the user for something more. That was easy to address: now when a question is answered, there's a one-second pause followed by "What else would you like to know?"
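As a rough sketch of that fix (the helper name withFollowUp is hypothetical, and the skill itself may assemble the text differently), the prompt can be appended after a one-second SSML break. Because the result contains markup, buildSpeechletResponse would send it as SSML and turn the break into blank lines on the card.

```javascript
// Hypothetical helper: append a one-second pause and a follow-up question to an answer.
var FOLLOW_UP = '<break time="1s"/>What else would you like to know?';

function withFollowUp(answer) {
    return answer + FOLLOW_UP;
}

var speech = withFollowUp('The countries with the largest area are Russia, Antarctica, and Canada.');

// The same markup test buildSpeechletResponse uses to choose between SSML and PlainText.
console.log(speech.indexOf('<') !== -1 ? 'SSML' : 'PlainText');
```

Keeping the follow-up in one constant means every answer ends with the same prompt.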

The second issue had to do with cards, which are what Alexa displays when used from a device with a screen (such as my TV with Amazon FireStick). The default sample app I used as a starting point output very technical titles like 'SessionSpeechlet'. I overhauled the card output, and now a card is displayed that includes a friendly-worded title; a map of the country being described (or its flag on small displays); and the text of the response. Here for example is the card displayed in response to "Where is French Polynesia?"

Card Response to "Where is French Polynesia?"

I resubmitted Monday mid-morning, and passed certification overnight. All in all, a satisfying certification process. Here's what the skill listing looks like on Amazon.com:


In Conclusion

In this series, I showed how public-domain data from the CIA World Factbook can be hosted on Amazon Web Services. After bringing that data into DynamoDB and creating a Lambda Function serverless API, and then creating a web site, we tackled an Alexa Skill in this final part of the series. You can access the skill with "Alexa, Open World Country Data".

Our skill was able to answer questions about specific values for a country, such as its population or area; as well as lists such as the languages spoken in a country. Country ranking inquiries can also be made, such as which countries have the lowest exports or highest inflation. Although we covered many data points, there's a great deal more that can be mined out of this data source.


Friday, February 22, 2019

CIA World Factbook Data on AWS, Part 2: Front-end API & Web Site using Lambda Functions and DynamoDB

In this 3-part series, I'm going to show how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services.


Previously in Part 1 we noted the CIA World Factbook data is public domain. We created the back-end to collect data and store it in DynamoDB and S3 storage, using a Lambda Function to insert document records. We also created some rudimentary Lambda Functions for accessing country records.

Today in Part 2 we will create the front-end, which will include a fuller API and a web site—powered by Lambda Functions and DynamoDB. With that, we'll finally be able to access and use all that data we collected. You can access the web site at http://world-factbook.aws.davidpallmann.com.

  

What We're Building

Today we'll be creating three things to form our front-end:
  1. API / Lambda Functions. We're going to use an API of Lambda Functions to query the DynamoDB. We'll need functions to look up country data, perform searches, and retrieve chart data.
  2. DynamoDB Secondary Indexes. We'll need to add additional indices to our DynamoDB database in order to support chart data retrieval.
  3. Web Site. We'll create a web site that allows browsing, searching, and viewing charts of world country data via the API. This site will work on both desktop and mobile devices.
Our web site can do 3 things, and we'll address each one in a section of this post:
  • Country View: select a country to view its record details (geography, people, economy, etc.)
  • Search: enter a search term and get a list of matching countries.
  • Charts: select a chart and view a column chart.

Web Site Foundation

In a prior series, I hosted this data on Microsoft Azure and created a statically-hosted web site. We're going to host the same web site for the AWS edition, but I've refactored the web site code so that it more easily supports either cloud platform with common source code. 

I've also decided to change the background map image (also public domain) and color theme for the AWS site. The original Azure site had an Amber-Sienna color theme; the AWS site will have a blue color scheme. Here's how the two web sites compare:

Azure Edition of Web Site

AWS Edition of Web Site

Here's how I've structured the site, as a small number of files that can be hosted in inexpensive cloud storage. The majority of the code, markup, and style rules are common; only five files are deployment-specific: cloud.js, favicon.ico, logo.png, theme.css, and world-map.jpg.

File  Description  Common or Unique
cloud.js  Platform-specific functions and properties  Unique
favicon.ico  Browser icon  Unique
index.html  Web page  Common
logo.png  Logo  Unique
site.css  Style rules  Common
site.js  JavaScript  Common
theme.css  Color theme  Unique
world-map.jpg  Background image  Unique

Since I've previously blogged about the site, I'm not going to do a detailed walk-through of the code. However, I will highlight some critical parts of it. The full code can be accessed on GitHub (see link at end of post).

When you access the web site, you may experience a short delay initially; that's because Lambda Functions power the site, and if inactive (a cold start) you'll experience a few seconds wait while the function is deployed. There are ways to keep the functions warm, but as this is just a demonstration I opted for a lowest cost deployment.

Country View

The user can select a country from the drop-down list of 260 countries at top right.


When a country is selected, the Lambda Function country is called, which returns a complete JSON country record. We wrote this function and studied the country JSON in Part 1; here's an updated view of the code.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                            "Access-Control-Allow-Origin" : "*",
                            "Access-Control-Allow-Credentials" : true
                    };

    var countryName = null;
    
    if (event && event.queryStringParameters && event.queryStringParameters.name) countryName = event.queryStringParameters.name;
    
    if (!countryName) {
        // Report the missing parameter as a client error and stop before querying.
        callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: name' });
        return;
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':name': countryName,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#name = :name and #source = :source',
    };
    
    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:')
        console.log(err.toString());
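        // Pass null as the error so API Gateway relays this 500 response with its CORS headers;
        // backticks are needed for ${err} to be interpolated.
        callback(null, { statusCode: 500, headers: corsHeaders, body: `Error: ${err}` });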
    } else { 
        if (!data || data.Items.length===0) {
            callback(null, { statusCode: 400, headers: corsHeaders, body: 'Country not found: ' + countryName });
        }
        else {
            callback(null, {
                    statusCode: 200,
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items[0])
                });
        }
    }
  });
};
country Lambda Function

To be able to call this function from our web site without Cross-Origin Resource Sharing (CORS) errors, we also had to go to the API Gateway configuration for our country-API and Enable CORS.

Enabling CORS in API Gateway
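With a Lambda proxy integration (which the use of event.queryStringParameters suggests we have here), the function's own responses also need to carry the CORS headers, which is why each handler passes corsHeaders. A small helper can keep that consistent across status codes. This is only a sketch: corsResponse is a hypothetical name and is not part of the functions in this post.

```javascript
// Hypothetical helper: build an API Gateway proxy response that always carries CORS headers.
var corsHeaders = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': true
};

function corsResponse(statusCode, payload) {
    return {
        statusCode: statusCode,
        headers: corsHeaders,
        // A proxy response body must be a string, so objects are serialized to JSON.
        body: typeof payload === 'string' ? payload : JSON.stringify(payload)
    };
}

console.log(corsResponse(400, 'Missing parameter: name'));
console.log(corsResponse(200, { name: 'Aruba', key: 'aruba' }));
```

Inside a handler this would be used as callback(null, corsResponse(200, data.Items[0])).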

After retrieving the country JSON from the country function, the site then populates the accordion content sections (Introduction, Geography, People, Government, Economy, Energy, Communications, Transportation, Military and Security, and Transnational Issues). Although we're getting and displaying many fields, it's only a fraction of what's in the data; over time, we'll try to expand it.

Although the country JSON document structure is consistent, any particular element we want to access may or may not be present. Accordingly, our JavaScript code to create content sections has to carefully check whether elements exist. Below is a code fragment showing how the People section of content is assembled. We did not use a framework like Angular or React to do this (although we may at some point); even with one, we would have needed the same logic and checks in our HTML template.
// Load content: People

var people = '';
if (data.people) {
    if (data.people.population && data.people.population.total) {
        people += '<div class="item"><b>Population</b><br/>' + numberWithCommas(data.people.population.total) + '</div>';
    }
    if (data.people.population && data.people.population.global_rank) {
        people += '<div class="item"><b>Global Rank</b><br/>' + data.people.population.global_rank + '</div>';
    }
    if (data.people.nationality && data.people.nationality.adjective) {
        people += '<div class="item"><b>Nationality</b><br/>' + data.people.nationality.adjective + '</div>';
    }
    if (data.people.ethnic_groups && data.people.ethnic_groups.ethnicity) {
        var ethnic_groups = data.people.ethnic_groups;
        people += '<div class="item"><b>Ethnic Groups</b><br/>'
        for (var i = 0; i < ethnic_groups.ethnicity.length; i++) {
            var pct = ethnic_groups.ethnicity[i].percent;
            var elem = ethnic_groups.ethnicity[i].name;
            var note = ethnic_groups.ethnicity[i].note;
            if (pct)
                elem += " (" + pct + '%)';
            if (note)
                elem += " note: " + note;
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.languages && data.people.languages.language) {
        var languages = data.people.languages;
        people += '<div class="item"><b>Languages</b><br/>'
        for (var i = 0; i < languages.language.length; i++) {
            var pct = languages.language[i].percent;
            var elem = languages.language[i].name;
            if (pct)
                elem += " (" + pct + '%)';
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.religions && data.people.religions.religion) {
        var religions = data.people.religions;
        people += '<div class="item"><b>Religions</b><br/>'
        for (var i = 0; i < religions.religion.length; i++) {
            var pct = religions.religion[i].percent;
            var elem = religions.religion[i].name;
            if (pct)
                elem += " (" + pct + '%)';
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.life_expectancy_at_birth && data.people.life_expectancy_at_birth.total_population && data.people.life_expectancy_at_birth.total_population.value && data.people.life_expectancy_at_birth.total_population.units) {
        people += '<div class="item"><b>Life Expectancy at Birth</b><br/>' + data.people.life_expectancy_at_birth.total_population.value + ' ' + data.people.life_expectancy_at_birth.total_population.units + '</div>';
    }
    if (data.people.population_growth_rate && data.people.population_growth_rate.growth_rate && data.people.population_growth_rate.date) {
        people += '<div class="item"><b>Population Growth Rate</b><br/>' + data.people.population_growth_rate.growth_rate + ' (' + data.people.population_growth_rate.date + ')</div>';
    }
    if (data.people.birth_rate && data.people.birth_rate.births_per_1000_population && data.people.birth_rate.date) {
        people += '<div class="item"><b>Birth Rate</b><br/>' + data.people.birth_rate.births_per_1000_population + ' births per thousand (' + data.people.birth_rate.date + ')</div>';
    }
    if (data.people.death_rate && data.people.death_rate.deaths_per_1000_population && data.people.death_rate.date) {
        people += '<div class="item"><b>Death Rate</b><br/>' + data.people.death_rate.deaths_per_1000_population + ' deaths per thousand (' + data.people.death_rate.date + ')</div>';
    }
    if (data.people.demographic_profile) {
        people += '<div class="item"><b>Demographic Profile</b><br/>' + data.people.demographic_profile + '</div>';
    }
}
$('#content-people').html(people);
JavaScript to Extract People Content
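The ethnic group, language, and religion loops above differ mainly in the array they walk. As a sketch (the helper name describeEntries is hypothetical and not in the site's code), the shared formatting could be factored out like this. Note that this version appends a note for any entry that has one, whereas the code above does so only for ethnic groups.

```javascript
// Hypothetical helper: turn [{name, percent, note}, ...] into "A (60%), B (40%) note: ...".
function describeEntries(entries) {
    var parts = [];
    for (var i = 0; i < entries.length; i++) {
        var text = entries[i].name;
        if (entries[i].percent) text += ' (' + entries[i].percent + '%)';
        if (entries[i].note) text += ' note: ' + entries[i].note;
        parts.push(text);
    }
    // join() takes care of the "comma before every item except the first" bookkeeping.
    return parts.join(', ');
}

// Sample input made up for illustration; real entries come from the country JSON.
console.log(describeEntries([
    { name: 'Language A', percent: 60 },
    { name: 'Language B', percent: 40, note: 'regional' },
    { name: 'Language C' }
]));
```

Each section would then reduce to something like people += describeEntries(data.people.languages.language), wrapped in the same div markup as before.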

If no accordion sections were already open when the content loads, the Introduction section expands. The user can then review the data. In addition to the JSON country record, there are image files for the country flag and a map; these are retrieved from S3 storage.


Country View in Web Site

Search

Searching the data is tough: we don't have a full-text search capability at present. Fortunately, a text search would likely target either the country name or one of the large text briefs in the data, such as introduction.background or government.overview. Accordingly, our first implementation of search will use a Lambda Function that looks for a match in select fields of the country record.

Below is our search Lambda Function, which accepts a term and returns an array of matching country names and keys.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                        "Access-Control-Allow-Origin" : "*", // Required for CORS support to work
                        "Access-Control-Allow-Credentials" : true // Required for cookies, authorization headers with HTTPS 
                    };

    var testing = false;

    var term = null;

    if (testing) {
        term = 'island';
    }
    else {
        try {
            term = event.queryStringParameters.term;
        }
        catch(e) { }
    }
    
    if (term===null || term===undefined || term==='') {
        callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: term' });
        return;
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#key': 'key',
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':term': term,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#source = :source',
      FilterExpression: 'contains(#key, :term) or contains(introduction.background, :term) or contains(geography.climate, :term) or contains(geography.terrain, :term) or contains(people.demographic_profile, :term) or contains(economy.overview, :term) or contains(geography.map_reference, :term) or contains(government.government_type, :term) or contains(transnational_issues.disputes[0], :term)',
      ProjectionExpression: '#name, #key'
    };

    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:')
        console.log(err.toString());
        callback(null, { statusCode: 500, headers: corsHeaders, body: `Error: ${err}` }); // backticks so ${err} is interpolated
    } else { 
        callback(null, {
                    statusCode: 200, headers: corsHeaders,
                    body: JSON.stringify(data.Items)
                });
    }
  });
};
search Lambda Function

The function is HTTP-triggered via Amazon API Gateway and written in Node.js. The function code initializes a DynamoDB document client (lines 3-5), extracts the expected term query string parameter (lines 14-29), sets up query parameters (lines 31-45), and executes the query (lines 47-59).

The query parameter of interest is FilterExpression, which is a long series of contains(field, :term) sequences connected with OR operators. The DynamoDB contains operator is case-sensitive, putting a burden on the user. This "poor man's search" will be adequate for the time being, but we'll want to come back and improve on this at a later time. Ideally, a user should be able to search the entire country record with a full-text, case-insensitive search.
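One thing to watch with this approach: a single Query call reads at most 1 MB of data, and the FilterExpression is applied after that read, so one call may not examine every country record. When that happens, the result includes a LastEvaluatedKey, which can be passed back as ExclusiveStartKey to continue. Here's a minimal sketch of that loop; queryAllPages is a hypothetical helper, not part of the search function above.

```javascript
// Hypothetical helper: gather the items from every page of a DynamoDB query.
// docClient is anything with a query(params, callback) method, such as
// AWS.DynamoDB.DocumentClient; done is called as done(err, items).
function queryAllPages(docClient, params, done, collected) {
    var items = collected || [];
    docClient.query(params, function (err, data) {
        if (err) return done(err);
        items = items.concat(data.Items || []);
        if (data.LastEvaluatedKey) {
            // More data remains: query again, starting after the last key that was read.
            var next = Object.assign({}, params, { ExclusiveStartKey: data.LastEvaluatedKey });
            return queryAllPages(docClient, next, done, items);
        }
        done(null, items);
    });
}

// Possible use inside the handler, replacing the direct docClient.query call:
//   queryAllPages(docClient, params, function (err, items) {
//       if (err) { /* report the error as before */ return; }
//       callback(null, { statusCode: 200, headers: corsHeaders, body: JSON.stringify(items) });
//   });
```

For a table of this size the extra round trips should be few, but each one adds latency, which is another reason to revisit search later.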

Here's an example of what search returns when given the term "island":
[
{
key: "american_samoa",
name: "American Samoa"
},
{
key: "anguilla",
name: "Anguilla"
},
{
key: "antarctica",
name: "Antarctica"
},
{
key: "antigua_and_barbuda",
name: "Antigua And Barbuda"
},
{
key: "aruba",
name: "Aruba"
},
{
key: "ashmore_and_cartier_islands",
name: "Ashmore And Cartier Islands"
},
{
key: "bahamas_the",
name: "Bahamas, The"
},
{
key: "barbados",
name: "Barbados"
},
{
key: "bermuda",
name: "Bermuda"
},
{
key: "bouvet_island",
name: "Bouvet Island"
},
{
key: "british_indian_ocean_territory",
name: "British Indian Ocean Territory"
},
{
key: "british_virgin_islands",
name: "British Virgin Islands"
}
]
search Function Results

Our web site's JavaScript code for search is below. An Ajax call is made to the Lambda Function, and the results are iterated to come up with a search results list of country names and flags.
// Perform a search.

function search() {

    var term = $('#search-text').val();
    if (!term) return;

    $("body").css("cursor", "progress");
    $('#loading').css('visibility', 'visible');

    $('#country').val('');
    $('#chart-select').val('');

    $('h2').removeClass('optional');

    $('#country-view').css('visibility', 'collapse');
    inCountryView = false;

    var url = cloud.searchUrl(term);

    $.ajax({
        type: 'GET',
        url: url,
        accepts: "json",
    }).done(function (response) {

        var results = cloud.resultToJson(response);

        var html = '<table id="results-table" style="color: white; font-size: 20px">';
        var count = 0;
        var countryKey = null;
        if (results) {
            for (var i = 0; i < results.length; i++) {
                countryKey = CountryKey(results[i].name);
                html += '<tr style="cursor: pointer; height: 24px; border-bottom: solid 1px white" onclick="selectCountry(' + "'" + results[i].name + "'" + ');">';
                if (haveFlag(results[i].name)) {
                    var flagImageUrl = cloud.flagImageUrl(countryKey);
                    html += '<td style="text-align: right"><img class="content-image-thumbnail" src="' + flagImageUrl + '"></td>';
                }
                else {
                    html += '<td> </td>';
                }
                html += '<td>  </td><td style="vertical-align: middle">' + results[i].name + '</td></tr>';
                count++;
            }
        }
        if (count == 0) {
            html += '<tr><td>No matches</td></tr>';
        }
        html += '</table>';

        $('#results-list').html(html);
        $('#country-flag').css('visibility', 'collapse');
        $('#chart-view').css('visibility', 'collapse');
        $('#results-view').css('visibility', 'visible');

        $('#loading').css('visibility', 'collapse');
        $("body").css("cursor", "default");
    });
}
JavaScript search code

Putting it all together, a search on the web site looks like this. With the search results displayed, the user may click on any country in the list; if they do, a Country View takes place just as if they had selected the country from the top right drop-down.

Search in Web Site

Charts

Lastly, our web site offers chart views. The user can select a chart from the list and see a chart showing comparative country data. The site currently provides these charts:
  • Area - Largest
  • Area - Smallest
  • Exports - Highest
  • Exports - Lowest
  • Imports - Highest
  • Imports - Lowest
  • Inflation - Highest
  • Inflation - Lowest
  • Internet Users - Most
  • Population - Highest
  • Population - Lowest
Each chart provides a list of 10 countries rendered as a column chart using Google Charts.

Although the number of documents we have in DynamoDB is small (260), we nevertheless want to follow good practices that would also work well with data at large scale. In the country JSON, there are properties deep in the document that list a country's global rank for area, exports, Internet users, etc. All we need to do, then, is sort by the particular global rank we're interested in and take the top 10 results.

Adding a Secondary Index to DynamoDB

In DynamoDB, you can't specify an order in your query other than to use the sort order of an index (ascending or descending). So, if we wanted to list countries by order of area global rank, we'd want to order by the area global rank and plot the area in square km in our chart. Here's where they are in the country JSON:


What we need to do, conceptually, is create a Secondary Index with a sort key of geography.area.global_rank that also includes the country name (name) and actual area (geography.area.total.value). Unfortunately, you can only include top-level properties in a Secondary Index so we can't do it exactly that way...

What we can do is promote these properties—and their brethren for the other charts—to be top-most properties. We do that by modifying the load-country Lambda Function we created in Part 1 to surface these new top-level properties. Here's the code we inserted:
if (data != null) 
{
// add 3 fields to the document

data.key = key; // countryKey(data.name);
data.timestamp = 'Monday, February 11, 2019 4:09:28 PM';
data.source = 'Factbook';

// promote fields to top that we need to index on

if (data.geography && data.geography.area && data.geography.area.global_rank) {
    data.global_rank_area = data.geography.area.global_rank; }
if (data.people && data.people.population &&  data.people.population.global_rank) {
    data.global_rank_population = data.people.population.global_rank; }
if (data.economy && data.economy.imports && data.economy.imports.total_value && data.economy.imports.total_value.global_rank)
    data.global_rank_imports = data.economy.imports.total_value.global_rank;
if (data.economy && data.economy.exports && data.economy.exports.total_value && data.economy.exports.total_value.global_rank)
    data.global_rank_exports = data.economy.exports.total_value.global_rank;
if (data.economy && data.economy.inflation_rate && data.economy.inflation_rate.global_rank)
    data.global_rank_inflation_rate = data.economy.inflation_rate.global_rank;
if (data.communications && data.communications.internet && data.communications.internet.users && data.communications.internet.users.global_rank)
    data.global_rank_internet_users = data.communications.internet.users.global_rank;
if (data.geography && data.geography.area && data.geography.area.total && data.geography.area.total.value)
    data.global_value_area = data.geography.area.total.value;
if (data.people && data.people.population &&  data.people.population.total)
    data.global_value_population = data.people.population.total;
if (data.economy && data.economy.imports && data.economy.imports.total_value && data.economy.imports.total_value.annual_values && data.economy.imports.total_value.annual_values[0] && data.economy.imports.total_value.annual_values[0].value)
    data.global_value_imports = data.economy.imports.total_value.annual_values[0].value;
if (data.economy && data.economy.exports && data.economy.exports.total_value && data.economy.exports.total_value.annual_values && data.economy.exports.total_value.annual_values[0] && data.economy.exports.total_value.annual_values[0].value)
    data.global_value_exports = data.economy.exports.total_value.annual_values[0].value;
if (data.economy && data.economy.inflation_rate && data.economy.inflation_rate.annual_values && data.economy.inflation_rate.annual_values[0] && data.economy.inflation_rate.annual_values[0].value)
    data.global_value_inflation_rate = data.economy.inflation_rate.annual_values[0].value;
if (data.communications && data.communications.internet && data.communications.internet.users && data.communications.internet.users.total)
    data.global_value_internet_users = data.communications.internet.users.total

// insert country record

var params = {
    TableName: 'factbook',
    Item: data
    };

console.log("Adding new item...");
docClient.put(params, function(err, data2) {
Code Added to load-country Function
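The long chains of existence checks could also be expressed as a table of paths. The following is a sketch under that assumption, not the code that was deployed: getPath, promotions, and promoteFields are hypothetical names, and only two of the promoted properties are shown.

```javascript
// Hypothetical helper: read a nested property by path, or undefined if any step is missing.
function getPath(obj, path) {
    var current = obj;
    for (var i = 0; i < path.length; i++) {
        if (current === null || current === undefined) return undefined;
        current = current[path[i]];
    }
    return current;
}

// New top-level property -> path inside the country document (two of the twelve shown).
var promotions = {
    global_rank_area: ['geography', 'area', 'global_rank'],
    global_value_area: ['geography', 'area', 'total', 'value']
};

function promoteFields(data) {
    Object.keys(promotions).forEach(function (target) {
        var value = getPath(data, promotions[target]);
        // Unlike the truthiness checks above, this also keeps a legitimate value of 0.
        if (value !== undefined && value !== null) data[target] = value;
    });
    return data;
}

var promoted = promoteFields({
    name: 'Russia',
    geography: { area: { global_rank: 1, total: { value: 17098242 } } }
});
console.log(promoted.global_rank_area, promoted.global_value_area);
```

Array steps work the same way, so a path such as ['economy', 'imports', 'total_value', 'annual_values', 0, 'value'] would cover the imports value.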

With this update, and after re-loading all the country records into DynamoDB, we now have the top-level properties we need:

Updated Country JSON with Top-level Global Rank/Value Properties

Now we are able to create secondary indices on our DynamoDB database. Here's how we create the index rank-area-index. Let's take note of a few things. The partition key is source (which is always "Factbook"), same as our primary index. The sort key is global_rank_area, a number. This will make it easy to get the top N or bottom N countries by global area rank. The attributes for the index include name, global_rank_area, and global_value_area; we need the value in order to plot anything meaningful in our chart.

Creating Secondary Index

That was a bit of work; but with our index created, we can now create a Lambda Function to query by area rank. Here's report-area-highest, which returns the names, rank, and area for the top 10 countries with largest area. Notice that the query parameters specify an IndexName of rank-area-index and a ScanIndexForward (sort order) value of true. We also specify a Limit of 10, which will give us just 10 records back. For the sister report-area-lowest function, ScanIndexForward will be set to false.
// List top 10 countries with largest area

const AWS = require('aws-sdk');
AWS.config.update({region: 'us-east-1'});
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 

var corsHeaders = { "Access-Control-Allow-Origin" : "*", "Access-Control-Allow-Credentials" : true };

exports.handler = function(event, context, callback) {

    var params = {
      TableName: 'factbook',
      IndexName: 'rank-area-index',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':source': 'Factbook',
      },
      KeyConditionExpression: '#source = :source',
      ProjectionType : "ALL",
      ProjectionExpression: "#name, global_rank_area, global_value_area",
      Limit: 10,
      ScanIndexForward: true
    };
    
    docClient.query(params, function(err, data) {

        if(err) { 
            console.log('03 err:')
            console.log(err.toString());
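            // Pass null as the error so API Gateway relays this 500 response; backticks let ${err} interpolate.
            callback(null, { statusCode: 500, headers: corsHeaders, body: `Error: ${err}` });
        } else { 
            callback(null, {
                    statusCode: 200,
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items)
                });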
        }
      });
};
report-area-highest Lambda Function

Here's the output when report-area-highest is run. It's just what we want: the 10 countries with the largest area, including the name, rank, and area value for each country.
[
{
    global_rank_area: 1,
    name: "Russia",
    global_value_area: 17098242
},
{
    global_rank_area: 2,
    name: "Antarctica",
    global_value_area: 14000000
},
{
    global_rank_area: 3,
    name: "Canada",
    global_value_area: 9984670
},
{
    global_rank_area: 4,
    name: "United States",
    global_value_area: 9833517
},
{
    global_rank_area: 5,
    name: "China",
    global_value_area: 9596960
},
{
    global_rank_area: 6,
    name: "Brazil",
    global_value_area: 8515770
},
{
    global_rank_area: 7,
    name: "Australia",
    global_value_area: 7741220
},
{
    global_rank_area: 8,
    name: "India",
    global_value_area: 3287263
},
{
    global_rank_area: 9,
    name: "Argentina",
    global_value_area: 2780400
},
{
    global_rank_area: 10,
    name: "Kazakhstan",
    global_value_area: 2724900
}
]
report-area-highest output

When the JavaScript code in the web site plots this with Google Charts, here's what the end result is:

Chart in Web Site
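The step from report output to chart input is small. Here's a sketch of one way to do it; toChartRows is a hypothetical helper (the site's actual charting code is in the GitHub source), and the 'chart' element id and axis label are assumptions.

```javascript
// Hypothetical helper: convert report items into the header-plus-rows array
// that google.visualization.arrayToDataTable accepts.
function toChartRows(items, valueField, valueLabel) {
    var rows = [['Country', valueLabel]];
    for (var i = 0; i < items.length; i++) {
        rows.push([items[i].name, items[i][valueField]]);
    }
    return rows;
}

var chartRows = toChartRows([
    { global_rank_area: 1, name: 'Russia', global_value_area: 17098242 },
    { global_rank_area: 2, name: 'Antarctica', global_value_area: 14000000 }
], 'global_value_area', 'Area (sq km)');
console.log(chartRows);

// In the browser, once Google Charts has loaded, the rows could be drawn with:
//   var table = google.visualization.arrayToDataTable(chartRows);
//   var chart = new google.visualization.ColumnChart(document.getElementById('chart'));
//   chart.draw(table, { title: 'Area - Largest' });
```

Because the helper takes the value field as a parameter, the same function would serve the exports, imports, inflation, Internet users, and population charts.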

Each of the other charts was implemented exactly the same way: promote the appropriate properties to the top of the JSON, add a Secondary Index to DynamoDB, and write a simple Lambda Function to query using the index. The first one was a bit of work; but once the correct pattern was identified, all the others followed in rapid succession.
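Since the report functions differ only in index name, property names, and sort direction, the query parameters could come from one builder. This is a sketch rather than the deployed code: buildReportParams is a hypothetical name, and rank-area-index is the only index name given in this post, so names for the other indexes would have to be filled in.

```javascript
// Hypothetical helper: build query parameters for any "top 10 by global rank" report.
function buildReportParams(indexName, rankField, valueField, bestFirst) {
    return {
        TableName: 'factbook',
        IndexName: indexName,
        ExpressionAttributeNames: { '#name': 'name', '#source': 'source' },
        ExpressionAttributeValues: { ':source': 'Factbook' },
        KeyConditionExpression: '#source = :source',
        // ProjectionType is a property of the index definition, so it is not repeated here.
        ProjectionExpression: '#name, ' + rankField + ', ' + valueField,
        Limit: 10,
        // true walks the index from rank 1 upward; false starts from the last rank.
        ScanIndexForward: bestFirst
    };
}

var largestParams = buildReportParams('rank-area-index', 'global_rank_area', 'global_value_area', true);
var smallestParams = buildReportParams('rank-area-index', 'global_rank_area', 'global_value_area', false);
console.log(largestParams.ScanIndexForward, smallestParams.ScanIndexForward);
```

A single report function could then accept the chart name as a query string parameter and look up the matching arguments, instead of deploying one function per chart.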

In Conclusion

In this series I showed how to retrieve world country data from CIA World Factbook and store it in AWS, along with an API and web site for accessing the data. DynamoDB was our primary repository along with S3 storage, and it has performed well. Lambda Functions were integral to both the back-end (loading country records) and front-end (country view, search, charts) and were written in Node.js (JavaScript).

The resulting web site can be accessed at http://world-factbook.aws.davidpallmann.com. To create the web site, a prior web site from a project on another cloud platform was refactored so that it works with either AWS or Azure using mostly common code.

To do charting, we needed to create secondary indices which in turn required us to promote some of our JSON values to top-level properties. Once that was done, it was a breeze to create the necessary Lambda Functions. Combined with Google Charts, we quickly had charts up and running.

What we've covered in Parts 1 and 2 took two days of development and two days of blog-writing.
Cloud-native services like Lambda and DynamoDB make for rapid development.

In Part 3, we'll be creating an Alexa Skill so that our data can be accessed by voice.

Web Site Source Code on GitHub

Next: CIA World Factbook on AWS, Part 3: Alexa Voice Interface using Lambda and DynamoDB


Wednesday, February 20, 2019

CIA World Factbook Data on AWS, Part 1: Loading DynamoDB with Lambda Functions

In this 3-part series, I'm going to show how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services. Today in Part 1 we'll get the data loaded into DynamoDB with the help of a Lambda Function, and we'll also create Lambda Functions for accessing the data. Later in Part 2 we'll set up a web site for browsing and searching the data; and in Part 3 we'll create an Alexa skill for querying by voice.

Architecture

For those who follow my blog and are feeling deja vu, I recently completed a similar series for Microsoft Azure. Having done this once already will accelerate the effort. This second time around, I'll be going less into the fine details of what we're doing and will leverage some of the prior work.

About the CIA World Factbook Data

The US Central Intelligence Agency publishes an almanac-style reference on the countries of the world known as the CIA World Factbook. There's a wealth of data, and you can learn a lot on the site. I urge you to explore it and drill into the detail. Happily, this data is in the public domain which means we can use it for our own purposes. Note however that you are not permitted to replicate the agency seal; and naturally, you should give proper attribution if you use the data.
The data on the site is not particularly approachable for software purposes, but fortunately a gentleman named Ian Coleman has seen fit to create a JSON edition of the data, which is what we'll be using as our data source. It comes as one big 14MB JSON file, but we'll divide that into a JSON record per country (260 of them).
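As a rough sketch of that splitting step, the snippet below breaks one large object into per-country records. It assumes the big file has a top-level countries object keyed by a country slug, with each entry holding its fields under data; that input shape and the function name splitIntoCountryRecords are assumptions for illustration. The key and source fields mirror the sample record shown later in this post; the timestamp field is left out here.

```javascript
// Sketch only: split one large factbook object into per-country records.
// Assumed input shape: { countries: { <key>: { data: { name, ... } } } }
function splitIntoCountryRecords(factbook) {
  return Object.keys(factbook.countries).map(key => {
    const entry = factbook.countries[key];
    // Add identifying fields, then copy the country's own data alongside them.
    return Object.assign({ key: key, source: 'Factbook' }, entry.data);
  });
}

// Tiny in-memory stand-in for the parsed 14MB file.
const sampleFactbook = {
  countries: {
    bermuda: { data: { name: 'Bermuda', geography: { map_references: 'North America' } } },
    canada: { data: { name: 'Canada', geography: { map_references: 'North America' } } }
  }
};

const records = splitIntoCountryRecords(sampleFactbook);
console.log(records.length, records.map(r => r.key));
```

With the real file, the input would come from parsing the downloaded JSON (for example with JSON.parse on the file contents), and each resulting record could then be written out individually.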

What We're Building

Today in Part 1 we have two goals:
  1. Get the country data into a DynamoDB table.
  2. Create Lambda Functions for accessing the data.
We'll concentrate first on getting our country data loaded into DynamoDB, with the help of a Lambda Function.


Country Data in DynamoDB

Then, we'll create Lambda Functions for accessing the data at various levels:


Data Retrieval via API & Lambda Function

With the above accomplished, it will be smooth sailing to create user interfaces to the data.

Loading the Data

We want our data both in S3 and in DynamoDB. Let's take a look at our source data, a JSON country record:
{
  "name": "Bermuda",
  "key": "bermuda",
  "timestamp": "Monday, February 11, 2019 4:09:28 PM",
  "source": "Factbook",
  "introduction": {
    "background": "Bermuda was first settled in 1609 by shipwrecked English colonists heading for Virginia. Self-governing since 1620, Bermuda is the oldest and most populous of the British overseas territories. Vacationing to the island to escape North American winters first developed in Victorian times. Tourism continues to be important to the island's economy, although international business has overtaken it in recent years. Bermuda has also developed into a highly successful offshore financial center. A referendum on independence from the UK was soundly defeated in 1995."
  },
  "geography": {
    "location": "North America, group of islands in the North Atlantic Ocean, east of South Carolina (US)",
    "geographic_coordinates": {
      "latitude": {
        "degrees": 32,
        "minutes": 20,
        "hemisphere": "N"
      },
      "longitude": {
        "degrees": 64,
        "minutes": 45,
        "hemisphere": "W"
      }
    },
    "map_references": "North America",
    "area": {
      "total": {
        "value": 54,
        "units": "sq km"
      },
      "land": {
        "value": 54,
        "units": "sq km"
      },
      "water": {
        "value": 0,
        "units": "sq km"
      },
      "global_rank": 232,
      "comparative": "about one-third the size of Washington, DC"
    },
    "land_boundaries": {
      "total": {
        "value": 0,
        "units": "km"
      }
    },
    "coastline": {
      "value": 103,
      "units": "km"
    },
    "maritime_claims": {
      "territorial_sea": {
        "value": 12,
        "units": "nm"
      },
      "exclusive_fishing_zone": {
        "value": 200,
        "units": "nm"
      }
    },
    "climate": "subtropical; mild, humid; gales, strong winds common in winter",
    "terrain": "low hills separated by fertile depressions",
    "elevation": {
      "lowest_point": "Atlantic Ocean",
      "79_highest_point": "Town Hill"
    },
    "natural_resources": {
      "resources": [
        "limestone",
        "pleasant climate fostering tourism"
      ]
    },
    "land_use": {
      "by_sector": {
        "agricultural_land_total": {
          "value": 14.8,
          "units": "%"
        },
        "arable_land": {
          "value": 14.8,
          "units": "%",
          "note": "/"
        },
        "permanent_crops": {
          "value": 0,
          "units": "%",
          "note": "/"
        },
        "permanent_pasture": {
          "value": 0,
          "units": "%"
        },
        "forest": {
          "value": 20,
          "units": "%"
        },
        "other": {
          "value": 65.2,
          "units": "%"
        }
      },
      "date": "2011"
    },
    "population_distribution": "relatively even population distribution throughout",
    "natural_hazards": [
      {
        "description": "hurricanes (June to November)",
        "type": "hazard"
      }
    ],
    "environment": {
      "current_issues": [
        "dense population and heavy vehicle traffic create serious congestion and air pollution problems",
        "water resources scarce (most obtained as rainwater or from wells)",
        "solid waste disposal",
        "hazardous waste disposal",
        "sewage disposal",
        "overfishing",
        "oil spills"
      ]
    }
  },
  "people": {
    "population": {
      "total": 71176,
      "global_rank": 203,
      "date": "2018-07-01"
    },
    "nationality": {
      "noun": "Bermudian(s)",
      "adjective": "Bermudian"
    },
    "ethnic_groups": {
      "ethnicity": [
        {
          "name": "black",
          "percent": 53.8
        },
        {
          "name": "white",
          "percent": 31
        },
        {
          "name": "mixed",
          "percent": 7.5
        },
        {
          "name": "other",
          "percent": 7.1
        },
        {
          "name": "unspecified",
          "percent": 0.6
        }
      ],
      "date": "2010"
    },
    "languages": {
      "language": [
        {
          "name": "English",
          "note": "official"
        },
        {
          "name": "Portuguese"
        }
      ]
    },
    "religions": {
      "religion": [
        {
          "name": "Protestant",
          "percent": 46.2,
          "breakdown": [
            {
              "name": "includes Anglican",
              "percent": 15.8
            },
            {
              "name": "African Methodist Episcopal",
              "percent": 8.6
            },
            {
              "name": "Seventh Day Adventist",
              "percent": 6.7
            },
            {
              "name": "Pentecostal",
              "percent": 3.5
            },
            {
              "name": "Methodist",
              "percent": 2.7
            },
            {
              "name": "Presbyterian",
              "percent": 2
            },
            {
              "name": "Church of God",
              "percent": 1.6
            },
            {
              "name": "Baptist",
              "percent": 1.2
            },
            {
              "name": "Salvation Army",
              "percent": 1.1
            },
            {
              "name": "Brethren",
              "percent": 1
            },
            {
              "name": "other Protestant",
              "percent": 2
            }
          ]
        },
        {
          "name": "Roman Catholic",
          "percent": 14.5
        },
        {
          "name": "Jehovah's Witness",
          "percent": 1.3
        },
        {
          "name": "other Christian",
          "percent": 9.1
        },
        {
          "name": "Muslim",
          "percent": 1
        },
        {
          "name": "other",
          "percent": 3.9
        },
        {
          "name": "none",
          "percent": 17.8
        },
        {
          "name": "unspecified",
          "percent": 6.2
        }
      ],
      "date": "2010"
    },
    "age_structure": {
      "0_to_14": {
        "percent": 16.92,
        "males": 6088,
        "females": 5957
      },
      "15_to_24": {
        "percent": 11.95,
        "males": 4306,
        "females": 4197
      },
      "25_to_54": {
        "percent": 36.56,
        "males": 13049,
        "females": 12972
      },
      "55_to_64": {
        "percent": 16.04,
        "males": 5383,
        "females": 6034
      },
      "65_and_over": {
        "percent": 18.53,
        "males": 5596,
        "females": 7594
      },
      "date": "2018"
    },
    "median_age": {
      "total": {
        "value": 43.5,
        "units": "years"
      },
      "male": {
        "value": 41.5,
        "units": "years"
      },
      "female": {
        "value": 45.4,
        "units": "years"
      },
      "global_rank": 18,
      "date": "2018"
    },
    "population_growth_rate": {
      "growth_rate": 0.43,
      "global_rank": 158,
      "date": "2018"
    },
    "birth_rate": {
      "births_per_1000_population": 11.3,
      "global_rank": 172,
      "date": "2018"
    },
    "death_rate": {
      "deaths_per_1000_population": 8.7,
      "global_rank": 71,
      "date": "2018"
    },
    "net_migration_rate": {
      "migrants_per_1000_population": 1.8,
      "global_rank": 50,
      "date": "2017"
    },
    "population_distribution": "relatively even population distribution throughout",
    "urbanization": {
      "urban_population": {
        "value": 100,
        "units": "%",
        "date": "2018"
      },
      "rate_of_urbanization": {
        "value": -0.44,
        "units": "%"
      }
    },
    "major_urban_areas": {
      "places": [
        {
          "place": "Hamilton",
          "population": 10000,
          "is_capital": true
        }
      ],
      "date": "2018"
    },
    "sex_ratio": {
      "by_age": {
        "at_birth": {
          "value": 1.02,
          "units": "males/female"
        },
        "0_to_14_years": {
          "value": 1.02,
          "units": "males/female"
        },
        "15_to_24_years": {
          "value": 1.01,
          "units": "males/female"
        },
        "25_to_54_years": {
          "value": 1,
          "units": "males/female"
        },
        "55_to_64_years": {
          "value": 0.89,
          "units": "males/female"
        },
        "65_years_and_over": {
          "value": 0.73,
          "units": "males/female"
        }
      },
      "total_population": {
        "value": 0.94,
        "units": "males/female"
      },
      "date": "2017"
    },
    "infant_mortality_rate": {
      "total": {
        "value": 2.5,
        "units": "deaths_per_1000_live_births"
      },
      "male": {
        "value": 2.6,
        "units": "deaths_per_1000_live_births"
      },
      "female": {
        "value": 2.4,
        "units": "deaths_per_1000_live_births"
      },
      "global_rank": 217,
      "date": "2018"
    },
    "life_expectancy_at_birth": {
      "total_population": {
        "value": 81.5,
        "units": "years"
      },
      "male": {
        "value": 78.3,
        "units": "years"
      },
      "female": {
        "value": 84.7,
        "units": "years"
      },
      "global_rank": 26,
      "date": "2018"
    },
    "total_fertility_rate": {
      "children_born_per_woman": 1.92,
      "global_rank": 128,
      "date": "2018"
    },
    "education_expenditures": {
      "percent_of_gdp": 1.5,
      "global_rank": 175,
      "date": "2017"
    },
    "school_life_expectancy": {
      "total": {
        "value": 12,
        "units": "years"
      },
      "male": {
        "value": 11,
        "units": "years"
      },
      "female": {
        "value": 12,
        "units": "years"
      },
      "date": "2015"
    },
    "youth_unemployment": {
      "total": {
        "value": 29.3,
        "units": "%"
      },
      "male": {
        "value": 29.7,
        "units": "%"
      },
      "female": {
        "value": 29,
        "units": "%"
      },
      "global_rank": 35,
      "date": "2014"
    }
  },
  "government": {
    "country_name": {
      "conventional_long_form": "none",
      "conventional_short_form": "Bermuda",
      "former": "Somers Islands",
      "etymology": "the islands making up Bermuda are named after Juan de BERMUDEZ, an early 16th century Spanish sea captain and the first European explorer of the archipelago"
    },
    "government_type": "parliamentary democracy (Parliament); self-governing overseas territory of the UK",
    "capital": {
      "name": "Hamilton",
      "geographic_coordinates": {
        "latitude": {
          "degrees": 32,
          "minutes": 17,
          "hemisphere": "N"
        },
        "longitude": {
          "degrees": 64,
          "minutes": 47,
          "hemisphere": "W"
        }
      },
      "time_difference": {
        "timezone": -4,
        "note": "1 hour ahead of Washington, DC, during Standard Time"
      },
      "daylight_saving_time": "+1hr, begins second Sunday in March; ends first Sunday in November"
    },
    "administrative_divisions": [
      {
        "name": "Devonshire",
        "type": ""
      },
      {
        "name": "Hamilton",
        "type": ""
      },
      {
        "name": "Hamilton",
        "type": ""
      },
      {
        "name": "Paget",
        "type": ""
      },
      {
        "name": "Pembroke",
        "type": ""
      },
      {
        "name": "Saint George",
        "type": ""
      },
      {
        "name": "Saint George's",
        "type": ""
      },
      {
        "name": "Sandys",
        "type": ""
      },
      {
        "name": "Smith's",
        "type": ""
      },
      {
        "name": "Southampton",
        "type": ""
      },
      {
        "name": "Warwick",
        "type": ""
      }
    ],
    "independence": {
      "note": "overseas territory of the UK"
    },
    "national_holidays": [
      {
        "name": "Bermuda Day",
        "day": "24 May",
        "note": "formerly known as Victoria Day, Empire Day, and Commonwealth Day"
      }
    ],
    "constitution": {
      "history": "several previous (dating to 1684); latest entered into force 8 June 1968 (Bermuda Constitution Order 1968) (2018)",
      "amendments": "proposal procedure - NA; passage by an Order in Council in the UK; amended several times, last in 2012 (2018)"
    },
    "legal_system": "English common law",
    "international_law_organization_participation": [
      "has not submitted an ICJ jurisdiction declaration",
      "non-party state to the ICCt"
    ],
    "citizenship": {
      "citizenship_by_birth": "no",
      "citizenship_by_descent_only": "at least one parent must be a citizen of the UK",
      "dual_citizenship_recognized": "yes",
      "residency_requirement_for_naturalization": "10 years"
    },
    "suffrage": {
      "age": 18,
      "universal": true,
      "compulsory": false
    },
    "executive_branch": {
      "chief_of_state": "Queen ELIZABETH II (since 6 February 1952); represented by Governor John RANKIN (since 5 December 2016)",
      "head_of_government": "Premier David BURT (since 19 July 2017)",
      "cabinet": "Cabinet nominated by the premier, appointed by the governor",
      "elections_appointments": "the monarchy is hereditary; governor appointed by the monarch; following legislative elections, the leader of the majority party or majority coalition usually appointed premier by the governor"
    },
    "legislative_branch": {
      "description": "bicameral Parliament consists of: Senate (11 seats; 3 members appointed by the governor, 5 by the premier, and 3 by the opposition party; members serve 5-year terms) and the House of Assembly (36 seats; members directly elected in single-seat constituencies by simple majority vote to serve up to 5-year terms)\nHouse of Assembly (36 seats; members directly elected in single-seat constituencies by simple majority vote to serve up to 5-year terms)",
      "elections": "Senate - last appointments in August 2017 (next appointments in 2022)\nHouse of Assembly - last held on 18 July 2017 (next to be held not later than 2022)",
      "election_results": "Senate - composition - men 7, women 4, percent of women 36.4%\nHouse of Assembly - percent of vote by party - PLP 58.9%, OBA 40.6%, other 0.5%; seats by party - PLP 24, OBA 12; composition - men 28, women 8, percent of women 22.2%; note - total Parliament percent of women 25.5%"
    },
    "judicial_branch": {
      "highest_courts": "Court of Appeal (consists of the court president and at least 2 justices); Supreme Court (consists of the chief justice, 4 puisne judges, and 1 associate justice); note - the Judicial Committee of the Privy Council in London is the court of final appeal",
      "judge_selection_and_term_of_office": "Court of Appeal justice appointed by the governor; justice tenure by individual appointment; Supreme Court judges nominated by the Judicial and Legal Services Commission and appointed by the governor; judge tenure based on terms of appointment",
      "subordinate_courts": "commercial court (began in 2006); magistrates' courts"
    },
    "political_parties_and_leaders": {
      "parties": [
        {
          "name": "One Bermuda Alliance",
          "name_alternative": "OBA",
          "note": "vacant"
        },
        {
          "name": "Progressive Labor Party",
          "name_alternative": "PLP",
          "leaders": [
            "E. David BURT"
          ]
        }
      ]
    },
    "international_organization_participation": [
      {
        "organization": "Caricom ",
        "note": "associate"
      },
      {
        "organization": "ICC ",
        "note": "NGOs"
      },
      {
        "organization": "Interpol ",
        "note": "subbureau"
      },
      {
        "organization": "IOC"
      },
      {
        "organization": "ITUC ",
        "note": "NGOs"
      },
      {
        "organization": "UPU"
      },
      {
        "organization": "WCO"
      }
    ],
    "diplomatic_representation": {
      "from_united_states": {
        "chief_of_mission": "Consul General Mary Ellen KOENIG (since 28 November 2015)",
        "mailing_address": "P. O. Box HM325, Hamilton HMBX; American Consulate General Hamilton, US Department of State, 5300 Hamilton Place, Washington, DC 20520-5300",
        "telephone": "[1] (441) 295-1342",
        "fax": "[1] (441) 295-1592, 296-9233",
        "consulates_general": "Crown Hill, 16 Middle Road, Devonshire DVO3"
      }
    },
    "flag_description": {
      "description": "red, with the flag of the UK in the upper hoist-side quadrant and the Bermudian coat of arms (a white shield with a red lion standing on a green grassy field holding a scrolled shield showing the sinking of the ship Sea Venture off Bermuda in 1609) centered on the outer half of the flag; it was the shipwreck of the vessel, filled with English colonists originally bound for Virginia, that led to the settling of Bermuda",
      "note": "the flag is unusual in that it is only British overseas territory that uses a red ensign, all others use blue"
    },
    "national_symbol": {
      "symbols": [
        {
          "symbol": "red lion"
        }
      ]
    },
    "national_anthem": {
      "name": "Hail to Bermuda",
      "lyrics_music": "Bette JOHNS",
      "note": "serves as a local anthem; as a territory of the United Kingdom, \"God Save the Queen\" is official (see United Kingdom)"
    }
  },
  "economy": {
    "overview": "International business, which consists primarily of insurance and other financial services, is the real bedrock of Bermuda's economy, consistently accounting for about 85% of the island's GDP. Tourism is the country’s second largest industry, accounting for about 5% of Bermuda's GDP but a much larger share of employment. Over 80% of visitors come from the US and the sector struggled in the wake of the global recession of 2008-09. Even the financial sector has lost roughly 5,000 high-paying expatriate jobs since 2008, weighing heavily on household consumption and retail sales. Bermuda must import almost everything. Agriculture and industry are limited due to the small size of the island.\nBermuda's economy returned to negative growth in 2016, reporting a contraction of 0.1% GDP, after growing by 0.6% in 2015. Unemployment reached 7% in 2016 and 2017, public debt is growing and exceeds $2.4 billion, and the government continues to work on attracting foreign investment. Still, Bermuda enjoys one of the highest per capita incomes in the world.",
    "gdp": {
      "purchasing_power_parity": {
        "annual_values": [
          {
            "value": 6127000000,
            "units": "USD",
            "date": "2016"
          },
          {
            "value": 6133000000,
            "units": "USD",
            "date": "2015"
          },
          {
            "value": 6097000000,
            "units": "USD",
            "date": "2014"
          }
        ],
        "global_rank": 172
      },
      "official_exchange_rate": {
        "USD": 6127000000,
        "date": "2016"
      },
      "real_growth_rate": {
        "annual_values": [
          {
            "value": -0.1,
            "units": "%",
            "date": "2016"
          },
          {
            "value": 0.6,
            "units": "%",
            "date": "2015"
          },
          {
            "value": -0.3,
            "units": "%",
            "date": "2014"
          }
        ],
        "global_rank": 198
      },
      "per_capita_purchasing_power_parity": {
        "annual_values": [
          {
            "value": 99400,
            "units": "USD",
            "date": "2016"
          },
          {
            "value": 95500,
            "units": "USD",
            "date": "2015"
          },
          {
            "value": 87500,
            "units": "USD",
            "date": "2014"
          }
        ],
        "global_rank": 6
      },
      "composition": {
        "by_end_use": {
          "end_uses": {
            "household_consumption": {
              "value": 51.3,
              "units": "%"
            },
            "government_consumption": {
              "value": 15.7,
              "units": "%"
            },
            "investment_in_fixed_capital": {
              "value": 13.7,
              "units": "%"
            },
            "investment_in_inventories": {
              "value": 0,
              "units": "%"
            },
            "exports_of_goods_and_services": {
              "value": 49.8,
              "units": "%"
            },
            "imports_of_goods_and_services": {
              "value": -30.4,
              "units": "%"
            }
          },
          "date": "2017"
        },
        "by_sector_of_origin": {
          "sectors": {
            "agriculture": {
              "value": 0.9,
              "units": "%"
            },
            "industry": {
              "value": 5.3,
              "units": "%"
            },
            "services": {
              "value": 93.8,
              "units": "%"
            }
          },
          "date": "2017"
        }
      }
    },
    "agriculture_products": {
      "products": [
        "bananas",
        "vegetables",
        "citrus",
        "flowers",
        "dairy products",
        "honey"
      ]
    },
    "industries": {
      "industries": [
        "international business",
        "tourism",
        "light manufacturing"
      ]
    },
    "industrial_production_growth_rate": {
      "annual_percentage_increase": 2,
      "global_rank": 129,
      "date": "2017"
    },
    "labor_force": {
      "total_size": {
        "total_people": 33480,
        "global_rank": 202,
        "date": "2016"
      },
      "by_occupation": {
        "occupation": {
          "agriculture": {
            "value": 2,
            "units": "%"
          },
          "industry": {
            "value": 13,
            "units": "%"
          },
          "services": {
            "value": 85,
            "units": "%"
          }
        },
        "date": "2016"
      }
    },
    "unemployment_rate": {
      "annual_values": [
        {
          "value": 7,
          "units": "%",
          "date": "2017"
        },
        {
          "value": 7,
          "units": "%",
          "date": "2016"
        }
      ],
      "global_rank": 106
    },
    "population_below_poverty_line": {
      "value": 11,
      "units": "%",
      "date": "2008"
    },
    "household_income_by_percentage_share": {},
    "budget": {
      "revenues": {
        "value": 999200000,
        "units": "USD"
      },
      "expenditures": {
        "value": 1176000000,
        "units": "USD"
      },
      "date": "2017"
    },
    "taxes_and_other_revenues": {
      "percent_of_gdp": 16.3,
      "global_rank": 183,
      "date": "2017"
    },
    "budget_surplus_or_deficit": {
      "percent_of_gdp": -2.9,
      "global_rank": 127,
      "date": "2017"
    },
    "public_debt": {
      "annual_values": [
        {
          "value": 43,
          "units": "percent_of_gdp"
        }
      ],
      "global_rank": 117
    },
    "fiscal_year": {
      "start": "1 April",
      "end": "31 March"
    },
    "inflation_rate": {
      "annual_values": [
        {
          "value": 1.9,
          "units": "%",
          "date": "2017"
        },
        {
          "value": 1.4,
          "units": "%",
          "date": "2016"
        }
      ],
      "global_rank": 96
    },
    "stock_of_narrow_money": {
      "annual_values": [
        {
          "value": 3374000000,
          "units": "USD",
          "date": "2014-09-30"
        },
        {
          "value": 3422000000,
          "units": "USD",
          "date": "2013-12-31"
        }
      ],
      "global_rank": 118,
      "note": "figures do not include US dollars, which also circulate freely"
    },
    "stock_of_broad_money": {
      "annual_values": [
        {
          "value": 22100000000,
          "units": "USD",
          "date": "2014-09-30"
        },
        {
          "value": 25100000000,
          "units": "USD",
          "date": "2013-12-31"
        }
      ],
      "global_rank": 67
    },
    "stock_of_domestic_credit": {
      "note": "NA"
    },
    "market_value_of_publicly_traded_shares": {
      "annual_values": [
        {
          "value": 1850000000,
          "units": "USD",
          "date": "2015-12-31"
        },
        {
          "value": 1601000000,
          "units": "USD",
          "date": "2014-12-31"
        },
        {
          "value": 1467000000,
          "units": "USD",
          "date": "2013-12-31"
        }
      ],
      "global_rank": 100
    },
    "current_account_balance": {
      "annual_values": [
        {
          "value": 818600000,
          "units": "USD",
          "date": "2017"
        },
        {
          "value": 763000000,
          "units": "USD",
          "date": "2016"
        }
      ],
      "global_rank": 53
    },
    "exports": {
      "total_value": {
        "annual_values": [
          {
            "value": 19000000,
            "units": "USD",
            "date": "2017"
          },
          {
            "value": 19000000,
            "units": "USD",
            "date": "2016"
          }
        ],
        "global_rank": 210
      },
      "commodities": {
        "by_commodity": [
          "reexports of pharmaceuticals"
        ]
      },
      "partners": {
        "by_country": [
          {
            "name": "Jamaica",
            "percent": 49.1
          },
          {
            "name": "Luxembourg",
            "percent": 36.1
          },
          {
            "name": "US",
            "percent": 4.9
          }
        ],
        "date": "2017"
      }
    },
    "imports": {
      "total_value": {
        "annual_values": [
          {
            "value": 1094000000,
            "units": "USD",
            "date": "2017"
          },
          {
            "value": 980000000,
            "units": "USD",
            "date": "2016"
          }
        ],
        "global_rank": 183
      },
      "commodities": {
        "by_commodity": [
          "clothing",
          "fuels",
          "machinery",
          "transport equipment",
          "construction materials",
          "chemicals",
          "food",
          "live animals"
        ]
      },
      "partners": {
        "by_country": [
          {
            "name": "US",
            "percent": 72.1
          },
          {
            "name": "South Korea",
            "percent": 9.7
          },
          {
            "name": "Canada",
            "percent": 4.2
          }
        ],
        "date": "2017"
      }
    },
    "external_debt": {
      "annual_values": [
        {
          "value": 2515000000,
          "units": "USD",
          "date": "2017"
        },
        {
          "value": 2435000000,
          "units": "USD",
          "date": "2015"
        }
      ],
      "global_rank": 150
    },
    "stock_of_direct_foreign_investment": {
      "at_home": {
        "annual_values": [
          {
            "value": 2641000000,
            "units": "USD",
            "date": "2014"
          },
          {
            "value": 2664000000,
            "units": "USD",
            "date": "2013"
          }
        ],
        "global_rank": 116
      },
      "abroad": {
        "annual_values": [
          {
            "value": 889000000,
            "units": "USD",
            "date": "2014"
          },
          {
            "value": 835000000,
            "units": "USD",
            "date": "2013"
          }
        ],
        "global_rank": 90
      }
    },
    "exchange_rates": {
      "annual_values": [
        {
          "value": 1,
          "units": "USD",
          "date": "2017"
        },
        {
          "value": 1,
          "units": "USD",
          "date": "2016"
        },
        {
          "value": 1,
          "units": "USD",
          "date": "2015"
        },
        {
          "value": 1,
          "units": "USD",
          "date": "2014"
        },
        {
          "value": 1,
          "units": "USD",
          "date": "2013"
        }
      ],
      "note": "Bermudian dollars (BMD) per US dollar"
    }
  },
  "energy": {
    "electricity": {
      "access": {
        "total_electrification": {
          "value": 100,
          "units": "%"
        },
        "date": "2016"
      },
      "production": {
        "kWh": 650000000,
        "global_rank": 159,
        "date": "2016"
      },
      "consumption": {
        "kWh": 604500000,
        "global_rank": 166,
        "date": "2016"
      },
      "exports": {
        "kWh": 0,
        "global_rank": 107,
        "date": "2016"
      },
      "imports": {
        "kWh": 0,
        "global_rank": 126,
        "date": "2016"
      },
      "installed_generating_capacity": {
        "kW": 171000,
        "global_rank": 169,
        "date": "2016"
      },
      "by_source": {
        "fossil_fuels": {
          "percent": 100,
          "global_rank": 3,
          "date": "2016"
        },
        "nuclear_fuels": {
          "percent": 0,
          "global_rank": 50,
          "date": "2017"
        },
        "hydroelectric_plants": {
          "percent": 0,
          "global_rank": 158,
          "date": "2017"
        },
        "other_renewable_sources": {
          "percent": 0,
          "global_rank": 176,
          "date": "2017"
        }
      }
    },
    "crude_oil": {
      "production": {
        "bbl_per_day": 0,
        "global_rank": 110,
        "date": "2017"
      },
      "exports": {
        "bbl_per_day": 0,
        "global_rank": 94,
        "date": "2015"
      },
      "imports": {
        "bbl_per_day": 0,
        "global_rank": 97,
        "date": "2015"
      },
      "proved_reserves": {
        "bbl": 0,
        "global_rank": 107,
        "date": "2018-01-01"
      }
    },
    "refined_petroleum_products": {
      "production": {
        "bbl_per_day": 0,
        "global_rank": 119,
        "date": "2017"
      },
      "consumption": {
        "bbl_per_day": 5000,
        "global_rank": 178,
        "date": "2016"
      },
      "exports": {
        "bbl_per_day": 0,
        "global_rank": 131,
        "date": "2015"
      },
      "imports": {
        "bbl_per_day": 3939,
        "global_rank": 178,
        "date": "2015"
      }
    },
    "natural_gas": {
      "production": {
        "cubic_metres": 0,
        "global_rank": 105,
        "date": "2017"
      },
      "consumption": {
        "cubic_metres": 0,
        "global_rank": 122,
        "date": "2017"
      },
      "exports": {
        "cubic_metres": 0,
        "global_rank": 70,
        "date": "2017"
      },
      "imports": {
        "cubic_metres": 0,
        "global_rank": 92,
        "date": "2017"
      },
      "proved_reserves": {
        "cubic_metres": 0,
        "global_rank": 111,
        "date": "2014-01-01"
      }
    },
    "carbon_dioxide_emissions_from_consumption_of_energy": {
      "megatonnes": 793700,
      "global_rank": 174,
      "date": "2017"
    }
  },
  "communications": {
    "telephones": {
      "fixed_lines": {
        "total_subscriptions": 21883,
        "subscriptions_per_one_hundred_inhabitants": 31,
        "global_rank": 173,
        "date": "2017"
      },
      "mobile_cellular": {
        "total_subscriptions": 64997,
        "subscriptions_per_one_hundred_inhabitants": 92,
        "global_rank": 198,
        "date": "2017"
      },
      "system": {
        "general_assessment": "a good, fully automatic digital telephone system with fiber-optic trunk lines; telecom sector provides a relatively high contribution to overall GDP; numerous competitors licensed, but small and localized (2017)",
        "domestic": "the system has a high fixed-line teledensity 31 per 100, coupled with a mobile-cellular teledensity of roughly 92 per 100 persons (2017)",
        "international": "country code - 1-441; landing points for the GlobeNet, Gemini Bermuda, CBUS, and the Challenger Bermuda-1 (CB-1) submarine cables; satellite earth stations - 3 (2015)"
      }
    },
    "broadcast_media": "3 TV stations; cable and satellite TV subscription services are available; roughly 13 radio stations operating (2012)",
    "internet": {
      "country_code": ".bm",
      "users": {
        "total": 69126,
        "percent_of_population": 98,
        "global_rank": 181,
        "date": "2016-07-01"
      }
    }
  },
  "transportation": {
    "air_transport": {
      "civil_aircraft_registration_country_code_prefix": {
        "prefix": "VP-B",
        "date": "2016"
      },
      "airports": {
        "total": {
          "airports": 1,
          "global_rank": 214,
          "date": "2013"
        },
        "paved": {
          "total": 1,
          "2438_to_3047_metres": 1,
          "date": "2017"
        }
      }
    },
    "roadways": {
      "total": {
        "value": 447,
        "units": "km"
      },
      "paved": {
        "value": 447,
        "units": "km"
      },
      "note": "225 km public roads; 222 km private roads",
      "global_rank": 138,
      "date": "2010"
    },
    "merchant_marine": {
      "total": 160,
      "by_type": [
        {
          "type": "bulk carrier",
          "count": 10
        },
        {
          "type": "container ship",
          "count": 8
        },
        {
          "type": "general cargo",
          "count": 1
        },
        {
          "type": "oil tanker",
          "count": 18
        },
        {
          "type": "other",
          "count": 123
        }
      ],
      "global_rank": 72,
      "date": "2017"
    },
    "ports_and_terminals": {
      "major_seaports": [
        "Hamilton",
        "Ireland Island",
        "Saint George"
      ]
    }
  },
  "military_and_security": {
    "branches": {
      "by_name": [
        "Bermuda Regiment"
      ],
      "date": "2012"
    },
    "service_age_and_obligation": {
      "years_of_age": 18,
      "note": "18-45 years of age for voluntary male or female enlistment in the Bermuda Regiment; males must register at age 18 and may be subject to conscription; term of service is 38 months for volunteers or conscripts",
      "date": "2012"
    },
    "note": "defense is the responsibility of the UK"
  },
  "transnational_issues": {
    "disputes": [
      "none"
    ]
  }
}
JSON Country Record

We can see the data is very detailed, and also a strong fit for DynamoDB: our "records" are JSON documents with many levels of data. We've also added 3 fields of our own: key, timestamp, and source. Key is a derivative of the country name suitable for using as a filename or general key; it's the name converted to lower case, with some characters removed (commas, parentheses), and some characters replaced with underscore (spaces, hyphens). Thus the key for "United States" is "united_states". Timestamp is when the data was last collected. Source is just "Factbook"; we add it because DynamoDB expects a field of the document to map to a partition key.

Loading Files into S3

S3 will hold, for each country, the country JSON record as well as image files for flag and map. We don't really need the country JSON in S3 for this project (since we're going to query DynamoDB for country data), but S3 serves as the staging location from which we import the JSON when we insert the data into DynamoDB.

I previously retrieved the JSON data and split it into 260 separate country JSON records, along with the flag and map image files. All were originally stored in Azure blob storage. You can get a blow-by-blow account of that here.

To copy over the country JSON and image files, I first downloaded the Azure blobs using my Azure Storage Explorer tool; and then uploaded them to S3 by dragging them into the AWS S3 console.  Here's what our end-result in S3 looks like:


Country files in S3

We now have a JSON document for each country, as well as a flag and map image for each country:

armenia.gif

armenia-map.gif

Loading DynamoDB

Next, we want to get our country data into DynamoDB, one country document per country. To do that, we create a DynamoDB table in the AWS Console named factbook. DynamoDB requires us to think about partition key and sort key, which collectively form our unique key to a record. Although our country document records are very deep, the actual number of records is small: 260. Accordingly, we will use the same partition key ("Factbook") for all of our records. The source field we added to the JSON contains this value, so our partition key field is source. For sort key, we'll use country name, captured in the name field.


Creating DynamoDB Table
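
For readers who prefer code to console clicks, the key design described above can be written down as CreateTable parameters. This is only a sketch of an equivalent definition, not what the article does (the table here is created in the AWS Console); the function name factbookTableParams and the on-demand billing mode are assumptions for illustration.

```javascript
// Sketch: the factbook table definition expressed as CreateTable parameters.
// The article creates the table in the AWS Console; this is an illustrative equivalent.
function factbookTableParams() {
    return {
        TableName: 'factbook',
        AttributeDefinitions: [
            { AttributeName: 'source', AttributeType: 'S' },  // always "Factbook"
            { AttributeName: 'name', AttributeType: 'S' }     // country name
        ],
        KeySchema: [
            { AttributeName: 'source', KeyType: 'HASH' },     // HASH = partition key
            { AttributeName: 'name', KeyType: 'RANGE' }       // RANGE = sort key
        ],
        BillingMode: 'PAY_PER_REQUEST'  // assumption: on-demand capacity
    };
}

// Usage (needs the aws-sdk package and credentials, so it is left commented out):
// const AWS = require('aws-sdk');
// new AWS.DynamoDB({ region: 'us-east-1' }).createTable(factbookTableParams(), console.log);
```

Only the two key attributes are declared; the rest of each country document is schemaless and needs no definition.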

In my original project, I wrote a durable function which ran on a timer once a week, processing 260 country records in parallel. We may do the same for AWS at some point, but today we'll be more modest: we'll develop a Lambda Function to create a country record in DynamoDB. The function will be called via HTTP with a key parameter, which will be a country key such as "afghanistan" or "united_kingdom". The function will read the country's .json file that is in S3 and insert it into DynamoDB. We'll have to invoke the function for each country.
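
Invoking the function 260 times by hand gets old quickly, so a small driver script can do the looping. The following is a minimal sketch under stated assumptions: LOAD_URL is a hypothetical placeholder for your own API Gateway endpoint, and loadCountryUrl, loadCountries, and httpsGet are names made up for this example.

```javascript
const https = require('https');

// Hypothetical endpoint; substitute the URL API Gateway assigns to your load-country function.
const LOAD_URL = 'https://example.execute-api.us-east-1.amazonaws.com/default/load-country';

// Build the request URL for one country key, such as "united_kingdom".
function loadCountryUrl(key) {
    return LOAD_URL + '?key=' + encodeURIComponent(key);
}

// Promise wrapper around https.get that resolves to the response body.
function httpsGet(url) {
    return new Promise(function (resolve, reject) {
        https.get(url, function (res) {
            let body = '';
            res.on('data', function (chunk) { body += chunk; });
            res.on('end', function () { resolve(body); });
        }).on('error', reject);
    });
}

// Call the endpoint once per key, one at a time.
// httpGet is passed in so the loop can be tried out without a network.
async function loadCountries(keys, httpGet) {
    const results = [];
    for (const key of keys) {
        try {
            results.push({ key: key, ok: true, body: await httpGet(loadCountryUrl(key)) });
        } catch (err) {
            // record the failure and carry on with the remaining countries
            results.push({ key: key, ok: false, body: String(err) });
        }
    }
    return results;
}

// Usage:
// loadCountries(['afghanistan', 'united_kingdom'], httpsGet).then(console.log);
```

The calls run sequentially on purpose; it is slower than firing them in parallel, but it keeps the load on the function and the table gentle.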

Lambda Function to Load DynamoDB Country Record

Our load-country function, written in Node.js, first retrieves the JSON file from our S3 bucket (lines 28-44); the function has a role assigned whose policy grants access to our factbook-data S3 bucket as well as our factbook DynamoDB table. We next replace empty strings with nulls because DynamoDB does not allow empty strings. Then we parse the text into a JavaScript object so we can work with it (line 65). The code adds three housekeeping properties to the original JSON: key (country key), timestamp, and source ("Factbook") at lines 78-80.
// load-country : load a country record
//
// This function retrieves a JSON country record for the specified key from S3, 
// and inserts a document into the factbook DynamoDB table.

// inputs:
//     key parameter: country key, such as "united_states"
//     https://s3.amazonaws.com/factbook-data/*.json must exist

const http = require('http');

exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 

    // Get country key from HTTP query parameter.

    var key = event["queryStringParameters"]['key'];
    //var key = "antigua_and_barbuda";  // <= for in-portal testing

    // Retrieve .json from s3

    var url = 'http://s3.amazonaws.com/factbook-data/' + key + '.json';
    console.log("01 http.get " + url);

    return http.get(url, function(response) {
        // Continuously update stream with data
        var body = '';
        response.on('data', function(d) {
            body += d;
        });
        response.on('end', function() {

            console.log('02 on end');

            // Data reception is done, do whatever with it!
            
            // replace empty strings ("") with null because DynamoDB disallows empty strings

            body = replace(body, 'type: ""', 'type: null');
            body = replace(body, '"type": ""', '"type": null');
            body = replace(body, 'name_alternative: ""', 'name_alternative: null');
            body = replace(body, '"name_alternative": ""', '"name_alternative": null');
            body = replace(body, 'note: ""', 'note: null');
            body = replace(body, '"note": ""', '"note": null');
            body = replace(body, 'foreign_based: ""', 'foreign_based: null');
            body = replace(body, '"foreign_based": ""', '"foreign_based": null');

            // parse body text into a JSON object
            
            var data = null;
            try {
                data = JSON.parse(body);
                console.log('03 parsed');
                console.log(data.name);
            }
            catch(e) {
                console.log('03-A exception in JSON.parse: ' + e.toString());
                console.log(body);
            }
            
            if (data != null) 
            {
                // add 3 fields to the document
                
                data.key = key; // countryKey(data.name);
                data.timestamp = 'Monday, February 11, 2019 4:09:28 PM';
                data.source = 'Factbook';
                
                // insert country record
                
                var params = {
                    TableName: 'factbook',
                    Item: data
                    };
    
                console.log("Adding new item...");
                docClient.put(params, function(err, data2) {
                    if (err) {
                        console.error("04 error inserting document. Error JSON:", JSON.stringify(err, null, 2));
                        context.done(err, {
                        'statusCode': 200,
                        'headers': { 'Content-Type': 'application/json' },
                        'body': 'Failed to add record'
                        });
                    } else {
                        console.log("05 document inserted - source | name: " + data.source + ' | ' + data.name);
                        context.done(null, {
                             'statusCode': 200,
                            'headers': { 'Content-Type': 'application/json' },
                            'body': 'Added record ' + data.name //JSON.stringify(data)
                        });
                    }
                });
            }
            else {
                 context.done(null, {
                             'statusCode': 500,
                            'headers': { 'Content-Type': 'application/json' },
                            'body': 'Failed to add record due to JSON parse error'
                        });
            }
        });
    }).on('error', function(err) {
        // handle errors with the request itself
        console.error('04 Error with the request:', err.message);
        callback(err);
    });


};

// ---- countryKey : generate a country key from a country name

function countryKey(countryName) {
    var countryKey = countryName.toLowerCase();
    countryKey = replace(countryKey, ' ', '_');
    countryKey = replace(countryKey, '-', '_');
    countryKey = replace(countryKey, '(', '');
    countryKey = replace(countryKey, ')', '');
    countryKey = replace(countryKey, ',', '');
    countryKey = replace(countryKey, "'", '');
    return countryKey;
}

function replace(value, oldChar, newChar) {
    if (!value) return null;
    return value.split(oldChar).join(newChar);
}
load-country Lambda Function

Now we can insert our DynamoDB record. We created the necessary DocumentClient in lines 24-27. Now in lines 90-106, we create a params object containing the table name and document data, and store it with a docClient.put. If no errors occurred, our record is added and DynamoDB now has the country document.
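
The string replacements near the top of load-country only catch empty strings in four named fields. An alternative, shown here as a sketch rather than as the method used in this project, is to parse the JSON first and then walk the resulting object, converting every empty string to null wherever it appears. The function name emptyStringsToNull is made up for this example.

```javascript
// Sketch: replace every empty string in a parsed document with null,
// no matter which field or nesting level it appears at.
function emptyStringsToNull(value) {
    if (value === '') return null;
    if (Array.isArray(value)) return value.map(emptyStringsToNull);
    if (value !== null && typeof value === 'object') {
        const result = {};
        for (const key of Object.keys(value)) {
            result[key] = emptyStringsToNull(value[key]);
        }
        return result;
    }
    return value;  // numbers, booleans, non-empty strings, and null pass through unchanged
}

// Usage inside the handler would look something like:
// var data = emptyStringsToNull(JSON.parse(body));
```

Because it works on the parsed object rather than the raw text, this approach does not depend on how the JSON happens to be spaced or quoted.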

When we test our function, it says all is well.

Invoking load-country

...and we can verify that by viewing the new record added to DynamoDB in the AWS console:

Viewing added country document in DynamoDB

Lambda Functions to Access Country Data

Now that we have the World Factbook data in a DynamoDB table,  we can write Lambda functions to query it. 

country

The first function we want to write is named country, and its purpose is simply to return an entire country document given a country name. We're writing in Node.js and developing right in the AWS console. Our function is triggered via API Gateway, so that it can be invoked with an HTTP request. We bump the memory to 512MB (the default size is too small for working with DynamoDB).


country function in AWS console

Let's review the code below to understand how it works. We declare a DocumentClient (lines 3-5), which is how we'll access DynamoDB. In line 14, we extract the expected country name in a URL query parameter called name; if for example you want the country record for Japan, you'll add ?name=Japan to the end of the URL. To retrieve the country record, we know that our partition is always "Factbook" and our sort key is the country name. To query the data, we issue a docClient.query (lines 33-46). If successful, the data is returned in the response.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                            "Access-Control-Allow-Origin" : "*",
                            "Access-Control-Allow-Credentials" : true
                    };

    var countryName = null;
    
    if (event && event.queryStringParameters && event.queryStringParameters.name) countryName = event.queryStringParameters.name;
    
    if (!countryName) {
        // stop here so we don't go on to run the query without a country name
        return callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: name' });
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':name': countryName,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#name = :name and #source = :source',
    };
    
    docClient.query(params, function(err, data) {

    if (err) { 
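        console.log('03 err:');
        console.log(err.toString());
        // report the failure in the HTTP response; ${...} is not interpolated inside single quotes, so concatenate
        callback(null, { statusCode: 500, headers: corsHeaders, body: 'Error: ' + err });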
    } else { 
        if (!data || data.Items.length===0) {
            callback(null, { statusCode: 400, headers: corsHeaders, body: 'Country not found: ' + countryName });
        }
        else {
            callback(null, {
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items[0])
                });
        }
    }
  });
};
country function source code (Node.js)

The parameters that are set up for the query (lines 20-31) deserve some explanation. The KeyConditionExpression is our query. We're merely interested in a source (partition key) of "Factbook" and a name (sort key) equal to our country name parameter. We would normally specify a KeyConditionExpression value like this...

name = :name and source = :source

...except that name and source are both DynamoDB reserved words. To get around that, we use #name and #source, and define those in the ExpressionAttributeNames parameter (lines 22-25). Our query then ends up being this:

KeyConditionExpression: '#name = :name and #source = :source'

If you haven't worked with DynamoDB before, the :name and :source may be unfamiliar. These are parameters that get replaced by values in the ExpressionAttributeValues parameter (lines 26-29).

If the query is successful, we return the entire result. Here's what it's like to invoke country from a browser (note: I have the JSONView Chrome Extension installed which nicely formats the JSON):

Invoking country function from a browser
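
Since we always supply both the partition key and the sort key, the same lookup can also be expressed as a DocumentClient get, which addresses a single item by its full key. The sketch below is an alternative to the query used in this article, not a description of it; getCountry is a name made up for the example, and docClient is passed in so the function can be tried with a stub.

```javascript
// Sketch: fetch one country by its full primary key using get instead of query.
function getCountry(docClient, countryName, callback) {
    const params = {
        TableName: 'factbook',
        // key attribute names go here literally, so no # placeholders are needed
        Key: { source: 'Factbook', name: countryName }
    };
    docClient.get(params, function (err, data) {
        if (err) return callback(err);
        // get returns a single data.Item, which is absent when no record matches
        callback(null, data.Item || null);
    });
}

// Usage inside a handler, with the docClient created as in the country function:
// getCountry(docClient, 'Japan', function (err, item) { /* build the response */ });
```

With get there is no Items array to index into, and a missing country shows up as a null item rather than an empty list.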

people

The country function is great, but it's a big blast of data. Perhaps we're interested in a smaller part of the whole. The country JSON has subsections named introduction, geography, people, government, economy, and so on. Let's create a people function to return just the people section.

The only area of people that's different from country is the query parameters: we've added a ProjectionExpression that limits the results to the people section of the document. 
    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':name': countryName,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#name = :name and #source = :source',
      ProjectionExpression: 'people'
    };
    
    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:');
        console.log(err.toString());
        callback(null, { statusCode: 500, body: 'Error: ' + err });
    } else { 
        if (!data || data.Items.length===0) {
            callback(null, { statusCode: 400, body: 'Country not found: ' + countryName });
        }
        else {
            callback(null, { body: JSON.stringify(data.Items[0].people) });
        }
    }
  });
Code in people that's different from country

Here's the result of running people in a browser. Now we're dealing with a much smaller section of the country JSON.


Invoking people function from a browser

We can similarly create sister functions named introduction, geography, economy, communications, etc. In each case, the only change needed would be the ProjectionExpression.
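
Because the sister functions differ only in which section they project, the query parameters could come from one shared builder. This is a sketch of that idea, assuming a helper named sectionParams (not part of the original code); it aliases the section name with a placeholder so it keeps working if a section name turns out to be a reserved word.

```javascript
// Sketch: build query parameters for any top-level section of a country document.
function sectionParams(countryName, section) {
    return {
        TableName: 'factbook',
        ExpressionAttributeNames: {
            '#name': 'name',
            '#source': 'source',
            '#section': section   // aliased in case the section name is a reserved word
        },
        ExpressionAttributeValues: {
            ':name': countryName,
            ':source': 'Factbook'
        },
        KeyConditionExpression: '#name = :name and #source = :source',
        ProjectionExpression: '#section'
    };
}

// Usage:
// docClient.query(sectionParams('Japan', 'economy'), function (err, data) { /* ... */ });
```

The returned item still nests the data under the section name, so the callback would read data.Items[0][section].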

population


Let's consider one other example. What if we only need to retrieve a single field from the JSON document, such as population? Population lives under people.population.total in the country JSON. Here we can again modify the ProjectionExpression, but this time we'll use dotted notation to indicate a path through the document. Once again though we have to deal with the fact that total is a DynamoDB reserved word. We can resolve that with another #attributename shortcut. Here's what our parameter code ends up looking like:
var params = {
  TableName: 'factbook',
  ExpressionAttributeNames: {
     '#name': 'name',
     '#source': 'source',
     '#tot': 'total'
  },
  ExpressionAttributeValues: {
    ':name': countryName,
    ':source': 'Factbook'
  },
  KeyConditionExpression: '#name = :name and #source = :source',
  ProjectionExpression: 'people.population.#tot'
};
population function parameter code

The above will return just the population value, but it will be wrapped as follows:

{
  "body": "{\"people\":{\"population\":{\"total\":329256465}}}"
}

To shorten the result, we can change our callback as follows to drop the containing people object and return only the population object.

callback(null, { body: JSON.stringify(data.Items[0].people.population) });

Now the result is:

{
  "total": 329256465
}

Any time we want to return just a scalar value, we can use this technique of a dotted document path in a ProjectionExpression.
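
Rather than checking each path segment against the reserved word list by hand, one option is to alias every segment. The sketch below assumes two helper names invented for this example, projectionForPath and valueAtPath; the first turns a dotted path into a ProjectionExpression plus its ExpressionAttributeNames, and the second pulls the value at that path out of the returned item.

```javascript
// Sketch: alias every segment of a dotted path so reserved words never matter.
function projectionForPath(path) {
    const names = {};
    const aliases = path.split('.').map(function (segment, i) {
        const alias = '#p' + i;
        names[alias] = segment;
        return alias;
    });
    return { ProjectionExpression: aliases.join('.'), ExpressionAttributeNames: names };
}

// Walk the same dotted path through a returned item; yields undefined if any level is missing.
function valueAtPath(item, path) {
    return path.split('.').reduce(function (node, segment) {
        return (node === undefined || node === null) ? undefined : node[segment];
    }, item);
}

// Usage: merge the generated names with the #name and #source aliases the key condition needs.
// var proj = projectionForPath('people.population.total');
// params.ProjectionExpression = proj.ProjectionExpression;
// params.ExpressionAttributeNames = Object.assign(
//     { '#name': 'name', '#source': 'source' }, proj.ExpressionAttributeNames);
// var population = valueAtPath(data.Items[0], 'people.population.total');
```

For the path people.population.total this produces the expression #p0.#p1.#p2, and valueAtPath returns the bare number instead of the nested wrapper.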



In Conclusion

Today in Part 1 we brought public-domain CIA World Factbook data into AWS, storing country records in DynamoDB and image/JSON files in S3 storage. We used a Lambda function to read JSON files from S3 and inject them as documents into our DynamoDB table. Working with DynamoDB from JavaScript was fast and easy. We did have to learn how to work around a few caveats, including empty strings not permitted in the document data and how to deal with reserved words in queries.

We then created Lambda functions to get at the data. We saw that we could return an entire large country JSON, or a subsection of it, or just a discrete individual property. Once we had functions at each of these levels of data, creating derivatives for other sections or properties was trivial. Developing Lambda functions, editing and testing right in the AWS console, was also a quick and painless experience. We did have to be careful to adhere to proper JavaScript coding patterns for asynchronous methods such as the use of promises.

We now have our data in place and a means to access it. Now that we've laid this groundwork, we'll go on in Parts 2 and 3 to create web and voice interfaces so users can work with the data. Stay tuned!

Next: Part 2: Front-end API & Web Site using Lambda Functions and DynamoDB