Friday, February 22, 2019

CIA World Factbook Data on AWS, Part 2: Front-end API & Web Site using Lambda Functions and DynamoDB

In this 3-part series, I'm going to show how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services.


Previously in Part 1 we noted the CIA World Factbook data is public domain. We created the back-end to collect data and store it in DynamoDB and S3 storage, using a Lambda Function to insert document records. We also created some rudimentary Lambda Functions for accessing country records.

Today in Part 2 we will create the front-end, which will include a fuller API and a web site—powered by Lambda Functions and DynamoDB. With that, we'll finally be able to access and use all that data we collected. You can access the web site at http://world-factbook.aws.davidpallmann.com.

  

What We're Building

Today we'll be creating three things to form our front-end:
  1. API / Lambda Functions. We're going to use an API of Lambda Functions to query DynamoDB. We'll need functions to look up country data, perform searches, and retrieve chart data.
  2. DynamoDB Secondary Indexes. We'll need to add additional indices to our DynamoDB database in order to support chart data retrieval.
  3. Web Site. We'll create a web site that allows browsing, searching, and viewing charts of world country data via the API. This site will work on both desktop and mobile devices.
Our web site can do 3 things, and we'll address each one in a section of this post:
  • Country View: select a country to view its record details (geography, people, economy, etc.)
  • Search: enter a search term and get a list of matching countries.
  • Charts: select a chart and view a column chart.

Web Site Foundation

In a prior series, I hosted this data on Microsoft Azure and created a statically-hosted web site. We're going to host the same web site for the AWS edition, but I've refactored the web site code so that it more easily supports either cloud platform with common source code. 

I've also decided to change the background map image (also public domain) and color theme for the AWS site. The original Azure site had an Amber-Sienna color theme; the AWS site will have a blue color scheme. Here's how the two web sites compare:

Azure Edition of Web Site

AWS Edition of Web Site

Here's how I've structured the site, as a small number of files that can be hosted in inexpensive cloud storage. The majority of the code, markup, and style rules are common; only five files are deployment-specific: cloud.js, favicon.ico, logo.png, theme.css, and world-map.jpg.

File  Description  Common or Unique
cloud.js  Platform-specific functions and properties  Unique
favicon.ico  Browser icon  Unique
index.html  Web page  Common
logo.png  Logo  Unique
site.css  Style rules  Common
site.js  JavaScript  Common
theme.css  Color theme  Unique
world-map.jpg  Background image  Unique

Since I've previously blogged about the site, I'm not going to do a detailed walk-through of the code. However, I will highlight some critical parts of it. The full code can be accessed on GitHub (see link at end of post).

When you access the web site, you may experience a short delay initially. That's because Lambda Functions power the site, and if a function has been inactive (a cold start) you'll wait a few seconds while it is initialized. There are ways to keep the functions warm, but as this is just a demonstration I opted for the lowest-cost deployment.

Country View

The user can select a country from the drop-down list of 260 countries at top right.


When a country is selected, the Lambda Function country is called, which returns a complete JSON country record. We wrote this function and studied the country JSON in Part 1; here's an updated view of the code.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                            "Access-Control-Allow-Origin" : "*",
                            "Access-Control-Allow-Credentials" : true
                    };

    var countryName = null;
    
    if (event && event.queryStringParameters && event.queryStringParameters.name) countryName = event.queryStringParameters.name;
    
    if (!countryName) {
        callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: name' });
        return; // stop here so we don't query with a missing name
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':name': countryName,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#name = :name and #source = :source',
    };
    
    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:')
        console.log(err.toString());
        callback(null, { statusCode: 500, headers: corsHeaders, body: 'Error: ' + err }); // pass null so this 500 response is returned to the caller
    } else { 
        if (!data || data.Items.length===0) {
            callback(null, { statusCode: 400, headers: corsHeaders, body: 'Country not found: ' + countryName });
        }
        else {
            callback(null, {
                    statusCode: 200,
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items[0])
                });
        }
    }
  });
};
country Lambda Function

To be able to call this function from our web site without Cross-Origin Resource Sharing (CORS) errors, we also had to go to the API Gateway configuration for our country-API and Enable CORS.

Enabling CORS in API Gateway
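
Because the functions in this post return their own headers, every response (including error responses) should carry the CORS headers. Here is a minimal sketch of a shared response builder. The name buildResponse is hypothetical and not part of the original code; it simply packages the same corsHeaders object used above.

```javascript
// Headers sent on every response so browsers accept cross-origin calls.
const corsHeaders = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': true
};

// Hypothetical helper: builds a response object in the shape the handlers
// in this post pass to callback(). Strings pass through unchanged; any
// other payload is serialized to JSON.
function buildResponse(statusCode, payload) {
    return {
        statusCode: statusCode,
        headers: corsHeaders,
        body: typeof payload === 'string' ? payload : JSON.stringify(payload)
    };
}

// Example use inside a handler:
//   callback(null, buildResponse(400, 'Missing parameter: name'));
```

Keeping the headers in one place means an error path can't accidentally omit them.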

After retrieving the country JSON from the country function, the site then populates the accordion content sections (Introduction, Geography, People, Government, Economy, Energy, Communications, Transportation, Military and Security, and Transnational Issues). Although we're getting and displaying many fields, it's only a fraction of what's in the data; over time, we'll try to expand it.

Although the country JSON document structure is consistent, any particular element we want to access may or may not be present. Accordingly, our JavaScript code to create content sections has to carefully check whether elements exist. Below is a code fragment showing how the People section of content is assembled. We did not use a framework like Angular or React to do this (although we may at some point); even with one, we would have needed the same logic and checks in our HTML template.
// Load content: People

var people = '';
if (data.people) {
    if (data.people.population && data.people.population.total) {
        people += '<div class="item"><b>Population</b><br/>' + numberWithCommas(data.people.population.total) + '</div>';
    }
    if (data.people.population && data.people.population.global_rank) {
        people += '<div class="item"><b>Global Rank</b><br/>' + data.people.population.global_rank + '</div>';
    }
    if (data.people.nationality && data.people.nationality.adjective) {
        people += '<div class="item"><b>Nationality</b><br/>' + data.people.nationality.adjective + '</div>';
    }
    if (data.people.ethnic_groups && data.people.ethnic_groups.ethnicity) {
        var ethnic_groups = data.people.ethnic_groups;
        people += '<div class="item"><b>Ethnic Groups</b><br/>'
        for (var i = 0; i < ethnic_groups.ethnicity.length; i++) {
            var pct = ethnic_groups.ethnicity[i].percent;
            var elem = ethnic_groups.ethnicity[i].name;
            var note = ethnic_groups.ethnicity[i].note;
            if (pct)
                elem += " (" + pct + '%)';
            if (note)
                elem += " note: " + note;
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.languages && data.people.languages.language) {
        var languages = data.people.languages;
        people += '<div class="item"><b>Languages</b><br/>'
        for (var i = 0; i < languages.language.length; i++) {
            var pct = languages.language[i].percent;
            var elem = languages.language[i].name;
            if (pct)
                elem += " (" + pct + '%)';
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.religions && data.people.religions.religion) {
        var religions = data.people.religions;
        people += '<div class="item"><b>Religions</b><br/>'
        for (var i = 0; i < religions.religion.length; i++) {
            var pct = religions.religion[i].percent;
            var elem = religions.religion[i].name;
            if (pct)
                elem += " (" + pct + '%)';
            if (i == 0)
                people += elem;
            else
                people += ", " + elem;
        }
        people += '</div>';
    }
    if (data.people.life_expectancy_at_birth && data.people.life_expectancy_at_birth.total_population && data.people.life_expectancy_at_birth.total_population.value && data.people.life_expectancy_at_birth.total_population.units) {
        people += '<div class="item"><b>Life Expectancy at Birth</b><br/>' + data.people.life_expectancy_at_birth.total_population.value + ' ' + data.people.life_expectancy_at_birth.total_population.units + '</div>';
    }
    if (data.people.population_growth_rate && data.people.population_growth_rate.growth_rate && data.people.population_growth_rate.date) {
        people += '<div class="item"><b>Population Growth Rate</b><br/>' + data.people.population_growth_rate.growth_rate + ' (' + data.people.population_growth_rate.date + ')</div>';
    }
    if (data.people.birth_rate && data.people.birth_rate.births_per_1000_population && data.people.birth_rate.date) {
        people += '<div class="item"><b>Birth Rate</b><br/>' + data.people.birth_rate.births_per_1000_population + ' births per thousand (' + data.people.birth_rate.date + ')</div>';
    }
    if (data.people.death_rate && data.people.death_rate.deaths_per_1000_population && data.people.death_rate.date) {
        people += '<div class="item"><b>Death Rate</b><br/>' + data.people.death_rate.deaths_per_1000_population + ' deaths per thousand (' + data.people.death_rate.date + ')</div>';
    }
    if (data.people.demographic_profile) {
        people += '<div class="item"><b>Demographic Profile</b><br/>' + data.people.demographic_profile + '</div>';
    }
}
$('#content-people').html(people);
JavaScript to Extract People Content
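
The ethnic groups, languages, and religions loops above all follow the same pattern: format each entry as a name with an optional percentage, then join with commas. A sketch of how that could be factored into one helper is below. The name formatNamedList and the sample values are illustrative assumptions, not part of the site's code; unlike the original loops, it appends a note for any list whose entries have one.

```javascript
// Hypothetical helper: formats entries such as { name: 'English', percent: 78 }
// as "English (78%)", appends an optional note, and joins them with commas.
function formatNamedList(entries) {
    return entries.map(function (entry) {
        var elem = entry.name;
        if (entry.percent) elem += ' (' + entry.percent + '%)';
        if (entry.note) elem += ' note: ' + entry.note;
        return elem;
    }).join(', ');
}

// Sample data shaped like data.people.languages (values are made up).
var languages = { language: [{ name: 'English', percent: 78 }, { name: 'Spanish' }] };

var html = '<div class="item"><b>Languages</b><br/>' +
    formatNamedList(languages.language) + '</div>';
```

The existence checks on data.people.languages and similar parents would still be needed before calling the helper.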

If no accordion sections were already open when the content loads, the Introduction section expands. The user can then review the data. In addition to the JSON country record, there are image files for the country flag and a map; these are retrieved from S3 storage.


Country View in Web Site

Search

Searching the data is tough: we don't have a full-text search capability at present. Fortunately, a text search would likely target either the country name or one of the large text briefs in the data, such as introduction.background or government.overview. Accordingly, our first implementation of search will use a Lambda Function that looks for a match in select fields of the country record.

Below is our search Lambda Function, which accepts a term and returns an array of matching country names and keys.
exports.handler = function(event, context, callback) {

    const AWS = require('aws-sdk');
    AWS.config.update({region: 'us-east-1'});
    const docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 
    
    var corsHeaders = {
                        "Access-Control-Allow-Origin" : "*", // Required for CORS support to work
                        "Access-Control-Allow-Credentials" : true // Required for cookies, authorization headers with HTTPS 
                    };

    var testing = false;

    var term = null;

    if (testing) {
        term = 'island';
    }
    else {
        try {
            term = event.queryStringParameters.term;
        }
        catch(e) { }
    }
    
    if (term===null || term===undefined || term==='') {
        callback(null, { statusCode: 400, headers: corsHeaders, body: 'Missing parameter: term' });
        return;
    }

    var params = {
      TableName: 'factbook',
      ExpressionAttributeNames: {
         '#key': 'key',
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':term': term,
        ':source': 'Factbook'
      },
      KeyConditionExpression: '#source = :source',
      FilterExpression: 'contains(#key, :term) or contains(introduction.background, :term) or contains(geography.climate, :term) or contains(geography.terrain, :term) or contains(people.demographic_profile, :term) or contains(economy.overview, :term) or contains(geography.map_reference, :term) or contains(government.government_type, :term) or contains(transnational_issues.disputes[0], :term)',
      ProjectionExpression: '#name, #key'
    };

    docClient.query(params, function(err, data) {

    if(err) { 
        console.log('03 err:')
        console.log(err.toString());
        callback(null, { statusCode: 500, headers: corsHeaders, body: 'Error: ' + err }); // pass null so this 500 response is returned to the caller
    } else { 
        callback(null, {
                    statusCode: 200, headers: corsHeaders,
                    body: JSON.stringify(data.Items)
                });
    }
  });
};
search Lambda Function

Our function is HTTP-triggered via AWS API Gateway and written in Node.js. The function code initializes a DynamoDB document client (lines 3-5), extracts the expected term query string parameter (lines 14-29), sets up query parameters (lines 31-45), and executes the query (lines 47-59).

The query parameter of interest is FilterExpression, which is a long series of contains(field, :term) sequences connected with OR operators. The DynamoDB contains operator is case-sensitive, putting a burden on the user. This "poor man's search" will be adequate for the time being, but we'll want to come back and improve on this at a later time. Ideally, a user should be able to search the entire country record with a full-text, case-insensitive search.
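
Since the FilterExpression is one contains() clause per field, it could also be generated from a list of field paths, which makes it easier to add or remove searchable fields later. The sketch below uses the same paths as the search function; buildSearchFilter and searchFields are hypothetical names introduced for illustration.

```javascript
// Field paths searched by the search function's FilterExpression.
var searchFields = [
    '#key',
    'introduction.background',
    'geography.climate',
    'geography.terrain',
    'people.demographic_profile',
    'economy.overview',
    'geography.map_reference',
    'government.government_type',
    'transnational_issues.disputes[0]'
];

// Hypothetical helper: one contains(field, :term) clause per field, joined with "or".
function buildSearchFilter(fields) {
    return fields.map(function (field) {
        return 'contains(' + field + ', :term)';
    }).join(' or ');
}

var filterExpression = buildSearchFilter(searchFields);
```

The generated string could then be assigned to FilterExpression in the query parameters in place of the hand-written one.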

Here's an example of what search returns when given the term "island":
[
{
key: "american_samoa",
name: "American Samoa"
},
{
key: "anguilla",
name: "Anguilla"
},
{
key: "antarctica",
name: "Antarctica"
},
{
key: "antigua_and_barbuda",
name: "Antigua And Barbuda"
},
{
key: "aruba",
name: "Aruba"
},
{
key: "ashmore_and_cartier_islands",
name: "Ashmore And Cartier Islands"
},
{
key: "bahamas_the",
name: "Bahamas, The"
},
{
key: "barbados",
name: "Barbados"
},
{
key: "bermuda",
name: "Bermuda"
},
{
key: "bouvet_island",
name: "Bouvet Island"
},
{
key: "british_indian_ocean_territory",
name: "British Indian Ocean Territory"
},
{
key: "british_virgin_islands",
name: "British Virgin Islands"
}
]
search Function Results

Our web site's JavaScript code for search is below. An Ajax call is made to the Lambda Function, and the results are iterated to come up with a search results list of country names and flags.
// Perform a search.

function search() {

    var term = $('#search-text').val();
    if (!term) return;

    $("body").css("cursor", "progress");
    $('#loading').css('visibility', 'visible');

    $('#country').val('');
    $('#chart-select').val('');

    $('h2').removeClass('optional');

    $('#country-view').css('visibility', 'collapse');
    inCountryView = false;

    var url = cloud.searchUrl(term);

    $.ajax({
        type: 'GET',
        url: url,
        accepts: "json",
    }).done(function (response) {

        var results = cloud.resultToJson(response);

        var html = '<table id="results-table" style="color: white; font-size: 20px">';
        var count = 0;
        var countryKey = null;
        if (results) {
            for (var i = 0; i < results.length; i++) {
                countryKey = CountryKey(results[i].name);
                html += '<tr style="cursor: pointer; height: 24px; border-bottom: solid 1px white" onclick="selectCountry(' + "'" + results[i].name.replace(/'/g, "\\'") + "'" + ');">'; // escape apostrophes in names so the onclick string stays valid
                if (haveFlag(results[i].name)) {
                    var flagImageUrl = cloud.flagImageUrl(countryKey);
                    html += '<td style="text-align: right"><img class="content-image-thumbnail" src="' + flagImageUrl + '"></td>';
                }
                else {
                    html += '<td> </td>';
                }
                html += '<td>  </td><td style="vertical-align: middle">' + results[i].name + '</td></tr>';
                count++;
            }
        }
        if (count == 0) {
            html += '<tr><td>No matches</td></tr>';
        }
        html += '</table>';

        $('#results-list').html(html);
        $('#country-flag').css('visibility', 'collapse');
        $('#chart-view').css('visibility', 'collapse');
        $('#results-view').css('visibility', 'visible');

        $('#loading').css('visibility', 'collapse');
        $("body").css("cursor", "default");
    });
}
JavaScript search code

Putting it all together, a search on the web site looks like this. With the search results displayed, the user may click on any country in the list; if they do, a Country View takes place just as if they had selected the country from the top right drop-down.

Search in Web Site

Charts

Lastly, our web site offers chart views. The user can select a chart from the list and see a chart showing comparative country data. The site currently provides these charts:
  • Area - Largest
  • Area - Smallest
  • Exports - Highest
  • Exports - Lowest
  • Imports - Highest
  • Imports - Lowest
  • Inflation - Highest
  • Inflation - Lowest
  • Internet Users - Most
  • Population - Highest
  • Population - Lowest
Each chart provides a list of 10 countries rendered as a column chart using Google Charts.

Although the number of documents we have in DynamoDB is small (260), we nevertheless want to follow good practices that would also work well with data at large scale. In the country JSON, there are properties deep in the document that list a country's global rank for area, exports, Internet users, etc. All we need to do, then, is sort by the particular global rank we're interested in and take the top 10 results.

Adding a Secondary Index to DynamoDB

In DynamoDB, you can't specify an order in your query other than to use the sort order of an index (ascending or descending). So, if we wanted to list countries by order of area global rank, we'd want to order by the area global rank and plot the area in square km in our chart. Here's where they are in the country JSON:


What we need to do, conceptually, is create a Secondary Index with a sort key of geography.area.global_rank that also includes the country name (name) and actual area (geography.area.total.value). Unfortunately, you can only include top-level properties in a Secondary Index so we can't do it exactly that way...

What we can do is promote these properties (and their counterparts for the other charts) to top-level properties. We do that by modifying the load-country Lambda Function we created in Part 1 to surface these new top-level properties. Here's the code we inserted:
if (data != null) 
{
// add 3 fields to the document

data.key = key; // countryKey(data.name);
data.timestamp = 'Monday, February 11, 2019 4:09:28 PM';
data.source = 'Factbook';

// promote fields to top that we need to index on

if (data.geography && data.geography.area && data.geography.area.global_rank) {
    data.global_rank_area = data.geography.area.global_rank; }
if (data.people && data.people.population &&  data.people.population.global_rank) {
    data.global_rank_population = data.people.population.global_rank; }
if (data.economy && data.economy.imports && data.economy.imports.total_value && data.economy.imports.total_value.global_rank)
    data.global_rank_imports = data.economy.imports.total_value.global_rank;
if (data.economy && data.economy.exports && data.economy.exports.total_value && data.economy.exports.total_value.global_rank)
    data.global_rank_exports = data.economy.exports.total_value.global_rank;
if (data.economy && data.economy.inflation_rate && data.economy.inflation_rate.global_rank)
    data.global_rank_inflation_rate = data.economy.inflation_rate.global_rank;
if (data.communications && data.communications.internet && data.communications.internet.users && data.communications.internet.users.global_rank)
    data.global_rank_internet_users = data.communications.internet.users.global_rank;
if (data.geography && data.geography.area && data.geography.area.total && data.geography.area.total.value)
    data.global_value_area = data.geography.area.total.value;
if (data.people && data.people.population &&  data.people.population.total)
    data.global_value_population = data.people.population.total;
if (data.economy && data.economy.imports && data.economy.imports.total_value && data.economy.imports.total_value.annual_values && data.economy.imports.total_value.annual_values[0] && data.economy.imports.total_value.annual_values[0].value)
    data.global_value_imports = data.economy.imports.total_value.annual_values[0].value;
if (data.economy && data.economy.exports && data.economy.exports.total_value && data.economy.exports.total_value.annual_values && data.economy.exports.total_value.annual_values[0] && data.economy.exports.total_value.annual_values[0].value)
    data.global_value_exports = data.economy.exports.total_value.annual_values[0].value;
if (data.economy && data.economy.inflation_rate && data.economy.inflation_rate.annual_values && data.economy.inflation_rate.annual_values[0] && data.economy.inflation_rate.annual_values[0].value)
    data.global_value_inflation_rate = data.economy.inflation_rate.annual_values[0].value;
if (data.communications && data.communications.internet && data.communications.internet.users && data.communications.internet.users.total)
    data.global_value_internet_users = data.communications.internet.users.total;

// insert country record

var params = {
    TableName: 'factbook',
    Item: data
    };

console.log("Adding new item...");
docClient.put(params, function(err, data2) {
Code Added to load-country Function
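
The promotion code repeats one pattern: if a nested value exists, copy it to a top-level property. A table-driven sketch of that pattern is shown below. The names getPath, promotions, and promoteFields are hypothetical, only a few of the fields are listed, and array elements are addressed with a numeric path segment (annual_values.0) as a convention of this sketch.

```javascript
// Hypothetical helper: walks a dotted path and returns undefined if any
// step is missing, instead of throwing.
function getPath(obj, path) {
    return path.split('.').reduce(function (node, key) {
        return (node === null || node === undefined) ? undefined : node[key];
    }, obj);
}

// Top-level property name -> nested path it is copied from.
var promotions = {
    global_rank_area: 'geography.area.global_rank',
    global_value_area: 'geography.area.total.value',
    global_rank_population: 'people.population.global_rank',
    global_value_population: 'people.population.total',
    global_value_imports: 'economy.imports.total_value.annual_values.0.value'
};

// Copies each nested value that is present to its top-level property.
// Uses the same truthiness test as the hand-written checks.
function promoteFields(data, table) {
    Object.keys(table).forEach(function (target) {
        var value = getPath(data, table[target]);
        if (value) data[target] = value;
    });
    return data;
}
```

Adding a chart would then mean adding two entries to the table rather than two more if statements.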

With this update, and after re-loading all the country records into DynamoDB, we now have the top-level properties we need:

Updated Country JSON with Top-level Global Rank/Value Properties

Now we are able to create secondary indices on our DynamoDB database. Here's how we create the index rank-area-index. Let's take note of a few things. The partition key is source (which is always "Factbook"), same as our primary index. The sort key is global_rank_area, a number. This will make it easy to get the top N or bottom N countries by global area rank. The attributes for the index include name, global_rank_area, and global_value_area; we need the value in order to plot anything meaningful in our chart.

Creating Secondary Index
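
The index above was created in the AWS console. For reference, here is a rough sketch of equivalent parameters for the DynamoDB UpdateTable operation. It assumes the index is a Global Secondary Index with an INCLUDE projection on a provisioned-capacity table; the capacity values and the buildIndexParams name are placeholders, not settings taken from the actual deployment.

```javascript
// Hypothetical helper: builds UpdateTable parameters that add a secondary
// index keyed on source (partition) and a numeric rank attribute (sort).
function buildIndexParams(indexName, rankAttribute, valueAttribute) {
    return {
        TableName: 'factbook',
        AttributeDefinitions: [
            { AttributeName: 'source', AttributeType: 'S' },
            { AttributeName: rankAttribute, AttributeType: 'N' }
        ],
        GlobalSecondaryIndexUpdates: [{
            Create: {
                IndexName: indexName,
                KeySchema: [
                    { AttributeName: 'source', KeyType: 'HASH' },
                    { AttributeName: rankAttribute, KeyType: 'RANGE' }
                ],
                // Key attributes are always projected; include name and value too.
                Projection: {
                    ProjectionType: 'INCLUDE',
                    NonKeyAttributes: ['name', valueAttribute]
                },
                // Placeholder capacity values (assumption).
                ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
            }
        }]
    };
}

var params = buildIndexParams('rank-area-index', 'global_rank_area', 'global_value_area');
// With the AWS SDK loaded, these could be passed to:
//   new AWS.DynamoDB({region: 'us-east-1'}).updateTable(params, callback);
```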

That was a bit of work; but with our index created, we can now create a Lambda Function to query by area rank. Here's report-area-highest, which returns the names, rank, and area for the top 10 countries with largest area. Notice that the query parameters specify an IndexName of rank-area-index and a ScanIndexForward (sort order) value of true. We also specify a Limit of 10, which will give us just 10 records back. For the sister report-area-lowest function, ScanIndexForward will be set to false.
// List top 10 countries with largest area

const AWS = require('aws-sdk');
AWS.config.update({region: 'us-east-1'});
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'}); 

var corsHeaders = { "Access-Control-Allow-Origin" : "*", "Access-Control-Allow-Credentials" : true };

exports.handler = function(event, context, callback) {

    var params = {
      TableName: 'factbook',
      IndexName: 'rank-area-index',
      ExpressionAttributeNames: {
         '#name': 'name',
         '#source': 'source'
      },
      ExpressionAttributeValues: {
        ':source': 'Factbook',
      },
      KeyConditionExpression: '#source = :source',
      ProjectionExpression: "#name, global_rank_area, global_value_area",
      Limit: 10,
      ScanIndexForward: true
    };
    
    docClient.query(params, function(err, data) {

        if(err) { 
            console.log('03 err:')
            console.log(err.toString());
            callback(null, { statusCode: 500, headers: corsHeaders, body: 'Error: ' + err }); // pass null so this 500 response is returned to the caller
        } else { 
            callback(null, {
                    statusCode: 200,
                    headers: corsHeaders,
                    body: JSON.stringify(data.Items)
                });
        }
      });
};
report-area-highest Lambda Function

Here's the output when report-area-highest is run. It's just what we want: the top 10 countries with highest area, including the name and value for each country.
[
{
    global_rank_area: 1,
    name: "Russia",
    global_value_area: 17098242
},
{
    global_rank_area: 2,
    name: "Antarctica",
    global_value_area: 14000000
},
{
    global_rank_area: 3,
    name: "Canada",
    global_value_area: 9984670
},
{
    global_rank_area: 4,
    name: "United States",
    global_value_area: 9833517
},
{
    global_rank_area: 5,
    name: "China",
    global_value_area: 9596960
},
{
    global_rank_area: 6,
    name: "Brazil",
    global_value_area: 8515770
},
{
    global_rank_area: 7,
    name: "Australia",
    global_value_area: 7741220
},
{
    global_rank_area: 8,
    name: "India",
    global_value_area: 3287263
},
{
    global_rank_area: 9,
    name: "Argentina",
    global_value_area: 2780400
},
{
    global_rank_area: 10,
    name: "Kazakhstan",
    global_value_area: 2724900
}
]
report-area-highest output

When the JavaScript code in the web site plots this with Google Charts, here's what the end result is:

Chart in Web Site
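
To plot the function output, the items need to be reshaped into rows of country name and value. A minimal sketch of that step is below; toChartRows and the axis label are hypothetical and not the site's actual code, and the sample items are the first three rows of the output shown earlier.

```javascript
// Sample items shaped like the report-area-highest output.
var items = [
    { global_rank_area: 1, name: 'Russia', global_value_area: 17098242 },
    { global_rank_area: 2, name: 'Antarctica', global_value_area: 14000000 },
    { global_rank_area: 3, name: 'Canada', global_value_area: 9984670 }
];

// Hypothetical helper: a header row followed by one [name, value] row per item.
function toChartRows(items, valueField, valueLabel) {
    var rows = [['Country', valueLabel]];
    items.forEach(function (item) {
        rows.push([item.name, item[valueField]]);
    });
    return rows;
}

var rows = toChartRows(items, 'global_value_area', 'Area (sq km)');
// In the browser, once Google Charts has loaded, the rows could be passed to
// google.visualization.arrayToDataTable(rows) and drawn with
// new google.visualization.ColumnChart(element).draw(dataTable, options).
```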

Each of the other charts was implemented exactly the same way: promote the appropriate properties to the top of the JSON, add a Secondary Index to DynamoDB, and write a simple Lambda Function to query using the index. The first one was a bit of work; but once the correct pattern was identified, all the others followed in rapid succession.
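
Because the report functions differ only in index name, attribute names, and sort direction, their query parameters could come from one builder. The sketch below illustrates the idea. buildReportParams is a hypothetical name, and rank-population-index is an assumed index name following the rank-area-index pattern.

```javascript
// Hypothetical helper: query parameters for a "top 10 by global rank" report.
// highestFirst = true reads the index in ascending rank order (rank 1 first);
// false reads it in descending order for the "lowest" reports.
function buildReportParams(indexName, rankField, valueField, highestFirst) {
    return {
        TableName: 'factbook',
        IndexName: indexName,
        ExpressionAttributeNames: { '#name': 'name', '#source': 'source' },
        ExpressionAttributeValues: { ':source': 'Factbook' },
        KeyConditionExpression: '#source = :source',
        ProjectionExpression: '#name, ' + rankField + ', ' + valueField,
        Limit: 10,
        ScanIndexForward: highestFirst
    };
}

var areaHighest = buildReportParams('rank-area-index', 'global_rank_area', 'global_value_area', true);
var populationLowest = buildReportParams('rank-population-index', 'global_rank_population', 'global_value_population', false);
```

Each Lambda Function would then pass its parameters to docClient.query as in report-area-highest.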

In Conclusion

In this series I showed how to retrieve world country data from the CIA World Factbook and store it in AWS, along with an API and web site for accessing the data. DynamoDB was our primary repository along with S3 storage, and it has performed well. Lambda Functions were integral to both the back-end (loading country records) and front-end (country view, search, charts) and were written in Node.js (JavaScript).

The resulting web site can be accessed at http://world-factbook.aws.davidpallmann.com. To create it, a web site from a prior project on another cloud platform was refactored so that it works with AWS or Azure with mostly common code.

To do charting, we needed to create secondary indices which in turn required us to promote some of our JSON values to top-level properties. Once that was done, it was a breeze to create the necessary Lambda Functions. Combined with Google Charts, we quickly had charts up and running.

What we've covered in Parts 1 and 2 took two days of development and two days of blog-writing.
Cloud-native services like Lambda and DynamoDB make for rapid development.

In Part 3, we'll be creating an Alexa Skill so that our data can be accessed by voice.

Web Site Source Code on GitHub

Next: CIA World Factbook on AWS, Part 3: Alexa Voice Interface using Lambda and DynamoDB

