Sunday, December 9, 2018

Ghost: A Father-Daughter Project, Part 1

Over the last Thanksgiving break, our oldest daughter Susan came home from college for a few days. She is studying web design and marketing, but increasingly finds herself drawn to front-end UX design. She's been spending a lot of time on HTML/CSS/JavaScript projects, with great success.

While we had a few days together, we decided it would be fun to collaborate on a project that would be useful for her coursework: a program that learns. She suggested implementing the game of GHOST that our family has always played, and that's what we did.

Rules of GHOST

In case you're not familiar with it, GHOST is a game well-suited for car trips, sitting around the dinner table, or any other time you have 2 to N people sitting around with nothing to do. It's a spelling game, and if you make a valid word of 3 letters or more, you've lost the round. The first time you lose a round, you're a "G". The next time you lose a round, you're a "GH". Once you get all the way to "GHOST", you're out of the game. Here's a sample sequence or two to give you the idea:

Player 1: K
Player 2: N
Player 3: O
Player 1: C
Player 2: K
Player 3: "That's a word. You're a G."

Player 1: P
Player 2: A
Player 1: C
Player 2: I
Player 1: N
Player 2: A
Player 1: "I challenge you. What was your word?"
Player 2: "Ah, you got me. I didn't want to go out on PACING, so I bluffed. I'm a G-H-O."

Although GHOST is a simple game, there are nuances and strategy to it. Programming a competent computer player is not as simple as you might think.

Our Game

Our version of Ghost will be a two-player edition, human against computer.

Because Susan has done front-end development but not back-end or database work, I volunteered to put together a starter program and write the back-end code to her specifications. Since I spend much of my time with ASP.NET, I created a sample ASP.NET Model-View-Controller (MVC) project and added a web.config and code to connect to a SQL Server database.

Since this game is supposed to learn, it definitely needs a database so it can grow its word list as it plays. The database is as simple as it gets: one table named Words containing one column, Word. We initially seeded the word list with a couple dozen entries, and the list has been growing through game play ever since. At the time of this writing, it has over 1400 words. We could of course license a dictionary, but that would defeat the learning purpose of this exercise.
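To show how small that schema is, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (the post's real back end is ASP.NET with SQL Server). The Words table and Word column follow the post; the function names `create_word_list` and `is_word` and the seed words are hypothetical. The lookup binds the word-in-play as a parameter rather than concatenating it into the SQL.

```python
import sqlite3

def create_word_list(seed_words):
    """Create an in-memory Words table with a single Word column and seed it."""
    conn = sqlite3.connect(":memory:")
    # One table, one column; PRIMARY KEY keeps each word unique.
    conn.execute("CREATE TABLE Words (Word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO Words (Word) VALUES (?)",
        [(w.upper(),) for w in seed_words],
    )
    conn.commit()
    return conn

def is_word(conn, round_word):
    """True if the word-in-play is in the word list (bound parameter, no string concatenation)."""
    row = conn.execute(
        "SELECT Word FROM Words WHERE Word = ?", (round_word.upper(),)
    ).fetchone()
    return row is not None

conn = create_word_list(["knock", "bacon", "zip", "zipper"])
print(is_word(conn, "knock"))  # True
print(is_word(conn, "knoc"))   # False
```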


Our first objective was to implement basic game play in order to arrive at a functional game, although not a very smart one. The basic algorithm for game play is this:

Basic Gameplay Flowchart

When it's the human's turn, he or she has 3 options:

1. Play: press a letter key to continue the word.
2. Challenge: click a Challenge button.
3. That's a Word: click a That's a Word button.


The Challenge and That's a Word buttons aren't visible unless at least 3 letters have been played.

The human player goes first in the first round; after that, the starting player alternates from round to round.

Flow for Human Plays a Letter

When the human presses a letter key, the letter is added to the current word in play. Non-letter keys are ignored.

Next, a check is made to see whether the human player has just completed a word: if they have, they have lost the round. This is done with a call to the back end to look up the current word-in-play against the Ghost word list.

SELECT word FROM words WHERE word=@round

If the word is found in the word list, the game informs the player and a loss is scored for them. The player gets the next letter in G-H-O-S-T, and if T has been reached the game is over.
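The human-plays-a-letter flow fits in a few lines. This is a hypothetical Python version, not the game's actual ASP.NET code: a Python set stands in for the database lookup, and the names `human_plays` and `add_loss` are invented for illustration.

```python
import string

GHOST = "GHOST"

def human_plays(round_word, key, words):
    """Handle a key press; return (new_round_word, human_lost).

    Non-letter keys are ignored. The human loses the round if the new
    word-in-play is 3+ letters long and appears in the word list.
    """
    if len(key) != 1 or key not in string.ascii_letters:
        return round_word, False  # ignore non-letter keys
    round_word += key.upper()
    return round_word, len(round_word) >= 3 and round_word in words

def add_loss(score):
    """Give the loser the next letter of G-H-O-S-T; return (new_score, out_of_game)."""
    new_score = GHOST[: len(score) + 1]
    return new_score, new_score == GHOST

words = {"KNOCK", "BACON"}
print(human_plays("KNOC", "k", words))  # ('KNOCK', True)
print(human_plays("KNOC", "3", words))  # ('KNOC', False)
print(add_loss("GHOS"))                 # ('GHOST', True)
```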


If the word was not found in the word list, it's the computer's turn. First it queries the word list for winning words: words that begin with the current word-in-play and have the right length (odd or even) so that, if the word is played out, the human completes it and the computer wins. For example, say the current word-in-play is B A C and the human started the round. A winning word must begin with BAC and have an odd number of letters, so the search for winning words would include BACON but not BACK or BACCARAT. If one or more winning words are found, one is selected at random and its next letter is played.
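The odd/even rule follows from turn order: whoever starts a round plays letters 1, 3, 5, and so on, while the other player plays letters 2, 4, 6. Below is a hypothetical Python sketch of the winning-word filter over an in-memory word set (the game itself does this in SQL); `winning_words` and `next_letter` are illustrative names.

```python
import random

def winning_words(words, round_word, human_started):
    """Words that extend the word-in-play and whose last letter the human would play."""
    result = []
    for w in words:
        if not w.startswith(round_word) or len(w) <= len(round_word):
            continue
        # The starting player plays the odd-numbered letters.
        human_plays_last = (len(w) % 2 == 1) == human_started
        if human_plays_last:
            result.append(w)
    return sorted(result)

def next_letter(words, round_word, human_started, rng=random):
    """Pick a random winning word and return its next letter, or None if there is none."""
    candidates = winning_words(words, round_word, human_started)
    if not candidates:
        return None
    return rng.choice(candidates)[len(round_word)]

words = {"BACON", "BACK", "BACCARAT"}
print(winning_words(words, "BAC", human_started=True))  # ['BACON']
print(next_letter(words, "BAC", human_started=True))    # O
```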

Finding a good algorithm for selecting a winning word took some thought and experimentation. The example query below shows how a winning word is selected when the human player went first. The first WHERE clause ensures the selected word begins with the word-in-play. The second ensures it is longer than the word-in-play. The third ensures that, when the word is played out, the human player (not the computer) completes it. ORDER BY NEWID() shuffles the matching rows, so TOP 1 returns a random winning word.

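SELECT TOP 1 Word FROM Words
WHERE Word LIKE @round + '%'   -- begins with the word-in-play
AND LEN(Word) > LEN(@round)    -- longer than the word-in-play
AND (LEN(Word) % 2) = 1        -- odd length: the human (who went first) plays the final letter
ORDER BY NEWID()               -- random order, so TOP 1 returns a random match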

The query above is actually augmented further, because we don't want to target a winning word only to discover we accidentally made a losing word on the way. For example, with ZI in play, heading for ZIPPER by playing P would be a mistake, because ZIP is itself a word and the computer would lose. To prevent this, more WHERE clauses are added to the query so the computer never selects a word that could lead it into a losing position.
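Those extra conditions amount to a prefix check: walk the letters between the word-in-play and the target word, and reject the target if any prefix the computer would play is already a word. This Python sketch illustrates that idea; it is not the game's actual WHERE clauses, and `is_safe_target` is a hypothetical name.

```python
def is_safe_target(target, round_word, words, human_started):
    """False if pursuing `target` would make the computer complete a shorter word first."""
    for length in range(len(round_word) + 1, len(target)):
        # The starting player plays the odd-numbered letters.
        computer_plays_it = (length % 2 == 1) != human_started
        if computer_plays_it and length >= 3 and target[:length] in words:
            return False
    return True

words = {"ZIP", "ZIPPER", "BACON"}
# With ZI in play and the computer to move, the computer started the round.
print(is_safe_target("ZIPPER", "ZI", words, human_started=False))  # False: ZIP comes first
print(is_safe_target("BACON", "BAC", words, human_started=True))   # True
```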

If no winning words are found, the computer must either challenge the human or bluff. We came up with this rule: if the word-in-play is 3 letters or more and there are no losing words in the word list either (so no word in the list begins with the word-in-play), the computer challenges. The human player can then admit to bluffing or tell the computer their word. If the human was bluffing, a loss is scored for them. If a word is provided, the game adds it to its word list and scores a loss for itself.
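The challenge rule and its outcome might look like this in a hypothetical Python sketch. Like the game, it trusts whatever word the human supplies; `should_challenge` and `settle_computer_challenge` are illustrative names, and the word list is an in-memory set.

```python
def should_challenge(round_word, words):
    """Challenge when 3+ letters are in play and no known word begins with them."""
    return len(round_word) >= 3 and not any(w.startswith(round_word) for w in words)

def settle_computer_challenge(human_word, words):
    """Return the loser of the round after the computer challenges.

    `human_word` is None if the human admits to bluffing. Otherwise the game
    trusts the human, learns the word, and takes the loss itself.
    """
    if human_word is None:
        return "human"
    words.add(human_word.upper())
    return "computer"

words = {"BACON"}
print(should_challenge("QUA", words))             # True
print(settle_computer_challenge("quack", words))  # computer
print("QUACK" in words)                           # True
```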



If it isn't challenging, the computer must bluff. Initially our game picked a random letter to bluff with, but in play that produced nonsense letter combinations that made the bluff too obvious. Susan came up with the idea of scanning the word list to choose the next letter, which yields more credible letter sequences. The chosen bluff letter is then played.
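The post doesn't spell out exactly how the word list is scanned, so the following Python sketch is one possible interpretation, not Susan's actual algorithm: it counts which letters follow the word-in-play's last letter anywhere in the known words and samples a bluff letter in proportion to those counts, falling back to a random letter. `bluff_letter` is a hypothetical name.

```python
import random
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def bluff_letter(round_word, words, rng=random):
    """Choose a bluff letter that forms a familiar-looking pair with the last letter played."""
    followers = Counter()
    if round_word:
        last = round_word[-1]
        for w in words:
            for a, b in zip(w, w[1:]):
                if a == last:
                    followers[b] += 1
    if not followers:
        return rng.choice(ALPHABET)  # nothing to go on: fall back to a random letter
    letters = sorted(followers)
    weights = [followers[c] for c in letters]
    return rng.choices(letters, weights=weights, k=1)[0]

print(bluff_letter("ZOB", {"BACON", "BACK"}))  # A (the only letter that follows B)
```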

Flow for Human Clicks Challenge Button

The human player can click a Challenge button if they don't believe the computer is making a valid word.

The game looks for a word in its word list that begins with the current word-in-play. If it finds one, it tells the player what the word is, and the round is scored. Usually the human player takes the loss; the exception is when the computer's word has been fully played, in which case the computer has completed a word and loses the round.


If the computer was bluffing (no matching word in the word list), it admits it was bluffing and takes a loss.
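A hypothetical Python sketch of how a human challenge could be settled against an in-memory word set. The post doesn't say which matching word the game reveals; this sketch simply shows the first one alphabetically, and `settle_human_challenge` is an illustrative name.

```python
def settle_human_challenge(round_word, words):
    """Return (loser, word_revealed) after the human challenges the computer."""
    matches = sorted(w for w in words if w.startswith(round_word))
    if not matches:
        return "computer", None        # the computer was bluffing
    if round_word in matches:
        return "computer", round_word  # the computer's word was fully played
    return "human", matches[0]         # a real word exists, so the challenge fails

words = {"BACON", "BACK"}
print(settle_human_challenge("BACO", words))  # ('human', 'BACON')
print(settle_human_challenge("BACK", words))  # ('computer', 'BACK')
print(settle_human_challenge("BAX", words))   # ('computer', None)
```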


Flow for Human Clicks That's a Word Button

The human player can click a That's a Word button to indicate the computer has made a complete word. Since the Ghost game is designed to learn as it plays, it trusts the human player to be truthful (and a good speller), and adds the word to its word list database. Now it knows the new word for future play.


Of course, trusting the human player comes with risks. That's the reason for the next section we'll discuss, Administration.

Administration

Our game has a page for viewing the word list. When accessed with administrator credentials, this page also allows adding and removing words. This is important because our game trusts the human player when learning new words. If there's a typo, a disallowed word (like a proper name), or profanity, we want to be able to correct or remove it.


The back end of the administrative functions consists of simple DELETE and INSERT queries against the database.
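For illustration, here is what those two queries might look like in a minimal Python sketch, again using sqlite3 as a stand-in for SQL Server. `INSERT OR IGNORE` is SQLite syntax (SQL Server would need a different duplicate check), and `add_word`/`remove_word` are hypothetical names.

```python
import sqlite3

def add_word(conn, word):
    """Admin: insert a word, normalized to uppercase; duplicates are ignored."""
    conn.execute("INSERT OR IGNORE INTO Words (Word) VALUES (?)", (word.strip().upper(),))
    conn.commit()

def remove_word(conn, word):
    """Admin: delete a word, e.g. a typo, proper name, or profanity learned during play."""
    cur = conn.execute("DELETE FROM Words WHERE Word = ?", (word.strip().upper(),))
    conn.commit()
    return cur.rowcount  # rows removed (0 if the word wasn't there)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Words (Word TEXT PRIMARY KEY)")
add_word(conn, "bacon")
add_word(conn, "baconn")            # a typo trusted from game play
print(remove_word(conn, "baconn"))  # 1
```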

Summary

Well, that's our game--so far. From a learning / intelligence perspective, Ghost can:

  • Learn new words
  • Distinguish between potential winning and losing words
  • Bluff convincingly

Susan is next going to re-do my placeholder front-end with her own UX design, which I'm sure will be highly creative. We'll cover that in a future Part 2.

I am greatly enjoying teaming up with my daughter on a project--something we haven't done since those middle school science project days that now seem so long ago.

Friday, December 7, 2018

Visualizing Workflow Activity with Sankey Diagrams

In this post, I'll demonstrate how something called a Sankey Diagram can be used with charting software to visually show workflow activity.



The Problem of Showing Workflow Activity Effectively

If you've ever worked with business workflows in software, you've likely struggled with communicating workflow activity to clients: it's important, but it's also difficult, especially with complex workflows. While users are inherently familiar with their own workflow (or the portion of it relating to their position), depicting that activity graphically can be daunting.

It's easy to see why this is a difficult problem: just look at how workflows are organized and stored in computer systems. Although you might see workflows depicted with flowcharts or UML diagrams at times, they tend to be stored in digital systems as a complex, multi-level hierarchy, sometimes captured in an XML dialect. Entities involved in the workflow have states and must be gated through allowable future states according to the rules of the workflow. Advancing from one state to another can happen in response to a user action, in response to a message from an external system, because a certain amount of time has passed, or automatically. On top of all that, some workflow paths may run in parallel.

Most often, you'll see bar/column/pie/donut charts used to show a breakdown of activity within one workflow stage. That's a fine way to show what's going on in a single stage, but it doesn't provide a view of activity across the workflow. That across-the-workflow view can be pretty important: is a significant portion of online orders being returned by customers? You wouldn't get any insight into such connections by looking at workflow activity one stage at a time.

Sankey Diagrams Illustrate Flow Well

Showing the flow of activity is where Sankey diagrams become very helpful. Sankey diagrams are designed to show flow, and they do so with connecting lines or arrows whose width is proportional to the amount of activity.

You can see simple and complex examples of Sankey diagrams in the Google Charts gallery. While there, notice that Google's implementation lets you hover over a particular flow for more information, including the count. But even without a visible count, you can tell relative activity by the thickness of the connection between source and destination. In the example below, we can see that the B-X connection has far less activity than the B-Y connection. If you imagine that the start and end states are stages or substages of your workflow, you begin to see the possibilities.

Simple Sankey Diagram

Here's a more complex example from the same Google gallery page that shows flow across a series of states. Even though there's a lot more going on this time, transitions from one state to another are clearly shown and the rate of activity is easy to gauge from the width of the connecting lines. This is what makes Sankey diagrams great for illustrating workflow.

Complex Sankey Diagram

Although the Google library is very good, I'm going to be using the Highcharts chart library for the remainder of this post, simply because that's what I use regularly. Google Charts requires Internet access and the license terms disallow including the library locally in your application; in contrast, Highcharts can be self-contained in the application but the library does need to be purchased. Both libraries render excellent results.

If you want to experiment with Sankey diagrams without writing any code, check out the SankeyMATIC page, which lets you set up a Sankey diagram interactively. It's a great way to prototype a Sankey diagram before investing development effort.

Looking at a Sankey diagram, you might assume it must be complex to program, but this is not the case at all. Most chart libraries that support Sankey diagrams simply take arrays as data input, where each array element specifies the name of a source state, the name of a destination state, and a count. The chart library takes it from there, stitching things together for you. Both Google Charts and Highcharts work this way. We'll see an example of that shortly.
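To make that concrete, here is a small Python sketch that turns (source, destination, count) rows into a Highcharts options object, which a page could then pass to `Highcharts.chart(...)` with the Sankey module loaded. The `type: 'sankey'` series with `keys: ['from', 'to', 'weight']` follows the layout of Highcharts' own Sankey demo; the `sankey_options` helper and the counts are made up for illustration.

```python
import json

def sankey_options(rows, title):
    """Build Highcharts options for a Sankey chart from (source, destination, count) rows."""
    return {
        "title": {"text": title},
        "series": [{
            "type": "sankey",
            "keys": ["from", "to", "weight"],
            "data": [[src, dst, count] for src, dst, count in rows],
        }],
    }

# Illustrative rows only: each is source state, destination state, count.
rows = [("A", "X", 5), ("A", "Y", 7), ("B", "X", 2), ("B", "Y", 9)]
print(json.dumps(sankey_options(rows, "Simple flow"), indent=2))
```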

Sample Scenario: An Order Workflow

To show an example, we'll imagine the order workflow for a company that accepts both online orders and phone orders.

  • Orders have to be prepaid unless credit is approved. 
  • Once an order is prepaid or credit-approved, associates assemble the order by pulling SKUs from a warehouse. The order may also be gift-wrapped. 
  • Once assembled, orders are placed on a shipping dock awaiting pick up by the shipping carrier.
  • Orders that were not prepaid are billed and followed up by accounts receivable until the order is paid in full.

Our workflow is implemented as a series of stage-substage-state combinations. Specific actions connect one state to another. For example, submitting an online order that requests credit transitions state from Shopping | Online Order | Order Submitted to Order Configuration | Credit Approval | Credit Review; whereas a prepaid order would transition to Order Configuration | Payment Verification | Applying Payment.

Order Workflow Stages, Substages, and States
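One simple way to model those action-driven transitions is a lookup table keyed by (state, action). This Python sketch encodes just the two transitions described for submitted online orders; the action names and the `apply_action` helper are hypothetical, and a real workflow engine would hold far more states and rules.

```python
# Hypothetical action names; states use the Stage | Substage | State naming from the post.
ORDER_SUBMITTED = ("Shopping", "Online Order", "Order Submitted")
CREDIT_REVIEW = ("Order Configuration", "Credit Approval", "Credit Review")
APPLYING_PAYMENT = ("Order Configuration", "Payment Verification", "Applying Payment")

# (current state, action) -> next state
TRANSITIONS = {
    (ORDER_SUBMITTED, "submit_with_credit_request"): CREDIT_REVIEW,
    (ORDER_SUBMITTED, "submit_prepaid"): APPLYING_PAYMENT,
}

def label(state):
    """Render a state as 'Stage | Substage | State'."""
    return " | ".join(state)

def apply_action(state, action):
    """Return the next state, or raise ValueError if the workflow doesn't allow the action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {label(state)}") from None

print(label(apply_action(ORDER_SUBMITTED, "submit_prepaid")))
# Order Configuration | Payment Verification | Applying Payment
```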

Now imagine this system is up and running and users want to see order activity. We could of course do the usual simple charts to show what is happening at each stage of the workflow. That might look something like this:


Single-Stage Views

This is certainly useful information; it lets us look into the breakdown of shopping activity. But it tells us nothing about what's happening across the workflow. So, we're now going to create a Sankey Diagram in Highcharts to provide that added view.

Creating Sankey Diagrams for the Sample Scenario

We'll start simply, by showing the transition from the first stage (Shopping) to the second stage (Order Confirmation). To do that, we'll take a standard JSFiddle for a Highcharts Sankey diagram and simply modify the data array as follows. We're merely supplying array entries, each with a source state, a destination state, and a count. Our array includes all of the Shopping stage start states and all of the Order Confirmation stage end states. We know the source and destination states from our workflow, and we get the counts by querying our order database.



Below is the resulting Sankey diagram. Even though we're only focusing on the first two stages of the overall workflow, the Sankey diagram yields a very rich view of what is going on. We can hover over the connecting lines for exact counts, but even at a glance we can see a great deal: phone orders are minuscule compared to online orders, and most orders are in various states of credit approval processing. We are now getting a sense of how activity flows, which tells more of a story than looking at one stage at a time.

Sankey diagram: Shopping Stage to Order Confirmation Stage
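If your order history records each state change, producing the source/destination/count entries for a chart like this is a simple aggregation. The Python sketch below assumes a hypothetical list of (from_state, to_state) records; in practice the counts would more likely come from a GROUP BY query against the order database.

```python
from collections import Counter

def sankey_rows(transitions):
    """Count (from_state, to_state) records into [source, destination, count] rows."""
    counts = Counter(transitions)
    return [[src, dst, n] for (src, dst), n in sorted(counts.items())]

# Hypothetical state-change records pulled from an order history.
transitions = [
    ("Online Order", "Credit Review"),
    ("Online Order", "Credit Review"),
    ("Online Order", "Applying Payment"),
    ("Phone Order", "Applying Payment"),
]
print(sankey_rows(transitions))
# [['Online Order', 'Applying Payment', 1], ['Online Order', 'Credit Review', 2], ['Phone Order', 'Applying Payment', 1]]
```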

Now, let's add the remaining stages. Our data list now looks like this, with more array elements.


And below is the Sankey chart showing activity flow across the entire workflow. Now we really are getting a sense of what's happening across the board (JSFiddle).


I'd like to point out a few useful things about the Highcharts implementation. First, you can hover over any connection line to get a tooltip showing the end-state name and the count. With some coding, you could also arrange for a click on a section to drill down into a detail chart.

Highcharts Sankey Diagram - Detail on Hover

Another useful feature is the menu at the top right, which lets you export the chart as a graphic.


Our sample scenario is a modest workflow. What if your workflow is much more complex, to the point where the diagram is really crammed? You still need to keep the information understandable and avoid overwhelming the user. Here are some strategies to consider:

  • Set an appropriate number of colors--too many may reduce understandability. Consider whether it makes sense to color-code stages.
  • When showing the full workflow, leave out the most detailed state level and provide that elsewhere.
  • Show multi-stage sequences in sections rather than the entire workflow in a single view.
  • Allow users to click on a stage to get an expanded view of the detail in that stage.
  • Group states of little interest together into a single node or leave them out altogether.
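The grouping strategy in the last bullet is easy to automate once the data is in source/destination/count form. This Python sketch collapses every state not in a keep-list into one "Other" node per stage, assuming node names follow the Stage | Substage | State pattern; `group_states` and the sample rows are hypothetical. Using one Other node per stage, rather than a single global one, helps avoid loops between stages.

```python
from collections import Counter

def group_states(rows, keep):
    """Collapse states not in `keep` into a single 'Stage | Other' node per stage.

    Node names are assumed to look like 'Stage | Substage | State'. A flow whose
    source and destination collapse into the same node is dropped, because a
    Sankey node can't flow into itself.
    """
    def relabel(name):
        if name in keep:
            return name
        stage = name.split(" | ")[0]
        return f"{stage} | Other"

    merged = Counter()
    for src, dst, count in rows:
        a, b = relabel(src), relabel(dst)
        if a != b:
            merged[(a, b)] += count
    return [[a, b, n] for (a, b), n in sorted(merged.items())]

rows = [
    ["Stage1 | Sub | A", "Stage2 | Sub | X", 40],
    ["Stage1 | Sub | B", "Stage2 | Sub | X", 2],
    ["Stage1 | Sub | C", "Stage2 | Sub | X", 1],
]
print(group_states(rows, keep={"Stage1 | Sub | A", "Stage2 | Sub | X"}))
# [['Stage1 | Other', 'Stage2 | Sub | X', 3], ['Stage1 | Sub | A', 'Stage2 | Sub | X', 40]]
```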

If I were doing this for a client rather than a blog post, I would put extra time into finishing touches: adjusting colors, adjusting fonts and text effects, and considering whether some of the information is of no interest to the audience. Even without those touches, I hope this introduction to Sankey diagrams shows how workflow activity can be presented to users in an insightful way.

I only discovered Sankey diagrams recently, but their usefulness was immediately apparent. Not only are they useful, they're also very simple to create with leading chart libraries. If you're facing the challenge of visualizing workflow activity, I encourage you to try them out.