Friday, December 7, 2018

Visualizing Workflow Activity with Sankey Diagrams

In this post, I'll demonstrate how a Sankey diagram can be used with charting software to show workflow activity visually.

The Problem of Showing Workflow Activity Effectively

If you've ever worked with business workflows in software, you've likely struggled with the problem of communicating workflow activity to clients: it's important, but it's also difficult. This is especially true with complex workflows. While users are inherently familiar with their own workflow (or a portion of it relating to their position), graphically depicting activity can be daunting.

It's not hard to understand why this is a difficult problem: just look at how workflows are organized and stored in computer systems. Although you might see workflows depicted with flowcharts or UML diagrams at times, their storage in digital systems tends to be a complex hierarchy of multiple levels, sometimes captured in the form of an XML dialect. Entities involved in the workflow have states and have to be gated through allowable future states depending on the rules of the workflow. Advancing from one state to another can happen in response to a user action, in response to a message from an external system, because a certain amount of time has passed, or automatically. On top of all that, some workflow paths may run in parallel.

Most often, you'll see bar/column/pie/donut charts used to show a breakdown of activity within one workflow stage. That's a fine way to show what's going on in a single stage, but it doesn't provide a view of activity across the workflow. That across-the-workflow view can be pretty important: is a significant portion of online orders being returned by customers? You wouldn't get any insight into such connections by looking at workflow activity one stage at a time.

Sankey Diagrams Illustrate Flow Well

It's in showing the flow of activity that Sankey diagrams become very helpful. Sankey diagrams are intended to show flow, and they do so with connecting lines or arrows whose width is proportional to the amount of activity.

You can see some simple and complex examples of Sankey charts in the Google Charts gallery. While there, notice that Google's implementation lets you hover over a particular flow for more information, including the count. But even without a visible count, you can tell relative activity by the thickness of the connection between source and destination. In the example below, we can see that the B-X connection has far less activity than the B-Y connection. If you imagine that the start and end states are stages or substages of your workflow, you begin to see the possibilities.

Simple Sankey Diagram

Here's a more complex example from the same Google gallery page that shows flow across a series of states. Even though there's a lot more going on this time, transitions from one state to another are clearly shown and the rate of activity is easy to gauge from the width of the connecting lines. This is what makes Sankey diagrams great for illustrating workflow.

Complex Sankey Diagram

Although the Google library is very good, I'm going to be using the Highcharts chart library for the remainder of this post, simply because that's what I use regularly. Google Charts requires Internet access and the license terms disallow including the library locally in your application; in contrast, Highcharts can be self-contained in the application but the library does need to be purchased. Both libraries render excellent results.

If you want to play around with the idea of Sankey diagrams without doing any coding, check out the SankeyMATIC page, which lets you set one up interactively. It's a great way to prototype a Sankey diagram before deciding to invest development effort.

Looking at a Sankey diagram, you might get the idea that they must be complex to program, but this is not the case at all. Most chart libraries that support Sankey diagrams simply take arrays as data input, where each array element specifies the name of a source state, the name of a destination state, and a count. The chart library takes it from there, stitching things together for you. Both Google Charts and Highcharts operate this way. We'll see an example of that shortly.
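To make the data shape concrete, here is a minimal sketch of a Highcharts options object for a sankey series. The rows follow the source, destination, count pattern described above; the A/B and X/Y/Z values are illustrative only, and buildSankeyOptions is a helper name I made up for this sketch. It assumes the Highcharts sankey module is loaded in the page.

```javascript
// Each row is [source state, destination state, count]. Values are illustrative.
const simpleFlows = [
  ["A", "X", 5],
  ["A", "Y", 7],
  ["A", "Z", 6],
  ["B", "X", 2],
  ["B", "Y", 9],
  ["B", "Z", 4]
];

// Hypothetical helper: wrap the rows in a Highcharts options object.
function buildSankeyOptions(title, flows) {
  return {
    title: { text: title },
    series: [
      {
        type: "sankey",
        name: "Flow",
        // "keys" tells Highcharts what each position in a row means.
        keys: ["from", "to", "weight"],
        data: flows
      }
    ]
  };
}

const simpleOptions = buildSankeyOptions("Simple Sankey", simpleFlows);

// In a browser with highcharts.js and modules/sankey.js loaded, and a
// <div id="container"> on the page, the chart would be rendered with:
// Highcharts.chart("container", simpleOptions);
```

The only part that changes from chart to chart is the array of rows, which is why these diagrams are quick to produce once you have the counts.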

Sample Scenario: An Order Workflow

To show an example, we'll imagine the order workflow for a company that accepts both online orders and phone orders.

  • Orders have to be prepaid unless credit is approved. 
  • Once an order is prepaid or credit-approved, associates assemble the order by pulling SKUs from a warehouse. The order may also be gift-wrapped. 
  • Once assembled, orders are placed on a shipping dock awaiting pick up by the shipping carrier.
  • Orders that were not prepaid are billed and followed up by accounts receivable until the order is paid in full.

Our workflow is implemented as a series of stage-substage-state combinations. Specific actions connect one state to another. For example, submitting an online order that requests credit transitions state from Shopping | Online Order | Order Submitted to Order Configuration | Credit Approval | Credit Review; whereas a prepaid order would transition to Order Configuration | Payment Verification | Applying Payment.
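As a rough sketch of that structure (not the actual implementation), a state can be modeled as a stage/substage/state triple, with a table of actions linking one state to the next. The state names come from the example above; the action names and the helper functions are hypothetical.

```javascript
// A workflow position is a stage / substage / state combination.
function makeState(stage, substage, state) {
  return { stage, substage, state };
}

// Render a position in the "Stage | Substage | State" form used in the text.
function stateLabel(s) {
  return `${s.stage} | ${s.substage} | ${s.state}`;
}

const orderSubmitted = makeState("Shopping", "Online Order", "Order Submitted");
const creditReview = makeState("Order Configuration", "Credit Approval", "Credit Review");
const applyingPayment = makeState("Order Configuration", "Payment Verification", "Applying Payment");

// Action names are hypothetical; they only illustrate the idea.
const transitions = [
  { from: orderSubmitted, action: "submit-with-credit-request", to: creditReview },
  { from: orderSubmitted, action: "submit-prepaid", to: applyingPayment }
];

// Return the state an action leads to, or null if the workflow does not allow it.
function nextState(current, action) {
  const match = transitions.find(
    (t) => stateLabel(t.from) === stateLabel(current) && t.action === action
  );
  return match ? match.to : null;
}

console.log(stateLabel(nextState(orderSubmitted, "submit-prepaid")));
// prints "Order Configuration | Payment Verification | Applying Payment"
```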

Order Workflow Stages, Substages, and States

Now imagine this system is up and running and users want to see order activity. We could of course do the usual simple charts to show what is happening at each stage of the workflow. That might look something like this:

Single-Stage Views

This is certainly useful information; it lets us look into the breakdown of shopping activity. But it tells us nothing about what's happening across the workflow. So, we're now going to create a Sankey Diagram in Highcharts to provide that added view.

Creating Sankey Diagrams for the Sample Scenario

We can start out simply by showing the transition from the first stage (Shopping) to the second stage (Order Confirmation). To do that, we'll take a standard JSFiddle for a Highcharts Sankey diagram and simply modify the data array as follows. We're merely supplying array entries, each with the source state, destination state, and count. Our array includes all of the Shopping stage start states and all of the Order Confirmation stage end states. We know the source and end states from our workflow, and we know the counts by querying our order database.
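How the counts get from the database into the array depends on your system. As one possible sketch, suppose a query returns one record per order with the state it came from and the state it moved to. The record shape, the sample records, and the toSankeyRows helper are all assumptions made for illustration; the real counts would come from your own order data.

```javascript
// Hypothetical query result: one record per order transition.
const sampleTransitions = [
  { from: "Order Submitted", to: "Credit Review" },
  { from: "Order Submitted", to: "Applying Payment" },
  { from: "Order Submitted", to: "Credit Review" }
];

// Count records per (from, to) pair and emit [from, to, count] rows,
// which is the shape a Sankey data array expects.
function toSankeyRows(records) {
  const counts = new Map();
  for (const { from, to } of records) {
    const key = JSON.stringify([from, to]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, n]) => [...JSON.parse(key), n]);
}

console.log(toSankeyRows(sampleTransitions));
// prints [ [ 'Order Submitted', 'Credit Review', 2 ],
//          [ 'Order Submitted', 'Applying Payment', 1 ] ]
```

In practice you would more likely let the database do the grouping and counting, and only reshape the result into rows on the client.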

Below is the resulting Sankey diagram. Even though we're only focusing on the first two stages of the overall workflow, we can see the Sankey diagram yields a very rich view of what is going on. We can hover over the connecting lines for the exact count, but even at a glance we can see a great deal. We see that phone orders are minuscule compared to online orders. We see that most of the orders are in various states of credit approval processing. We are now getting a sense of how activity is flowing, which tells more of a story than just looking at one stage at a time.

Sankey diagram: Shopping Stage to Order Confirmation Stage

Now, let's add the remaining stages. Our data list now looks like this, with more array elements.

And below is the Sankey chart showing activity flow across the entire workflow. Now we really are getting a sense of what's happening across the board (JSFiddle).

I'd like to point out a few useful things about the Highcharts implementation. First off, you can hover over any connection line to get a tooltip showing the end-state name and the count. With some coding, you could also arrange it so that clicking on a section drills down into a detail chart.
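One way the click-to-drill-down idea might be wired up is through the Highcharts point click event. This is only a sketch: withDrillDown and the onLinkClick callback are hypothetical names, and what the callback does (open a detail chart, navigate to another page) is left up to you. It assumes that, inside the handler, `this` is the clicked point and that link points expose `from`, `to`, and `weight`.

```javascript
// Hypothetical helper: return a copy of the chart options with a click
// handler attached to every point in the chart.
function withDrillDown(options, onLinkClick) {
  const plotOptions = options.plotOptions || {};
  return {
    ...options,
    plotOptions: {
      ...plotOptions,
      series: {
        ...plotOptions.series,
        point: {
          events: {
            // Highcharts calls this with `this` set to the clicked point.
            click() {
              if (this.isNode) return; // ignore clicks on nodes, react to links only
              onLinkClick({ from: this.from, to: this.to, count: this.weight });
            }
          }
        }
      }
    }
  };
}

// Usage sketch: log the clicked flow instead of opening a detail chart.
const clickableOptions = withDrillDown(
  { series: [{ type: "sankey", keys: ["from", "to", "weight"], data: [] }] },
  (link) => console.log(`${link.from} to ${link.to}: ${link.count}`)
);
// In the browser: Highcharts.chart("container", clickableOptions);
```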

Highcharts Sankey Diagram - Detail on Hover

Another useful feature is the menu at top right, which permits the chart to be exported as a graphic.

Our sample scenario is a modest workflow: what if your workflow is much more complex, to the point where the diagram is really crammed? Well, you certainly need to keep the information understandable and you should strive to avoid overwhelming the user. Here are some strategies to consider:

  • Set an appropriate number of colors--too many may reduce understandability. Consider whether it makes sense to color-code stages.
  • When showing the full workflow, leave out the most detailed state level and provide that elsewhere.
  • Show multi-stage sequences in sections rather than the entire workflow in a single view.
  • Allow users to click on a stage to get an expanded view of the detail in that stage.
  • Group states of little interest together into a single node or leave them out altogether.
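The grouping strategy in the last bullet can be applied to the data array before it ever reaches the chart. Here is a minimal sketch, assuming rows in the [source, destination, count] form used earlier. The remapFlows helper, the "Other" label, and the sample state names and counts are made up for illustration.

```javascript
// Rename nodes with mapNode, then merge rows that end up with the same
// source and destination by summing their counts.
function remapFlows(rows, mapNode) {
  const totals = new Map();
  for (const [from, to, count] of rows) {
    const f = mapNode(from);
    const t = mapNode(to);
    if (f === t) continue; // drop links that collapsed into a single node
    const key = JSON.stringify([f, t]);
    totals.set(key, (totals.get(key) || 0) + count);
  }
  return [...totals].map(([key, total]) => [...JSON.parse(key), total]);
}

// Hypothetical rows and state names, for illustration only.
const detailedRows = [
  ["Online Order", "Credit Review", 40],
  ["Online Order", "Credit Denied", 3],
  ["Online Order", "Credit Expired", 2],
  ["Phone Order", "Credit Review", 5]
];

const minorStates = new Set(["Credit Denied", "Credit Expired"]);
const groupedRows = remapFlows(detailedRows, (name) =>
  minorStates.has(name) ? "Other" : name
);

console.log(groupedRows);
// prints [ [ 'Online Order', 'Credit Review', 40 ],
//          [ 'Online Order', 'Other', 5 ],
//          [ 'Phone Order', 'Credit Review', 5 ] ]
```

The same helper could roll detailed states up to their stage or substage by passing a different mapping function, which covers the "leave out the most detailed state level" strategy as well.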

If I were doing this for a client rather than a blog post, I would spend extra time on finishing touches: adjusting colors, adjusting font and text effects, and considering whether some of the information is not of interest to the audience. Even without doing so, I hope this introduction to Sankey diagrams provides some insight into how workflow activity can be shown to users in a meaningful way.

I only discovered Sankey diagrams recently, but their usefulness was immediately apparent. Not only are they useful, they're also very simple to create using leading chart libraries. If you're facing the challenge of visualizing workflow activity, I encourage you to try them out.
