Digging into data with Python and Dash — my first app

Image by Meghan Holmes on Unsplash

How best to dig into the capabilities of different AI-focused software? This is step 2 in my journey to visualising data using IBM Cloud, Python and Dash from plot.ly.

Why are we here?

As described in my last blog post, I’m using Python and Dash on IBM Cloud to create a tool that allows users to visualise a small dataset. I’m looking to allow users to dig into the capabilities of the various flavours of AI-focused software available on the POWER platform. For example:

  • “does application A allow access to multiple data types?”
  • “does platform B scale horizontally?”
  • “does this software suit a business user or a data scientist?”

You get the idea. But really what I’m going to describe here could be used for a variety of things, so equally you can consider this a tutorial for creating a Python Flask app using Dash in the IBM Cloud. The code described here is available on GitHub in the eco-frame-public repository.

It’s alive!

Sharing is caring

I’m keen to share what I’ve learnt while working on this application for a few reasons. Firstly, I found a variety of material available for the various tools I’ve used here, but nothing pulling them all together. Plus I’ve been asked a couple of times about the specifics behind my fledgling tool, so now I can point people here. And it’s always useful for me to come back to. (And in case there is any doubt, the title of this section is one of my least favourite expressions. But it seemed somehow apt.)

The pieces of the puzzle

As mentioned, I’ve used Python, Dash from plot.ly and IBM Cloud here — so let’s look at those first briefly, before we get into the coding good stuff. My programming background is actually Fortran-based (a confession which has often elicited a wry smile from my techie peers). Fortran was (and still can be) a common language in the HPC / academic world. A big chunk of my PhD was a Fortran program modelling emissions across the full spectrum, radio waves to X-rays, of Ultraluminous Infrared Galaxies (ULIRGs — got to love that acronym). So Python felt like a sensible step on from my Fortran beginnings and the variety of bash scripting / C / R / random other languages I’ve dabbled in since. Plus it’s super trendy now. :)

I’d come across Plotly before through some of the software companies I work with. I *really* liked (read for liked: “could have spent countless weeks playing with”) their visualisation tooling, but was particularly drawn to their Dash framework since I didn’t want to have to dabble in JavaScript to create the web interface. Plus they had some nice tutorials and a load of examples to draw from.

Finally, I based it in the IBM Cloud using a Cloud Foundry Python runtime. Given my employer, this seemed to be a good choice for location. :) But that aside, it all worked very smoothly, again with a load of examples available for me to draw from.

So what about the data?

At this point, it seems pertinent to mention the data I’m slicing and dicing. This consists of (evidence-based) capability ratings, from 0–5, for a number of AI software applications. For instance, these capabilities might cover the data types accessible, the types of models included and application scalability, to name but a few. I’m not going to make the specifics of that available here; instead I’ve used randomly generated data to demonstrate the kind of output I’m getting. But if you’re interested, have a look at some of the work Gartner has done related to Critical Capabilities for Data Science and Machine Learning Platforms and then you’ll get a flavour of the kind of capabilities I started with.

I had originally planned to use Cloudant as the database back end for my application. In the end I decided to start simple and use a CSV file to hold my data in its current state. This was very easy to work with using the pandas Python package, so a good starting point. I’ve got plans to add more information for the applications I’m looking at, likely covering a range of data types, at which point I’ll revisit the database back end again.
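To give a flavour of the shape of the data, here’s a quick sketch of how a random demo dataset like mine could be generated and then read back in with pandas. The file name, application names and capability columns are all made-up placeholders for illustration:

import numpy as np
import pandas as pd

# Hypothetical capability columns and application names, for illustration only
capabilities = ['Data types accessible', 'Models included', 'Scalability']
applications = ['Application A', 'Application B', 'Application C']

# Random 0-5 ratings, one row per application
demo = pd.DataFrame(
    np.random.randint(0, 6, size=(len(applications), len(capabilities))),
    index=applications, columns=capabilities)
demo.to_csv('capabilities.csv')

# The app then reads the ratings back in, with applications as the index
df = pd.read_csv('capabilities.csv', index_col=0)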

My first application

So let’s get into the code! This first application, selector_heatmap.py, is a simple example of an interactive heatmap of the data. It uses a pull-down menu to add or remove applications from the overall heatmap. *Disclaimer* — there are likely more efficient ways of working with the data than I’m using here; I’m still learning!

The base structure of a Dash app is a layout section which defines how the page is structured and looks, followed by a callback function which defines the interactivity of the app.

In the layout section you use component classes and keywords to set up the base HTML for the page. So here we have a dropdown which allows the user to select one or more application names. It finishes with a Graph component which then renders the visualisation based on the user input from the dropdown. The names are pulled from a dataframe df which is created from the CSV.

app.layout = html.Div([
    html.H1('Ecosystem Framework'),
    html.H2('Application heatmap - choose from the drop down list to add'),
    # Dropdown populated with the application names from the dataframe index
    dcc.Dropdown(
        placeholder='Select application name(s)',
        options=[{'label': i, 'value': i} for i in df.index],
        multi=True,
        id='isv_select'
    ),
    # Graph placeholder, filled in by the callback below
    dcc.Graph(id='heatmap_output')
])

We then have the callback function which filters the data based on the input from the dropdown, constructs a figure and then passes this back to the Dash app. So in this example we take input from the dropdown with id isv_select and then create output with id heatmap_output to pass back to the app. The specifics of the heatmap come from a modified dataframe dff which is created based on the selections from the dropdown.

@app.callback(
    Output('heatmap_output', 'figure'),
    [Input('isv_select', 'value')])
def update_figure(value):
    if value is None:
        # Nothing selected yet, so return an empty figure
        return {'data': []}
    else:
        # Filter the dataframe to the selected applications and scale
        # the figure width with the number of selections
        dff = df.loc[value, :]
        scaled_size = left_margin + right_margin + 150*len(value)
        return {
            'data': [{
                'z': dff.values.T.tolist(),
                'y': dff.columns.tolist(),
                'x': dff.index.tolist(),
                'ygap': 2,
                'reversescale': True,
                'colorscale': [[0, 'white'], [1, 'blue']],
                'type': 'heatmap',
            }],
            'layout': {
                'height': 750,
                'width': scaled_size,
                'xaxis': {'side': 'top'},
                'margin': {
                    'l': left_margin,
                    'r': right_margin,
                    'b': 150,
                    't': 100
                }
            }
        }

If you look at the many examples shared by chriddyp (cofounder of Plotly and author of Dash) and others, you’ll see different people format their Dash code in different ways (which isn’t really a surprise, welcome to coding). The best idea is to pick a format that works for you and stick with it. I’ve gone with what allows me to easily parse what I’ve created and (perhaps more importantly) reuse chunks of code in other apps with little modification.

As you might imagine, there are many options to allow you to tweak your stunning final visualisation. For example, I needed to change my margins to allow for long capability titles (must be more succinct!) and to allow for dynamic resizing when more applications were selected by the user. My main caution: be wary of the amount of time (and fun) you can spend playing around with formatting!

Finally, the code above needs to be topped and tailed with Python package imports and the overall structure to allow it to run as a local / Cloud Foundry app. And there you have it.
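For reference, here’s a sketch of roughly what that topping and tailing looks like. The CSV file name and margin values are placeholders I’ve picked for illustration; the server = app.server line is what gunicorn hooks into via the Procfile shown later:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import pandas as pd

# Load the capability ratings, with applications as the index
df = pd.read_csv('capabilities.csv', index_col=0)

# Margins sized to fit the long capability titles (placeholder values)
left_margin = 200
right_margin = 100

app = dash.Dash(__name__)
server = app.server  # exposed for gunicorn (see Procfile)

# ... layout and callback from above go here ...

if __name__ == '__main__':
    # Local debug server only; gunicorn serves the app in Cloud Foundry
    app.run_server(debug=True)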

This can be run locally on your own server / machine for debugging as you create code and tweak things, with the output served at http://127.0.0.1:8050/:

$ python selector_heatmap.py
* Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)

Getting it into the cloud

The final step is to turn this local Python app into a web-based app. As mentioned, I used IBM Cloud (previously known as Bluemix, in case of any confusion) as the base for this. There’s a good tutorial for getting started with Python Flask apps in IBM Cloud which covers the key steps.

I deployed using the following set up:

A manifest.yml, which includes basic information about the app such as its name and how much memory / disk to allocate for each instance.

$ cat manifest.yml
---
applications:
- path: .
  memory: 256M
  name: app-name-here
  disk_quota: 1024M
  health_check_type: http

And then Procfile, requirements.txt and setup.py for the specifics of the python app deployment.

$ cat Procfile 
web: gunicorn selector_heatmap:server
$ cat requirements.txt
gunicorn
numpy
plotly
dash
dash-renderer
dash-html-components
dash-core-components
pandas==0.20.3
$ cat setup.py
# Always prefer setuptools over distutils
from setuptools import setup, find_packages
# To use a consistent encoding
from codecs import open
from os import path

here = path.abspath(path.dirname(__file__))

# Get the long description from the README file
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='eco-frame-public',
    version='1.0.0',
    description='Ecosystem Framework',
    long_description=long_description,
    url='https://github.com/mandieq/eco-frame-public',
)
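With those files in place alongside selector_heatmap.py, pushing to IBM Cloud is then a single command from the project directory (assuming you’ve installed the Cloud Foundry CLI and logged in to your org and space first):

$ cf push    # picks up manifest.yml from the current directory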

So what’s next?

This is just a simple starting point but a good way to understand the basics of working with Dash.

When I started working with Dash earlier this year I found some of the HTML formatting quite clunky and restrictive outside of using Markdown. I’m pleased to say that there’s been some recent work around images and local CSS files which I’m looking forward to taking advantage of. There are also some tweaks I’d like to make to this example, for instance the initial view displayed before data is selected.
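As a sketch of what I have in mind for that initial view (nothing settled yet), the callback guard could return a placeholder layout instead of an empty figure:

@app.callback(
    Output('heatmap_output', 'figure'),
    [Input('isv_select', 'value')])
def update_figure(value):
    if not value:
        # Friendlier initial view before any applications are selected
        return {
            'data': [],
            'layout': {'title': 'Select application(s) above to build the heatmap'}
        }
    # ... remainder of the callback as before ...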

My next steps are to look at some different types of interactive visualisations and output, and then onto multi page apps. Watch out for future blogs to cover those!
