Make magic with Mesop: Python-based web apps
I’ve recently been trying out Mesop, a newly released Python-based framework for web apps, and overall I’m impressed with the results. This blog is a summary of some tips and tricks I’ve put together while getting started with Mesop. I’ve also included a GitHub repo with code for one of the simple retail-focused demo apps I’ve created, which shows how multimodal embeddings can be used with retail product sets.
What’s the point?
I demo a lot, and when I say ‘a lot’, I really mean A LOT. And I’m frequently creating new demos for specific use cases using Python. I’ll freely admit that I’m not a front-end developer, so making something slick-looking can be tricky and potentially quite time consuming.
I’ve had success with Streamlit, Gradio and other tools in the past. Tools like these allow me to quickly showcase various AI / GenAI capabilities with a shiny front end. I was keen to try Mesop because I liked the idea of the flexibility of the “look” of the output. Plus the Streamlit “running” figures were making me feel lazy (and yes, I know with some effort most of the styling can be removed!).
Mesop is open source code from Google, which — to use their tag line — allows you to:
“Create web apps without the complexity of frontend development”
There is a fast-growing Mesop site with docs, demos and a showcase space, plus the GitHub code repo. So plenty of useful info to get started!
I’d like to have a go, how do I get started?
Firstly, RTFM (Read The Fabulous 😁 Manual). No really.
There are some key things you need to get your head around before you dive in. These include core concepts like:
- state — “allows you to store information about what the user did in a structured way” in order to share information between the various parts of your fancy web app
- components — “the building blocks of a Mesop application” which you can choose from to get user input, show outputs and do various clever things
- event handlers — “responsible for updating state based on the incoming event” allowing you to trigger actions based on things like button clicks and input updates (all three come together in the sketch below)
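To make these three concepts concrete, here’s a minimal counter app sketch of my own (not from the Mesop docs, though it only uses standard Mesop APIs):

import mesop as me

# State: structured information shared across the app
@me.stateclass
class State:
    count: int = 0

# Event handler: updates state based on the incoming click event
def on_click(e: me.ClickEvent):
    state = me.state(State)
    state.count += 1

# Components: the building blocks making up the page
@me.page(path="/")
def page():
    state = me.state(State)
    me.text(f"Clicks so far: {state.count}")
    me.button("Click me", on_click=on_click)

Save this as main.py and run it with mesop main.py to see it in the browser.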
What’s easy?
Once you’ve spent a bit of time understanding the core concepts then you really can get going fast. As with similar tools, you can update your code and see the impact on your web app in another window in real time, which allows for quick iteration.
Multi-page apps are very possible. You can split specific capabilities between pages to avoid very “busy” views, but equally navigate between pages quickly and share state (so user inputs, data etc) between them for easy reuse, as in the sketch below.
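For example, navigation is just a call to me.navigate() from an event handler, and because state is shared, anything the user entered on one page is still there on the next (the page paths here are my own illustration):

import mesop as me

def go_to_search(e: me.ClickEvent):
    me.navigate("/search")  # shared state survives the page change

@me.page(path="/")
def home():
    me.button("Go to search", on_click=go_to_search)

@me.page(path="/search")
def search():
    me.text("Search page")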
There are some really nice high-level components already available in Mesop Labs, which mean you can be up and running quickly for common GenAI-centred use cases like chat, text and image generation. Just bolt on your favourite LLM in the backend. Plus the code for these components is available, so you can “borrow” some of the formatting if, like me, you are creatively challenged.
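As a sketch of how little code a Labs chat page needs, here’s a version where a stub transform stands in for the LLM call (swap in your model of choice):

import mesop as me
import mesop.labs as mel

def transform(prompt: str, history: list[mel.ChatMessage]):
    # assumption: an echo response stands in for a real LLM call
    yield f"You said: {prompt}"

@me.page(path="/chat")
def chat_page():
    mel.chat(transform, title="Demo chat", bot_user="demo-bot")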
What’s slightly harder? Useful tips and tricks
Caveat: as mentioned, Mesop is a fast evolving framework, so some of the sharp edges I’ve come across might well get softened over time!
The first thing I noticed might actually be less Mesop-specific and more a matter of good SWE practices. If you have a multi-page app in one *.py file, it can get big pretty quickly. Obviously the sensible longer-term approach is to split your code into different files in a logical structure. But in the short term I’ve found something like the following setup is a useful way to keep similar elements together and enable easier navigation.
# ======== Environment set up ========
# buckets / projects / imports etc
# ======== State class ========
# the various variables you need through your app
# ======== Functions to handle state changes ========
# the result of text input, button clicks, file uploads
# pointing towards the helper functions to get things done
# ======== Helper functions ========
# calls to APIs / working with variables to get outputs
# ======== Page set up ========
# the structure of the page(s) making up your app
# makes life easier to get a view of pages and interactions
# styling elements all in one place
# ======== Longer text inputs ========
# if you need some longer inputs / prompts which would
# otherwise make your code harder to read
# ======== Formatting ========
# formats which you might reuse in various pages, again
# helping with making code more user friendly
I did have some “fun” with Mesop boxes using me.box() as part of getting my layouts right. And I’m still not convinced I’m doing this in the most efficient way, but I am getting some nice outputs! 🥳 Understanding how flexboxes work is REALLY important, since they can be very powerful if used correctly (and correspondingly quite frustrating if used incorrectly). And as usual, finding an app you like the look of and looking at the formatting used in its source code is a good way to get going.
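For reference, here’s a minimal two-column flexbox sketch (all the styling values are just illustrative):

import mesop as me

@me.page(path="/layout")
def layout_demo():
    # outer box: a flexbox laying its children out in a row with a gap
    with me.box(style=me.Style(display="flex", flex_direction="row", gap=16)):
        # each child grows to share the available width equally
        with me.box(style=me.Style(flex_grow=1, padding=me.Padding.all(12))):
            me.text("Left column")
        with me.box(style=me.Style(flex_grow=1, padding=me.Padding.all(12))):
            me.text("Right column")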
Finally, one thing which initially tripped me up was dealing with local images. I mostly worked with images stored in Google Cloud Storage (GCS), but in some cases I had local images which I needed to display. It isn’t possible to display local images with Mesop directly, only URLs. However, a nice workaround is to convert a local image to a data URL, which CAN then be used in the me.image() call.
For example the following, where img is an image in the GeneratedImage response from the Vertex AI SDK when editing an image using Imagen:
img64 = img._as_base64_string()
me.image(src=f"data:image/png;base64,{img64}")
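For plain local image files (rather than SDK responses), a small helper can build the same kind of data URL. The helper name and file path below are my own, purely for illustration:

import base64
import mesop as me

def local_image_to_data_url(path: str, mime_type: str = "image/png") -> str:
    # read the file and base64-encode it into a data URL
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# then, inside a page function:
# me.image(src=local_image_to_data_url("product.png"))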
Show me some code!
Enough talk, let’s see it in action.
The GitHub repo retail_embeddings shows a simple two-page demo I created using Mesop, which illustrates how multimodal embeddings can be used to search retail product images and titles.
The first page loads pre-created product data from a user-specified GCS location. This data includes product details, pointers to product images in GCS, and multimodal embeddings for the title text and image of each product. In the backend I’m loading a *.csv file into a pandas DataFrame and then displaying the images and a subset of the DataFrame in the UI.
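A sketch of that load step, assuming gcsfs is installed so pandas can read gs:// paths directly (the bucket path and embedding column name are illustrative):

import ast
import pandas as pd

df = pd.read_csv("gs://your-bucket/products.csv")
# embeddings come back as strings from the CSV, so parse them into lists
df["image_embedding"] = df["image_embedding"].apply(ast.literal_eval)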
(Side note: all the data were created using Gemini 1.5 models, with Gemini 1.5 Pro creating a taxonomy and Gemini 1.5 Flash creating the product details, and the images were created using Imagen 3, all available in the Google Cloud Vertex AI platform!)
The second page, navigated to via a button on the first page, allows the user to enter a text query to find similar products using multimodal embeddings (based on images and text). In the backend I get an embedding for the text query in the same embedding space as the products (using the Vertex AI multimodalembedding model) and then return the three products with the highest cosine similarity to the query.
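A sketch of that query flow, reusing the DataFrame loaded on the first page (the query text and column names are illustrative):

import numpy as np
from vertexai.vision_models import MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# embed the text query into the same space as the product embeddings
query = np.array(
    model.get_embeddings(contextual_text="red running shoes").text_embedding
)
df["score"] = df["image_embedding"].apply(lambda e: cosine(query, np.array(e)))
top_three = df.nlargest(3, "score")  # the three closest products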
Where next?
It certainly makes sense to run Mesop locally while developing, but you’ll likely want to deploy your shiny new app somewhere to make it accessible to users! I’ve been using Google Cloud Run, since it is very straightforward to deploy from source code. There’s some guidance on doing this and other deployment options, but I’d like to investigate further to find the best config for performance.
If this all sounds interesting, I’d recommend going to the Mesop Getting Started pages and having a go. 😁
Happy Mesop-ing!
(disclaimer, an expression I just made up)