June 13, 2023

How to prototype Machine Learning (ML) in the public sector - in 10 days: A Roadmap by PUBLIC

In this blog, we lay out a simple roadmap for how public sector organisations of all shapes and sizes can rapidly prototype Machine Learning (ML) solutions in context. ML doesn't need to be daunting; it is a valuable tool that public sector teams can use to get more out of their data.

In any discussion of public sector digital transformation over the last decade, ‘data’ has been the central theme, one you couldn’t avoid even if you tried, and for good reason. We’ve seen public sector organisations of all types grapple with the challenges of legacy data transformation journeys and ultimately begin to realise the real benefits of making smart use of good-quality, accessible data.

Machine learning (ML), a subfield of the ever-popular AI, is a great addition to any organisation’s analytics activities, and it is increasingly valuable to public sector organisations that have now equipped themselves with modernised architecture and large amounts of high-quality data. However, ML is still uncommon in these contexts and is often left to tech companies building products to sell.

But what about bringing the art of ML into business-as-usual operations within a public sector context? Of course, there are some real challenges: data access, data quality, skillsets, and senior-level sponsorship, to name just a few. Yet addressing these challenges head-on needn’t be a laborious affair, and the potential for impact is huge. Armed with the right way of thinking and a few guiding principles, you can prototype ML models much more quickly than you might think. It’s much less about acquiring some unreachable technical expertise or overhauling your existing systems; it mostly comes down to having the right attitude and a little innovative thinking.

A rapid method of ML prototyping creates an organic driver for innovation, pushing teams to think creatively and develop out-of-the-box solutions through quick feedback and iterative testing. Though it verges on clichéd tech-speak, innovation really is a mindset. In this context, that mindset and approach let organisations that have never used ML test and learn quickly while committing only ten days to the process. For organisations that use ML regularly, it means they can test multiple use cases in the time it would otherwise take to test one.

Drawing on our experience working across a range of public - and private - sector contexts on data analytics projects large and small, we’ve outlined below our roadmap for how to go from ‘ML sounds daunting and time-consuming’ to ‘We tested an ML prototype today and it was actually really valuable’ in less than 3 weeks.

Rapid ML Prototyping: A Roadmap by PUBLIC

Find the use case: Think hard about the value-drivers for your organisation

For non-technical experts, a technology like ML can seem like a black box of capabilities, making it difficult to identify use cases at first glance. Yet once you wave away the smoke cloud of gatekeeping jargon and over-complicated tech-speak, even the least tech-savvy among us can find practical use cases.

It’s always best to start with a list of things you wish you knew. That’s what machine learning - or any analytics for that matter - offers you: a piece of insight which you can use to make a more informed decision. If you’re thinking more strategically, you’ll want to get an understanding of where the most improvement is required within your organisation. That’s a good place to start.

So you’ve made a long list of use cases and, through a series of iterations, drilled it down to a top three? That’s where a lot of people stop. But go a step further and add some detail on what value each ML model will bring to your organisation, and you’ll set your team up for a clearer, more strategic path to prototyping in context. Remember that this is a low-cost exercise, so you don’t need a full business case. It is useful, however, to have a directional understanding of where the value sits. Is it to reduce cost-to-serve, increase revenue, improve customer experience, or increase staff morale? Use cases should always be tied directly and explicitly to value-drivers.

If you wish, you can put some numbers behind it to quantify the value. Keep this light-touch, though; detailed quantification is better left to a pilot phase later on.
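To show what ‘light-touch’ can look like, here is a minimal back-of-the-envelope sketch in Python for a cost-to-serve use case. The class name and every input (case volume, cost per case, share of cases the model could inform, assumed efficiency gain) are hypothetical placeholders rather than benchmarks; swap in your own organisation’s estimates and treat the result as directional only.

```python
from dataclasses import dataclass


@dataclass
class ValueEstimate:
    """Hypothetical inputs for a directional cost-to-serve estimate."""

    annual_cases: int          # cases handled per year (placeholder)
    cost_per_case: float       # average cost to serve one case, in GBP (placeholder)
    share_addressable: float   # fraction of cases the model could plausibly inform, 0-1 (assumption)
    efficiency_gain: float     # assumed cost reduction on those cases, 0-1 (assumption)

    def annual_saving(self) -> float:
        """Directional annual saving: volume x unit cost x addressable share x gain."""
        return (
            self.annual_cases
            * self.cost_per_case
            * self.share_addressable
            * self.efficiency_gain
        )


if __name__ == "__main__":
    # Illustrative numbers only: 20,000 cases at GBP 50 each, 40% addressable, 10% gain.
    estimate = ValueEstimate(
        annual_cases=20_000,
        cost_per_case=50.0,
        share_addressable=0.4,
        efficiency_gain=0.1,
    )
    print(f"Directional annual saving: GBP {estimate.annual_saving():,.0f}")
    # prints: Directional annual saving: GBP 40,000
```

A single multiplicative estimate like this is deliberately crude; its job is to rank use cases against each other, not to stand in for a business case.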

Set the stage for a rapid prototype: Set yourself up for a fast, high-value build from Day 1

Business goals can change rapidly, so once you have defined your use case, it’s best to head straight into a build. You’ve also probably built up some excitement internally whilst developing your use case and value case, so start as you mean to go on.

That said, there’s more to rapid prototyping than just being in a hurry. This delivery style saves money whilst promoting innovation. If you’re a data scientist or manager keen to do more ML, it also shows senior stakeholders that ML can be delivered on time and on budget.

To keep the momentum going, bear in mind four tips that will help streamline the process:

  • Line up data access from the start. Do this as part of your use case development: a data scientist can prototype a model quickly, but data access can sometimes take time. Your organisation may be mature when it comes to data and have sandboxes set up for scientists to experiment in. If not, you could simply gather some sample data in the format of your choice (e.g. xls, csv or txt files).
  • Think outside the box when it comes to data sources. The most innovative and robust models are often built by combining multiple data sources, so have a think about what data is linked to your outcome. Remember that there’s a lot of open data available these days, as well as data you can buy and access through easy-to-use APIs. This adds an extra level of complexity, of course, so think hard about whether a scaled model that relies on this data is worth the effort.
  • Build a good team. Your team should be small: this is a quick project, and too many cooks spoil the broth. A two-person team is often sufficient. Skillset and skill level matter, and a skilled data scientist is a must. Because this is about developing at pace, you need someone with the experience to pick up new data, assess its quality and define features quickly. Pair the data scientist with someone who can act as an SME (subject matter expert), manage wider stakeholders and remove blockers.
  • Data science tools are open source and free, so use them! You may have a great cloud-hosted data stack available, but an ML model can also run locally on a decent laptop. Remember that you don’t need a huge dataset for this. It’s a prototype, so as long as you have enough data to show that the model is robust and its output trustworthy, Python and Anaconda on your desktop will be fine.
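As a minimal sketch of the first two tips, the Python below loads a small sample extract and enriches it with an open dataset using pandas, which ships with Anaconda. The inline data, the column names (`case_id`, `area_code`, `days_open`, `escalated`, `deprivation_decile`) and the join key are all hypothetical; in practice you would point `pd.read_csv` (or `pd.read_excel` for spreadsheets) at your own files.

```python
import io

import pandas as pd

# Hypothetical sample extract from an internal system (normally a file on disk).
CASES_CSV = """case_id,area_code,days_open,escalated
1,E01,12,0
2,E02,40,1
3,E01,7,0
4,E03,25,1
"""

# Hypothetical open dataset keyed on the same area code.
OPEN_DATA_CSV = """area_code,deprivation_decile
E01,3
E02,8
"""


def load_and_enrich(cases_src, open_src) -> pd.DataFrame:
    """Load both sources and left-join the open data onto the case records."""
    cases = pd.read_csv(cases_src)
    open_data = pd.read_csv(open_src)
    # A left join keeps every case even where the open data has no match;
    # validate="many_to_one" raises if the open data has duplicate area codes.
    return cases.merge(open_data, on="area_code", how="left", validate="many_to_one")


if __name__ == "__main__":
    df = load_and_enrich(io.StringIO(CASES_CSV), io.StringIO(OPEN_DATA_CSV))
    print(df)
    # Unmatched keys appear as missing values worth investigating early on.
    print(df["deprivation_decile"].isna().sum(), "case(s) without an open-data match")
```

The `validate` check is a cheap guard: a silently duplicated key in an external dataset would otherwise inflate the row count and skew everything downstream.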

Crack on with the build: Ten days, two sprints, one prototype

At PUBLIC, we build in two five-day sprints. Sprint #1 is focused on exploration and experimentation, with Sprint #2 focusing on fine-tuning, visualisation, and telling your story.

Sprint #1

  • Days 1-2: Throughout the first two days, the focus should be on getting to know the data. This is often the least exciting part of the job for a data scientist, but it is absolutely critical for success. In the best case, the data scientist already knows the data; the second-best case is that they have a data dictionary to work from. Sometimes, though, it’s a painstaking process of sifting through the data and consulting SMEs. Either way, dedicating two days to building a list of potential model features is very important. These days should also be used to clean the data, impute missing values and produce some summary statistics that will be useful for the rest of the sprints.
  • Days 3-5: Now it’s time to move on to planning your models. There’s time to test multiple algorithms, and your data scientist will select some good options based on your data type, features and outcome. This is also the time to consider how the model will be interpreted through visualisations and which metrics will be used to validate it. Start with feature engineering, which will have been heavily influenced by SME input on Days 1 and 2. Build some first-draft models for directional purposes and look at variable importance; not everything needs to be perfect at this point. These days will give you a feel for how much dimensionality reduction you need to do in Sprint #2.
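For Days 1-2, a first pass at getting to know the data might look like the sketch below: profile each column’s type and missingness, impute gaps, and produce summary statistics. It assumes pandas and NumPy (both included in Anaconda); the toy columns are hypothetical, and median or most-frequent imputation is just one simple choice that the data scientist should revisit with SME input.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, share of missing values and number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),
        "n_unique": df.nunique(),
    })


def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric gaps filled by the median and others by the mode.

    For the final model, fit imputation on training data only (e.g. inside a
    scikit-learn pipeline) to avoid leaking information from the test set.
    """
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode()
            if not modes.empty:  # an all-missing column has no mode; leave it for review
                out[col] = out[col].fillna(modes.iloc[0])
    return out


if __name__ == "__main__":
    # Hypothetical toy extract with gaps in a numeric and a categorical column.
    raw = pd.DataFrame({
        "days_open": [12, np.nan, 7, 25, 40],
        "channel": ["web", "phone", None, "web", "web"],
        "escalated": [0, 1, 0, 1, 1],
    })
    print(profile(raw))
    clean = impute(raw)
    print(clean.describe(include="all"))
```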

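For Days 3-5, the sketch below shows one way to get directional results quickly with scikit-learn: score a few candidate algorithms with 5-fold cross-validation, then inspect variable importance. It uses a synthetic dataset from `make_classification` as a stand-in for your engineered features; the two candidates (logistic regression and a random forest), ROC AUC as the metric, and permutation importance as the importance measure are illustrative choices, not a recommendation for every problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for engineered features: 500 rows, 10 features, 4 informative.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

CANDIDATES = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(X, y, candidates, cv=5):
    """Mean cross-validated ROC AUC for each candidate model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in candidates.items()
    }


def ranked_importance(model, X, y, random_state=0):
    """Fit on a training split and rank features by permutation importance on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    model.fit(X_train, y_train)
    result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=random_state
    )
    # Feature indices ordered from most to least important.
    return result.importances_mean.argsort()[::-1]


if __name__ == "__main__":
    for name, score in compare_models(X, y, CANDIDATES).items():
        print(f"{name}: mean ROC AUC = {score:.3f}")
    print("Features by importance:", ranked_importance(CANDIDATES["random_forest"], X, y))
```

Permutation importance is measured on held-out data, so it reflects what the model actually relies on when predicting, which is a useful starting point for deciding what to drop in Sprint #2.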
Sprint #2

  • Days 6-7: Move on to dimensionality reduction and model re-runs until you are confident enough in their quality to compare the model metrics. Create visualisations of your model metrics, as these may be important for your prototype playback. By the end of Day 7, you should have confirmed which model is most accurate and best suited to the data.
  • Days 8-9: These days are for summarising the findings and creating visualisations. The data scientist should work with the SME to create communication materials for the prototype; a good place to start is a plotted history of the journey from use case to model. Remember that if you have found a successful model that should be productionised, you will need to get stakeholders on side, so communicating the output effectively is essential.
  • Day 10: Prototype playback day. The final day of the rapid prototype is dedicated to communicating results, aligning them with the original value case and setting pragmatic next steps. A plain-English summary of your findings is critical for communicating effectively with the senior stakeholders who will decide whether the model is worth developing further.
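For Days 6-7, one hedged way to combine dimensionality reduction with model re-runs is to put the reduction step inside a scikit-learn `Pipeline`, so it is refitted within each cross-validation fold, and then compare several metrics side by side. The example uses synthetic data; PCA and univariate `SelectKBest` are two common options chosen for illustration, and the `n_components` and `k` values are placeholders you would tune.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 30 features, only 5 of them informative.
X, y = make_classification(n_samples=600, n_features=30, n_informative=5, random_state=1)

PIPELINES = {
    "all_features": Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]),
    "pca_10": Pipeline([
        ("scale", StandardScaler()),
        ("reduce", PCA(n_components=10)),
        ("model", LogisticRegression(max_iter=1000)),
    ]),
    "select_k_best_10": Pipeline([
        ("scale", StandardScaler()),
        ("reduce", SelectKBest(f_classif, k=10)),
        ("model", LogisticRegression(max_iter=1000)),
    ]),
}

METRICS = ["accuracy", "precision", "recall", "roc_auc"]


def metric_table(pipelines, X, y, cv=5) -> pd.DataFrame:
    """Mean cross-validated score per metric (columns) for each pipeline (rows)."""
    rows = {}
    for name, pipe in pipelines.items():
        scores = cross_validate(pipe, X, y, cv=cv, scoring=METRICS)
        rows[name] = {m: scores[f"test_{m}"].mean() for m in METRICS}
    return pd.DataFrame.from_dict(rows, orient="index")


if __name__ == "__main__":
    print(metric_table(PIPELINES, X, y).round(3))
```

Keeping the reduction inside the pipeline matters: fitting PCA or feature selection on the full dataset before cross-validation would let information from the validation folds leak into training and flatter the metrics.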

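For the playback materials on Days 8-9, a simple chart of the model metrics is often enough. This is a minimal matplotlib sketch: the function name is hypothetical, the scores in the example call are placeholders rather than real results (in practice you would plot your own cross-validation output), and the non-interactive `Agg` backend is used so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, so no display is needed
import matplotlib.pyplot as plt


def plot_metric_comparison(scores, metric):
    """Return a horizontal bar chart comparing one metric across candidate models."""
    names = list(scores)
    values = [scores[name] for name in names]
    fig, ax = plt.subplots(figsize=(6, 1.5 + 0.6 * len(names)))
    ax.barh(names, values)
    ax.set_xlim(0, 1)  # fixed 0-1 scale so small differences aren't exaggerated
    ax.set_xlabel(metric)
    ax.set_title(f"Candidate models compared on {metric}")
    # Label each bar with its score, kept inside the axes.
    for i, value in enumerate(values):
        ax.text(min(value + 0.01, 0.9), i, f"{value:.2f}", va="center")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Hypothetical placeholder scores, not real results.
    fig = plot_metric_comparison(
        {"Logistic regression": 0.81, "Random forest": 0.86}, metric="ROC AUC"
    )
    fig.savefig("model_comparison.png", dpi=150)
    plt.close(fig)
```

Fixing the axis at 0 to 1 is a deliberate choice for senior audiences: a truncated axis can make a two-point difference look dramatic.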
Over the next few weeks, we’ll be sharing more of our ideas, perspectives and approaches that inform the work we are doing across our fast-growing Data & AI (DAI) practice. We’re keen to spark new, meaningful discussions and co-develop novel ideas around these topics, so please do engage with our team along the way.

If you have reactions, ideas or just want to chat with a fellow DAI enthusiast, feel free to drop Thomas (thomas.chalk@public.io) or Mahlet (mahlet@public.io) a message!

Thomas Chalk

Director of Digital, Data & Technology