

Building in plain sight: what safety really looks like when you're building AI for care

Discover how Birdie builds SmartPlans, our AI-powered care assessment tool, with clinical safety at the core. See our process from hazard logs to live testing.


In an AI-first world, it’s easy to think the hard part is done - that all you need now is a good idea. The models are capable, the tools are accessible, and anyone with enough enthusiasm can ship something that works.

Some of that is true. But when the software you're building is for homecare - where a missed alert, a miscalculated medication record, or a garbled care note has real consequences for a real person in a vulnerable situation - that way of thinking is potentially dangerous.

At Birdie, we're building SmartPlans, an AI-powered care assessment tool. This post is about what building with AI safely looks like in real life. Not because we think we're uniquely virtuous, but because we think it's worth being honest about how much work responsible product development involves in care - and why anyone claiming to build in this space should be asked some hard questions.

Here’s how it all works.

The check happens before the idea is fully formed

When we decide to prioritise a new feature, we start with a phase we call "shaping", where product managers, designers, and engineers start turning a problem into something we can build. The first person we loop in beyond the core team is Jenny, our Clinical Safety Officer (CSO).

The CSO's job at this stage is to pressure-test the thinking before any design has been validated and before any engineering work is planned out. For example:

  • What happens if a carer reads this and misinterprets it?
  • What does the product show if it doesn’t have enough data?
  • What assumptions are we making about how people will use this during a real care visit?

These are hard questions to answer at such an early stage, which is exactly why we ask them. Catching a clinical risk at the idea stage costs a conversation. Catching it after you've built and shipped costs far more - sometimes in ways that can't be undone.
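To make the second of those questions concrete: one common mitigation is for the product to refuse to generate anything when the underlying data is too thin, and to say so plainly. Here's a minimal sketch of that pattern - the field names and threshold are hypothetical, not SmartPlans' actual logic:

```python
from dataclasses import dataclass

# Illustrative only: the field names and threshold are hypothetical,
# not SmartPlans' actual schema or rules.
@dataclass
class AssessmentInput:
    client_id: str
    completed_sections: int
    total_sections: int

MIN_COMPLETENESS = 0.6  # hypothetical threshold

def can_generate(data: AssessmentInput) -> tuple[bool, str]:
    """Refuse to generate rather than guess when the data is too thin."""
    completeness = data.completed_sections / data.total_sections
    if completeness < MIN_COMPLETENESS:
        return False, (
            f"Not enough information to draft this assessment yet "
            f"({data.completed_sections}/{data.total_sections} sections complete)."
        )
    return True, "OK to generate."

ok, message = can_generate(AssessmentInput("c-123", 4, 10))
print(ok, message)  # False: the product says so explicitly instead of inventing content
```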

While it's being built

As we build the product, we complete a SWIFT (Structured What-If Technique) check, where we identify hazards that could emerge from people using the product.

The team sits down and works through a series of “what ifs” on the product, imagining what could feasibly go wrong, quantifying each risk, and determining whether additional mitigations need to be built in. These are reviewed by our CSO and inform the Hazard Log, which you’ll meet next.
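To give a flavour of what one “what if” captures, here's a sketch of the kind of record a session like this might produce. The structure and scoring rule below are a simplified illustration, not a formal SWIFT template:

```python
from dataclasses import dataclass, field

# Hypothetical record of one "what if" from a SWIFT session.
# Field names and the scoring rule are illustrative, not a formal template.
@dataclass
class WhatIf:
    prompt: str                  # the "what if..." posed in the session
    hazard: str                  # what could go wrong for the person receiving care
    likelihood: int              # 1 (very low) to 5 (very high)
    severity: int                # 1 (minor) to 5 (catastrophic)
    existing_controls: list[str] = field(default_factory=list)
    proposed_mitigations: list[str] = field(default_factory=list)

    @property
    def needs_mitigation(self) -> bool:
        # Simple illustrative rule: a high combined score means build more controls.
        return self.likelihood * self.severity >= 8

item = WhatIf(
    prompt="What if the carer is interrupted mid-visit and the note is half-finished?",
    hazard="An incomplete note is read as a full record of the visit",
    likelihood=3,
    severity=3,
    existing_controls=["Draft notes are visibly flagged as incomplete"],
)
print(item.needs_mitigation)  # True - this one feeds the Hazard Log
```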

Before we release

When a feature moves to launching, the team and the CSO create a Hazard Log.

The Hazard Log is a formal document that lists every clinical safety risk we've identified for the feature. For each risk, we record how likely it is, how severe the potential harm could be, and what we're doing to reduce it. Creating the Hazard Log while we’re building gives us the flexibility to iterate on the product to accommodate any known safety risks.

The standard we work to is DCB0129, an NHS information standard that exists specifically for health IT systems. It was designed by people who understood that software in care settings can cause harm in ways that software in other settings can't, and that good intentions are not a safety mechanism.
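To picture what an entry in the log records, here's a sketch: each hazard carries an initial risk (before mitigations) and a residual risk (after them). The data shape and the scoring function are simplified illustrations - DCB0129 defines its own risk matrix, which isn't reproduced here:

```python
from dataclasses import dataclass

def risk_rating(likelihood: int, consequence: int) -> int:
    """Map 1-5 likelihood and consequence to a 1-5 rating.
    A simplified stand-in, not DCB0129's actual risk matrix."""
    return max(1, min(5, round((likelihood + consequence) / 2)))

# Hypothetical hazard log entry; fields are illustrative.
@dataclass
class HazardLogEntry:
    hazard_id: str
    description: str
    initial_likelihood: int
    initial_consequence: int
    mitigations: list[str]
    residual_likelihood: int
    residual_consequence: int

    @property
    def initial_risk(self) -> int:
        return risk_rating(self.initial_likelihood, self.initial_consequence)

    @property
    def residual_risk(self) -> int:
        return risk_rating(self.residual_likelihood, self.residual_consequence)

entry = HazardLogEntry(
    hazard_id="HAZ-014",
    description="Generated care plan text could be read as clinical advice",
    initial_likelihood=3,
    initial_consequence=4,
    mitigations=[
        "Require human review before the plan is saved",
        "Label generated text clearly in the UI",
    ],
    residual_likelihood=2,
    residual_consequence=4,
)
print(entry.initial_risk, "->", entry.residual_risk)  # 4 -> 3: mitigations lower the rating
```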

Then it’s time for more testing. Testing at Birdie isn't just "does it do what the ticket says." Our Quality Assurance (QA) process covers two different kinds of testing:

  • Testing the software: This checks that a feature does what it's supposed to do. If the design says "when a carer marks a medication as given, the record should update immediately", then this is the process of actually doing that and confirming it works as expected.
  • Testing the AI: This checks the layers of AI within the product to make sure it’s doing what we need it to do and not doing things we don’t want it to do. This is where we check for hallucinations - outputs where the AI states things the underlying data doesn’t support - and clinical risks. We do all of this within a custom-built evaluation app that helps us measure the quality of the AI and highlights areas where we need to improve (a toy illustration of one such check follows this list). This feedback loop continues throughout the release of our products.
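To give a flavour of what “checking for hallucinations” can mean, here's a deliberately naive sketch of one such check: comparing the medications a generated draft mentions against the source record. Treat everything in it as a toy illustration rather than a description of the evaluation app itself:

```python
# A deliberately naive sketch of one hallucination check: flag medications
# mentioned in a generated draft that are absent from the source record.
# Real evaluation pipelines are more sophisticated; all names are invented.

SOURCE_MEDICATIONS = {"paracetamol", "ramipril"}  # what the record actually says
KNOWN_MEDICATIONS = {"paracetamol", "ramipril", "warfarin", "insulin"}

def ungrounded_medications(generated_text: str) -> set[str]:
    """Return medications the draft mentions but the record doesn't contain."""
    mentioned = {m for m in KNOWN_MEDICATIONS if m in generated_text.lower()}
    return mentioned - SOURCE_MEDICATIONS

draft = "Client takes Paracetamol in the morning and Warfarin at night."
print(ungrounded_medications(draft))  # {'warfarin'} - flagged for human review
```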

Before launch, the team runs a pre-mortem: a deliberate exercise in imagining what could go wrong before it does. The support team is briefed, and the feature is tested internally across the company.

We then move to the Alpha phase - testing in real conditions with real partners. Our CSO and Data Protection Officer (DPO) review the findings of these real-condition tests.

After that, we move to the Beta stage. This is where we test the AI with an even bigger group of users, focussing on edge cases (things that don’t happen often, but could).

If anything surfaces a safety concern, the Hazard Log is updated and additional mitigations are put in place before we go further. Alongside that, we produce a Clinical Safety Case Report - this is evidence, in writing, that we've met the relevant safety requirements.

We only move to launch when the metrics from evaluations and testing - as well as client feedback - achieve their targets from a safety perspective.
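In spirit, that launch gate works like a simple threshold check - though the real targets span evaluation metrics, test results, and client feedback. The metric names and numbers below are invented for illustration:

```python
# Hypothetical launch gate: ship only when every safety metric hits its
# target. Metric names and thresholds here are invented for illustration.
TARGETS = {"max_hallucination_rate": 0.01, "min_eval_pass_rate": 0.98}

def ready_to_launch(metrics: dict[str, float]) -> bool:
    return (
        metrics["hallucination_rate"] <= TARGETS["max_hallucination_rate"]
        and metrics["eval_pass_rate"] >= TARGETS["min_eval_pass_rate"]
    )

print(ready_to_launch({"hallucination_rate": 0.004, "eval_pass_rate": 0.991}))  # True
```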

It doesn't stop when something ships

This is the part that most product blogs skip over: clinical safety doesn't have a finish line. Once a feature is live, it sits inside a continuous surveillance process:

  • Clinical incidents (anything that might have clinical implications, whether or not it turns out to be serious) trigger our incident management process and reach the CSO, who determines whether the Hazard Log needs to change and whether additional controls are needed (see the sketch after this list).
  • We track clinical safety risks in a dedicated initiative in our product management system so nothing gets buried in the noise of day-to-day development.
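The key property of the first of those mechanisms is that escalation is triggered by possibility, not by confirmed severity. Here's a minimal sketch of that routing rule - the data shape and function are invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical routing for post-launch incidents. The rule - anything that
# *might* have clinical implications reaches the CSO - is from the post;
# the data shape and function are invented for illustration.
@dataclass
class Incident:
    summary: str
    possibly_clinical: bool

def route(incident: Incident) -> list[str]:
    steps = ["log in incident management system"]
    if incident.possibly_clinical:
        # Escalate on possibility, not on confirmed severity.
        steps += ["notify CSO", "CSO reviews whether the Hazard Log needs to change"]
    return steps

print(route(Incident("Care note rendered with a missing section", possibly_clinical=True)))
```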

This is ongoing, structured, and owned - because it needs to be.

What this means for SmartPlans

We're writing about this now because we think it matters for how you evaluate any AI product being built for care.

When someone tells you they've built an AI care assessment tool, the interesting questions aren't about the model or the interface. They're: what standard does it comply with? Who’s the Clinical Safety Officer and when did they first see the feature? What does your Hazard Log say? What happens if something goes wrong after launch?

SmartPlans has been through every stage described above. It has a Hazard Log and a Clinical Safety Case Report. It was reviewed by our CSO before the first design was approved, before the first story was written, and again after Beta testing. It sits inside a surveillance process that will catch and route any clinical concern that emerges in the real world. Building to these exacting standards is also how we maintain our compliance as an NHS Assured Supplier for Digital Social Care Records (DSCR).

Embedding clinical safety into the heart of product development is a non-negotiable for us at Birdie. We’ve come a long way in this, but we know we can always do better - and that’s why we work hand-in-hand with our CSO to keep improving and building on these clinical safety activities with every new development we make.

To find out more about the AI products we're building (and how we're building them), head on over to the Smarter Care Lab.

Published date:

April 27, 2026

Author:

Johanna Barlow



