The rapid deployment of AI across government agencies presents unprecedented opportunities and risks. Recent incidents worldwide offer crucial lessons for governance, risk and assurance professionals tasked with ensuring responsible implementation. This article captures insights from Darren Menachemson, a leading practitioner in the ethical use of AI, who spoke at a recent IIA forum.
The AI Revolution is Here – Ready or Not
The Australian Government recently launched Gov.ai, essentially a sovereign ChatGPT for public service use. This milestone underscores a critical reality: AI deployment in government is accelerating at breakneck speed.
The MIT AI Risk Repository tells a sobering story. From just a handful of reported AI incidents a few years ago, we now see over 1,100 documented cases of AI causing societal harm – and the curve is getting steeper each year. This isn't just about more AI being deployed; it's about increasingly powerful AI becoming available to everyone, from government agencies to bad actors.
When AI Goes Wrong: Lessons from the Frontlines
The UK's Discriminatory Welfare System: The UK's Department for Work and Pensions deployed a self-learning AI system to detect welfare fraud. The system identified patterns in historical data and escalated high-risk cases to human officers. Unfortunately, the patterns it learned correlated with vulnerability rather than with actual fraud. The AI systematically targeted single mothers and people with disabilities – not because they were more likely to commit fraud, but because they appeared as higher-value targets in the historical data. Tens of thousands of vulnerable people had their benefits reduced or withheld and were forced to prove their innocence in a traumatic process activists dubbed "hurt first, fix later."
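For assurance teams, one practical first check on a system like this is a simple selection-rate comparison across groups in the escalation logs. The sketch below is illustrative only: the data, group labels and the 1.25x disparity threshold are assumptions, not details of the actual DWP system.

```python
# Illustrative sketch: comparing escalation rates across groups from a
# fraud-risk model's case logs. All figures and labels are invented.
from collections import Counter

# Each record: (demographic_group, was_escalated_for_investigation)
escalation_log = [
    ("single_parent", True), ("single_parent", True), ("single_parent", False),
    ("other", False), ("other", False), ("other", True), ("other", False),
    # ... a real audit would draw thousands of rows from the system's history
]

escalated = Counter(group for group, flagged in escalation_log if flagged)
totals = Counter(group for group, _ in escalation_log)
rates = {group: escalated[group] / totals[group] for group in totals}
print("Escalation rate by group:", rates)

# Screening heuristic (threshold assumed): a group escalated at well above the
# rate of others warrants a closer look at what the model is really keying on.
highest, lowest = max(rates.values()), min(rates.values())
if lowest > 0 and highest / lowest > 1.25:
    print(f"Disparity flagged for review: {highest:.0%} vs {lowest:.0%}")
```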
Florida's Racist Parole Algorithm: The COMPAS system rated defendants' likelihood of reoffending on a scale of 1 to 10, influencing bail and parole decisions. When investigative journalists examined the outcomes, they found a disturbing pattern: African-American defendants who did not go on to reoffend were far more likely to have been labelled high risk, while white defendants who did reoffend were more likely to have been labelled low risk. The algorithm made approximately 300 demonstrably racist decisions.
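The disparity was surfaced through a straightforward outcome test: comparing error rates across groups. Below is a minimal sketch of that kind of test, comparing false positive rates (defendants scored high risk who did not reoffend). The records, group labels and the score cut-off of 7 are invented for illustration; they are not the journalists' actual dataset or methodology.

```python
# Illustrative outcome test: compare false positive rates across groups,
# where a "false positive" is a high risk score for someone who did not reoffend.

def false_positive_rate(records, group, high_risk_cutoff=7):  # cut-off assumed
    """FPR within `group`: share of non-reoffenders who were scored high risk."""
    non_reoffenders = [r for r in records if r["group"] == group and not r["reoffended"]]
    if not non_reoffenders:
        return None
    flagged = [r for r in non_reoffenders if r["risk_score"] >= high_risk_cutoff]
    return len(flagged) / len(non_reoffenders)

records = [
    {"group": "A", "risk_score": 8, "reoffended": False},
    {"group": "A", "risk_score": 9, "reoffended": False},
    {"group": "A", "risk_score": 3, "reoffended": False},
    {"group": "B", "risk_score": 2, "reoffended": False},
    {"group": "B", "risk_score": 8, "reoffended": False},
    {"group": "B", "risk_score": 4, "reoffended": False},
    # ... a real audit would use the full case history, not a handful of rows
]

for g in ("A", "B"):
    print(f"Group {g}: false positive rate {false_positive_rate(records, g):.0%}")
```

Materially different rates between groups is exactly the red flag the investigation raised: one group bears far more wrongful "high risk" labels than the other.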
The Netherlands' Childcare Catastrophe: The Dutch government's AI system for detecting childcare benefit overpayments flagged 26,000 families for benefit clawbacks, disproportionately targeting families with foreign backgrounds. The outcome was devastating: 1,000 children ended up in foster care, several families lost loved ones to suicide, and the Dutch cabinet resigned in disgrace.
Darren Menachemson suggests these incidents point to several interconnected categories of risk that need to be considered.
A Success Story: Honeycomb's Human-AI Partnership
Not all AI deployments end in disaster. Causeway, a UK organisation supporting victims of modern slavery, developed an AI system called Honeycomb that demonstrates responsible implementation. Trained on over 39,000 case files spanning five years, Honeycomb analyses victim journals and progress reports to suggest optimal support service referrals. The key to its success: maintaining meaningful human oversight. When the AI identified potential issues requiring cultural sensitivity or complex judgment, it automatically routed cases to human case workers. The result? Effective scaling of support services while maintaining the compassionate human element that vulnerable populations need.
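As a sketch of the human-in-the-loop pattern described here (not Causeway's actual implementation), the routing logic might look something like the following; the flag names, confidence threshold and data structures are all assumed for illustration.

```python
# Hypothetical human-in-the-loop routing: the model only auto-suggests a
# referral when it is confident and no sensitivity flags are present;
# everything else goes to a human case worker.
from dataclasses import dataclass, field

SENSITIVITY_FLAGS = {"cultural_context", "trauma_disclosure", "legal_complexity"}

@dataclass
class CaseAssessment:
    case_id: str
    suggested_referral: str        # model's proposed support service
    confidence: float              # model's own confidence, 0.0 to 1.0
    flags: set = field(default_factory=set)

def route(assessment: CaseAssessment, min_confidence: float = 0.85) -> str:
    """Decide whether a case can be auto-suggested or needs a human case worker."""
    if assessment.flags & SENSITIVITY_FLAGS:
        return "human_review"      # cultural sensitivity or complex judgement stays with people
    if assessment.confidence < min_confidence:
        return "human_review"      # uncertain model output is escalated, not acted on
    return "auto_suggest"          # routine case: surface the model's suggestion

# Example: a culturally sensitive case is escalated even at high confidence.
case = CaseAssessment("c-1042", "housing_support", 0.93, {"cultural_context"})
print(route(case))  # -> human_review
```

The design choice that matters is the default: anything sensitive or uncertain falls back to a person automatically, rather than requiring a person to opt in.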
The Dilemma: Innovation vs. Risk
Here's the challenge keeping leaders awake at night: the evidence increasingly suggests that failing to deploy AI in appropriate use cases may itself constitute negligence. In healthcare, for instance, mounting evidence shows AI can reduce diagnostic errors and save lives. The question isn't whether to use AI, but how to use it responsibly.
This creates what we might call the "innovation imperative" – the pressure to realize AI's benefits while controlling its risks. For governance, risk and assurance professionals, this means developing new frameworks for assessing and monitoring AI systems throughout their lifecycle.
The question isn't whether we're ready – it's whether we're willing to learn fast enough to keep pace with the technology we're tasked with overseeing.