AI has fast become embedded in most organizations in one form or another. Whether through enterprise accounts for popular Large Language Models (LLMs) or custom-built pilots, it's quickly becoming a trusted 'employee'. But while it might seem to have all the answers, like any other new employee, AI doesn't know everything. Organizations might be receiving outputs that look great at first glance, but too often that output is generated from 'messy data': a clean exterior hiding a rotting interior.
It might seem like magic, but AI cannot create something from nothing. Your output depends entirely on AI's access to valid, uncompromised, and relevant data. If that data is lost amid a forest of irrelevant material, AI grasps at anything it can find that's even remotely related to its queries, producing not just inaccurate outputs but also a real security and regulatory risk.
But if organizations proactively plot the best path through this data - one that aligns with wider risk management needs and gives AI only the data it needs - they could transform outputs for the better.
AI is what it ‘eats’
For most organizations, AI seems like magic. You ask an LLM a question and voila - a seemingly intelligent, well-researched answer appears. But AI doesn't create something from nothing, and that brings up the big issue: the data itself. To generate accurate and useful answers, AI needs access to data that is valid, uncompromised, and, most importantly, relevant.
This is precisely why 95% of genAI business pilots still fail: organizations are pulling from a well poisoned by Redundant, Obsolete, or Trivial (ROT) data. Data growth has exploded, in part thanks to AI, and has quickly spiralled out of control. Today, most organizations lack a full picture of their data, allowing ROT data to build up in the background. And now, as organizations start to leverage their data estates through AI, that same ROT data is holding their AI development back.
While off-the-shelf AI, such as LLMs, might be user-friendly and relatively simple to implement thanks to built-in guardrails, custom internal AI requires a more hands-on approach. Teams often struggle with the complex business rules and constant tuning required to point it at clean data; instead, it pulls up ROT data, ending pilot schemes before they begin.
Why? Because ROT data does exactly that - it rots AI outputs. Without firm, precise guardrails around the data AI can pull from, custom AIs inevitably ingest ROT, producing slow and incorrect outputs. Most pilots are likely failing not because the data they need isn't there, but because organizations don't know where to direct their AI. And, unfortunately for them, rot spreads. It doesn't just poison your AI pilots - left unaddressed, it'll seep into wider risk management concerns.
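To make the idea of ROT concrete, here is a minimal sketch of what a first-pass ROT scan over a set of files might look like. The thresholds and labels are illustrative assumptions, not a real governance policy: "Redundant" is flagged by exact duplicate content, "Obsolete" by a stale modification time, and "Trivial" by near-empty files.

```python
import hashlib
import os
import time

# Illustrative thresholds -- a real policy would come from your governance team.
OBSOLETE_AFTER_DAYS = 3 * 365   # untouched for roughly three years
TRIVIAL_BELOW_BYTES = 64        # too small to carry meaningful content

def classify_rot(paths):
    """Label each file Redundant, Obsolete, Trivial, or Keep."""
    seen_digests = set()
    labels = {}
    now = time.time()
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_digests:
            labels[path] = "Redundant"   # exact duplicate of an earlier file
        elif now - os.path.getmtime(path) > OBSOLETE_AFTER_DAYS * 86400:
            labels[path] = "Obsolete"    # not modified in years
        elif len(data) < TRIVIAL_BELOW_BYTES:
            labels[path] = "Trivial"     # e.g. empty placeholders and stubs
        else:
            labels[path] = "Keep"
        seen_digests.add(digest)
    return labels
```

Even a crude scan like this gives you an inventory to interrogate; real deployments would add fuzzy deduplication, ownership metadata, and records-retention rules on top.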
Can’t see the data for the (ROT) trees
As any self-respecting tree surgeon will tell you, rot doesn’t go away on its own - and too often, it spreads without you even realising it. The same goes for ROT data. And up until now, it’s been allowed to spread.
The disconnect across global AI regulation might have left organizations feeling like they’ve got one less thing to juggle, but this short-term relief has long-term consequences for their understanding and visibility of their data. Without regulation and compliance demands to push governance further up the priority list, it’s been overlooked - 92% of organizations still lack visibility of their AI identities. This hasn’t just held back AI pilots; it’s setting organizations back when it comes to compliance and governance, too. Because if you don’t know what data your AI is pulling from, when regulation does inevitably mature, you’ll be left scrambling to catch up.
This lack of visibility could also have a profound impact on cybersecurity. Say that, instead of putting in the groundwork to build data visibility and clear out ROT, you've handed AI an access-all-areas pass to your entire data estate. Not only does that create slow and probably ineffective AI, it also creates a form of centralized privilege which, in the wrong hands, could be an effective attack vector. Just as businesses are getting to grips with AI, so are attackers. And as soon as they've perfected a method for attacking AI tools, they can use them as a landing point from which to attack your whole infrastructure, much as they would with overly privileged identities today.
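The alternative to an access-all-areas pass is the same least-privilege principle already applied to human identities. A minimal sketch, with purely hypothetical tool and source names, might gate every retrieval an AI tool makes through an explicit allowlist:

```python
# Hypothetical per-tool allowlist: each AI tool may read only the
# sources it has been explicitly granted, nothing else.
ALLOWED_SOURCES = {
    "support_bot": {"kb_articles", "product_docs"},
    "sales_assistant": {"crm_notes", "pricing_sheets"},
}

def run_query(source, query):
    # Stand-in for a real retrieval call (vector store, database, etc.).
    return f"results for {query!r} from {source}"

def fetch_for_tool(tool, source, query):
    """Refuse any retrieval outside the tool's granted sources."""
    if source not in ALLOWED_SOURCES.get(tool, set()):
        raise PermissionError(f"{tool} may not read from {source}")
    return run_query(source, query)
```

A compromised `support_bot` can then only leak knowledge-base articles, not CRM records - the blast radius is bounded the same way it would be for an overly privileged human account that had been scoped down.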
Cut away ROT today to promote growth tomorrow
So, rather than waiting for these cybersecurity and compliance concerns to emerge fully, cut them off at the root. Tame that overgrown forest of ROT data before it becomes an issue, not after.
Shine a light on the current state of your data, exposing and interrogating the data that needs to be cut away, not just to improve AI outputs, but to safeguard your business against future concerns. With a better understanding of your data, you can implement the right guardrails for your custom AI endeavours, ensuring that the data it’s pulling from isn’t just relevant, but secure too. And, hopefully, turn those AI business pilots from failures into successes along the way.
As regulation and governance inevitably catch up with AI, there will be one magic word - explainability. And unless you know the ins and outs of both your data and your AI, you’ll be lost for words to explain just how it’s really working. It’ll be no small task - we created, captured, copied, and consumed 181 zettabytes of data globally last year alone - but as the saying goes, if you can’t see the wood for the ROT, it might be time to get your axe out.