Sorting the enterprise digital mess: why data storage is just the beginning

By Mark Molyneux, EMEA CTO at Cohesity.

The world generates approximately 400 million terabytes of new data every day through activities like Instagram likes and comments, Slack messages between colleagues, and Zoom recordings of meetings. Every byte of data needs a home, and storing unnecessary data can be incredibly costly for businesses.

What’s worse, most businesses don’t even know what they’re storing. Employees inadvertently download all kinds of personal files—from bills and passport scans to pictures of their children—many of which pose a GDPR risk. While their desktops might be cluttered, the problems actually run much deeper.

Without proper visibility into their data or its storage locations, businesses struggle to manage storage efficiently, comply with regulations, and fully leverage the power of AI—because if you put garbage in, you get garbage out.

In this piece, I’ll explore why businesses must move beyond poor data management practices and take data indexing and classification seriously.

The hidden cost of unclassified data

Most businesses overlook their data practices, often underestimating the risks. According to IBM's 2023 Cost of a Data Breach Report, the average cost of a data breach is approximately $4.45 million, a figure that poor data management practices only inflate. Think about it: critical files scattered across desktops and servers, buried in email threads, or saved under vague names like 'Final_v3_UPDATED'. Not only does this make data harder to access when needed, but it also increases security risk. More businesses than you think are powered by duct tape and willpower, and the consequences can be dire.

It’s not just about security. Unclassified data also means wasted storage costs, inefficiencies, duplicated effort, redundant systems, lost time, and fed-up staff. Above all, it means missed opportunities: when companies can’t extract vital insights from their data, they miss market trends, misunderstand customer needs, and end up drawing incorrect conclusions and making poor decisions.

The other big danger right now is that businesses are facing far stricter data compliance requirements than ever: DPA, GDPR, DORA, LGPD, BDSG, HIPAA—and there are more in the pipeline. Unclassified data makes it near-impossible to ensure compliance, leaving businesses open to hefty fines and reputational damage.

One of the most newsworthy examples of this in recent years is the TalkTalk incident reported in January 2025. The alleged breach originated from unauthorised access to a third-party supplier’s system, with the attacker claiming to hold personal information on 18.8 million current and former TalkTalk subscribers (a figure TalkTalk disputed as significantly overstated). It highlights that dangers can come from an organisation’s wider ecosystem, and that organisations have a responsibility to ensure security and risk management extends downstream to their suppliers.

The solution? Businesses should get serious about their data management practices. By organising and structuring data, organisations can reduce costs, improve efficiency, and stay compliant. The hidden cost of unclassified data is too high to ignore—it’s time to clean up the chaos. They should not wait for a regulator to tell them to do it.

Why classification matters

When data management practices are more chaotic than a toddler with a box of crayons, data classification might sound like an insurmountable challenge. At its core, though, it is simply indexing data by type, structure, relevance, and sensitivity, then connecting that data to a relevant record policy, which defines what a company needs to keep, why, and for how long.
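To make the idea concrete, here is a minimal sketch of pattern-based classification tied to a record policy. The class names, patterns, and retention periods are illustrative assumptions, not a real policy or any particular vendor's product.

```python
import re

# Hypothetical record policies: what to keep, why, and for how long.
# Retention periods here are illustrative, not legal guidance.
RECORD_POLICIES = {
    "financial": {"reason": "tax audit trail", "retain_years": 7},
    "personal": {"reason": "GDPR data-subject records", "retain_years": 2},
    "general": {"reason": "business reference", "retain_years": 1},
}

# Simple content markers mapping a document to a class.
CLASSIFIER_RULES = [
    (re.compile(r"invoice|payment|iban", re.I), "financial"),
    (re.compile(r"passport|date of birth|home address", re.I), "personal"),
]

def classify(text: str) -> dict:
    """Return the matched class and its record policy for a document."""
    for pattern, label in CLASSIFIER_RULES:
        if pattern.search(text):
            return {"class": label, **RECORD_POLICIES[label]}
    return {"class": "general", **RECORD_POLICIES["general"]}

print(classify("Invoice #2041: payment due to IBAN GB33BUKB20201555555555"))
# {'class': 'financial', 'reason': 'tax audit trail', 'retain_years': 7}
```

Real classifiers use far richer signals than keyword patterns, but the principle is the same: every document ends up with a label and a retention rule attached, instead of sitting in limbo.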

Most companies will do this to a degree. Many will have some data that is indexed and classified, like customer records and transaction logs, sitting neatly in databases and easy to search and analyse, alongside unclassified data—everything else: the scattered emails, PDFs, and videos. The problem with poor data management practices comes from where we sit in time: the early stages of the AI revolution. If AI is to give us faster and more accurate insights, we need strong foundations for it to draw on. Otherwise, AI is flying blind.

Take ChatGPT, for example. It generates responses based on broad training data, which is great for drafting emails but not so great for precise, data-driven insights, and this can lead to irrelevant or misleading information.

That’s where modern data storage and indexing solutions come in. Many third-party providers don’t just store data; they make it smart, using advanced methodologies and proprietary natural language processing applications. The real game-changer? Retrieval-Augmented Generation (RAG). Unlike generic internet-trained models that draw from anything and everything, RAG retrieves information directly from properly indexed data, improving accuracy and reliability and, importantly, citing the sources behind its answers.
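As an illustration of the retrieval step, the sketch below ranks indexed documents by simple word overlap and grounds a prompt in the results, with source identifiers attached. The document store, scoring method, and prompt format are stand-in assumptions; production RAG systems use vector embeddings and a real language model.

```python
# A toy indexed document store: document ID -> content.
INDEX = {
    "policy-001": "Laptops must be encrypted and backed up weekly.",
    "policy-002": "Customer records are retained for seven years.",
    "memo-114": "The Q3 sales kick-off moves to October.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank indexed documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        INDEX.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved, citable sources."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long are customer records retained?"))
```

The key property is visible even in this toy version: because the answer is built from retrieved, identified documents rather than the model's general training data, it can be traced back to its source.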

The impact? Businesses get actionable intelligence from better-organised data, smoother workflows, cost savings, regulatory compliance, and happier colleagues. According to KPMG’s Global Tech Report 2024, the proportion of execs reporting a positive impact on profitability from data and analytics has risen by 25 percentage points on average. But most importantly, it creates a rock-solid foundation for AI-driven insights—insights that don’t just work today but get smarter and more valuable over time.

Steps to smarter data management

Businesses can streamline data management by moving from cluttered local systems to the cloud, where automated services handle organisation and indexing. These solutions automatically process and index data, extracting metadata and structuring information efficiently. This ensures seamless access, improved searchability, and optimised storage without manual intervention. It is only part of the journey, though: organisations still need to apply classification at this stage to avoid falling into the ‘keep everything’ trap.
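A minimal sketch of this kind of automated metadata extraction, using only the standard library; the field names and directory-walk approach are illustrative assumptions, not any particular provider's pipeline.

```python
import os
import time

def index_file(path: str) -> dict:
    """Extract basic metadata so a file becomes searchable by attribute."""
    stat = os.stat(path)
    return {
        "path": path,
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": stat.st_size,
        "modified": time.strftime("%Y-%m-%d", time.gmtime(stat.st_mtime)),
    }

def build_index(root: str) -> list[dict]:
    """Walk a directory tree and index every file it contains."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.append(index_file(os.path.join(dirpath, name)))
    return index
```

Even metadata this basic lets you answer questions that defeat an unindexed file share, such as which file types dominate your storage bill or what hasn't been touched in years, and it gives classification rules something concrete to run against.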

Alternatively, organisations can establish clear data management goals aligned with business objectives, implement strong security practices, and then look for ways to automate processes to take the administrative burden off staff. It’s a complex process, but plenty of businesses do it. Once that’s up and running, it’s important to train staff and set expectations, then pilot AI applications before rolling them out more widely.

Whether businesses do it themselves or look to a third-party provider, the key is to set a clear governance structure and get data in order before regulatory requirements become crushing. At a time when competition is rife, it’s time to bring the hammer down on wasteful data storage habits and begin treating data as what it truly is: a strategic asset.
