How data streaming is solving the AI sustainability puzzle

By Richard Jones, VP Sales Northern Europe, Confluent.


When we think of artificial intelligence, we rarely think of the physical world. We think of apps and screens, chatbots and virtual assistants. What we don’t think about is the 11,000+ data centres around the planet, packed full of server racks, cooling systems, and enough cabling to wrap around the world multiple times over.

Today’s data centres consume around 200 million gallons of water per year, while the AI industry uses more energy than a small country. For businesses that have commitments around sustainability and carbon reduction, that poses a serious problem.

While there’s no denying the world will continue to need new data centres, there is also growing demand to maximise the capacity that we already have available. By optimising existing space and data processing techniques, businesses can start to do more with less — maximising the advantages of AI without abandoning their existing infrastructure, or significantly increasing their carbon footprint. 

This article will explore that idea, and how the recent shift towards data streaming is helping to solve the ‘AI-sustainability puzzle’, making AI more efficient, more sustainable, and more powerful for businesses.

The rise of AI

AI might not have taken all our jobs yet, but it’s certainly captured the minds of the nation. Research suggests that 56% of young people in the UK have used AI in their workplace, while 36% of the UK population have used Generative AI.

With this explosion in AI uptake, the network itself is starting to feel the strain. As business and consumer demand skyrockets, John Pettigrew, Chief Executive of the National Grid, has suggested that data centre power usage will “surge six-fold in ten years,” with £58bn in investment aiming to solve the issue.

Everyone needs to do what they can to reduce that pressure – which means every business has an obligation to reconsider how it treats, stores, and uses its data, and where that can be improved. It might not be the be-all and end-all, but incremental improvements add up to genuine progress, and free data centres to focus on their own efficiency efforts.

Bad batch

One change that’s set to make a major difference, both to the efficiency of data centres and to the power struggles they’re wrestling with, is the abandonment of batch processing as the dominant means of wrangling data.

Batch processing is just as it sounds: the collection of data into huge storage repositories, which can then be sliced up and interrogated as needed. This processing often takes place during ‘downtime’ — for instance, data being collected throughout the day and then processed and analysed at night.
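To make that pattern concrete, here is a rough sketch in Python of the kind of nightly batch job described above. The directory path, file layout, and field names (sensor_id, energy_kwh) are purely illustrative assumptions, not a reference implementation:

# A minimal sketch of the batch pattern described above, using only the
# Python standard library. File names and the nightly schedule are
# illustrative assumptions.
import csv
from pathlib import Path

def nightly_batch_job(data_dir: str) -> dict:
    """Scan a full day's accumulated CSV files and aggregate them in one pass."""
    totals = {}
    for csv_file in Path(data_dir).glob("*.csv"):
        with open(csv_file, newline="") as handle:
            for row in csv.DictReader(handle):
                # Every stored row is re-read, whether or not it is still relevant.
                sensor = row["sensor_id"]
                totals[sensor] = totals.get(sensor, 0.0) + float(row["energy_kwh"])
    return totals

# Typically triggered overnight by a scheduler such as cron; results only
# become available hours after the data was first collected.
if __name__ == "__main__":
    print(nightly_batch_job("/var/data/readings/2024-05-01"))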

While it’s a well-established practice, batch processing has some fairly obvious limitations:

Latency can be minutes, hours, or even days, rendering many results redundant by the time they’re received — which is a waste of the power being used to process them, and necessitates further batches.

When interrogating large bodies of data, it’s inevitable that unnecessary data within a batch will get dragged into processing, wasting power and time.

Regardless of how successful the processing is, it necessitates power-intensive storage and processing hardware.

In other words: the de facto mode of data processing for most AI is inefficient, cumbersome, and power-hungry. The conventional approach not only fails to prioritise sustainability, but often fails to deliver on performance, too. 

Go with the flow

The alternative that flips many of these concerns on their head is data streaming. 

Rather than collate and store huge bodies of data only to dredge them back up when they’re of use, data streaming processes each individual data point in near-real time, as it enters the system, within the context of the relevant data that has come before. 

This means that actions can be taken in milliseconds, based on data that is fresher and more accurate than anything a batch process can provide. At Confluent, we refer to this as ‘data in motion’ – the organisation and activation of data even as it’s moving through the system.

It also eliminates the latency of batch processing and cleans data more effectively as it enters the system – meaning that any repositories are made up of more accurate, more timely data than a batch alternative, and are therefore more efficient to use.
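For contrast, here is an equally rough sketch of the same aggregation handled as a stream, using the open-source confluent-kafka Python client. The broker address, topic name, and message fields are assumptions for illustration only; a real deployment would add schemas, error handling, and delivery guarantees:

# A minimal sketch of the streaming pattern: each event is processed as it
# arrives, in the context of the events seen so far.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed local broker
    "group.id": "energy-monitor",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-readings"])       # assumed topic name

running_totals = {}   # context built up from prior events

try:
    while True:
        msg = consumer.poll(timeout=1.0)      # handle each event as it arrives
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        sensor = event["sensor_id"]
        running_totals[sensor] = running_totals.get(sensor, 0.0) + event["energy_kwh"]
        # Act within milliseconds of ingestion rather than hours later.
        print(f"{sensor}: running total {running_totals[sensor]:.2f} kWh")
finally:
    consumer.close()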

On the hardware front, data streaming can run entirely in the cloud, on-premises, or in a hybrid setup, offering serious flexibility from a hardware budget standpoint. And the resulting data stores are cheaper to run, and more cost-effective, than a batch equivalent.

All of this speaks to a more sustainable basis upon which data processing can not only run but see a marked improvement in performance. So, to circle back – what does this mean for the data centres in the age of AI?

Looking forwards

A shift towards more sustainable operations will allow data centres to run more efficiently, getting more from the power that’s available to them. With companies taking care of their end of the bargain, data centres are freer to explore opportunities to refine their own ways of working, because less emergency work is needed simply to keep existing operations running.

Some of that will be a simple refinement of day-to-day operations in the data centre. As AI becomes more ubiquitously available, we’re likely to see data streaming uptake right across the data centre space.

It also allows data centres to focus on more creative elements of sustainability that pass on benefits to the immediate community around them. Over 10,000 homes in London can benefit from the waste heat of data centres, which helps to provide heating and hot water; data centres in Sweden and France have used it to heat greenhouses.

Ultimately, it’s companies’ incremental improvements to sustainable operations that allow this experimentation, easing the pressure on data centres and helping to make it possible. A shift from batch processing to data streaming not only delivers this, but adds genuine performance improvements into the bargain.
