The emergence of mainstream Artificial Intelligence (AI) over the last few years has fundamentally changed the way many of us live our lives, both personally and professionally. AI-powered applications have already become widespread in industries like healthcare, banking, and retail, helping to streamline business operations and make customer experiences more seamless. In fact, it’s quickly becoming apparent that AI’s only limitation is the human imagination.
As the popularity of AI continues to grow, so too will the importance of AI workloads. This, however, raises key questions for organisations: how best to ensure these workloads keep running without unexpected downtime, and, even more importantly, how the underlying data can be properly secured without affecting overall business agility.
The limitations of traditional backup solutions
In order to protect against unplanned outages and data loss, many businesses continue to rely on traditional backup solutions. This is fine for many aspects of day-to-day operations, but when it comes to disaster recovery and business continuity, such solutions are no longer adequate, particularly where critical business data and workloads are involved. This is because traditional backups protect individual servers rather than complete applications. After restoring data from a backup, the applications must first be manually reassembled from their individual components. This process can be painstaking, which is why restores from backups can often take days, weeks, or sometimes even months.
When it comes to critical AI workloads and applications, companies need solutions that can guarantee much faster recoverability. For this reason, more and more are turning to Disaster Recovery (DR) solutions, which offer far better recovery speeds than traditional backups. At present, Continuous Data Protection (CDP) is the most effective recovery approach available. CDP keeps a continuous record of every change to a company’s data as soon as it is made. As a result, if an attack happens and recovery is required, data can be restored to the state it was in just seconds before the attack took place, meaning very little data, if any, is lost.
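To make the journaling idea behind CDP concrete, here is a minimal sketch in Python. It is purely illustrative and assumes an in-memory journal; the class and method names are hypothetical, not any vendor’s API.

```python
import time

class CDPJournal:
    """Minimal sketch of a continuous-data-protection journal:
    every write is recorded with a timestamp, so the data can be
    rewound to any past point in time."""

    def __init__(self):
        self.entries = []  # (timestamp, block_id, data), in write order

    def record_write(self, block_id, data):
        # Invoked on every change as it happens: no schedule, no snapshot.
        self.entries.append((time.time(), block_id, data))

    def recover_to(self, point_in_time):
        """Replay all writes up to the chosen moment, e.g. a few
        seconds before an attack was detected."""
        state = {}
        for ts, block_id, data in self.entries:
            if ts > point_in_time:
                break  # entries are in time order, so we can stop here
            state[block_id] = data
        return state
```

Because every change is journalled individually, the recovery point can be chosen with second-level granularity rather than being limited to the time of the last scheduled backup.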
Protecting critical AI applications with near-synchronous replication
Effectively protecting critical AI applications requires the lowest possible Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs); in other words, how quickly an application can be brought back online, and how much recent data can acceptably be lost in the process. One of the best ways of achieving this is through near-synchronous replication, which offers the performance of synchronous replication without the high network and infrastructure requirements typically associated with it.
Near-synchronous replication is technically asynchronous replication, but it resembles synchronous replication in that data is written to multiple locations almost simultaneously, with only a small delay permitted between the primary and secondary locations. Because it is always on, it does not need to be scheduled, does not rely on snapshots, captures writes as they are made to the source storage, and does not have to wait for an acknowledgement from the target storage before completing them.
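The difference is easiest to see in the write path. The sketch below contrasts the two approaches; it is a simplified illustration with hypothetical names, using dictionaries to stand in for the actual storage layers.

```python
import queue
import threading

replication_queue = queue.Queue()  # changes awaiting transfer to the target site

def write_synchronous(block_id, data, source, target):
    """Synchronous replication: the write only completes once BOTH sites
    have it, so every write pays the network round trip to the target."""
    source[block_id] = data
    target[block_id] = data  # the caller blocks until the remote site confirms

def write_near_synchronous(block_id, data, source):
    """Near-synchronous replication: the write completes as soon as the
    source storage has it; the change is streamed to the target moments later."""
    source[block_id] = data
    replication_queue.put((block_id, data))  # no waiting for the target's ack

def replication_worker(target):
    """Background thread that drains the queue, keeping the target
    only seconds behind the source."""
    while True:
        block_id, data = replication_queue.get()
        target[block_id] = data
        replication_queue.task_done()

# Example wiring (hypothetical): stream changes to the DR site in the background.
source_site, target_site = {}, {}
threading.Thread(target=replication_worker, args=(target_site,), daemon=True).start()
write_near_synchronous("block-42", b"model-checkpoint-delta", source_site)  # returns immediately
replication_queue.join()  # wait until the change has reached the target
```

Because the application never waits for the remote acknowledgement, write latency stays close to that of local storage, while the small lag between source and target is what determines the achievable RPO.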
One of the key advantages of near-synchronous replication is that it provides a high level of data availability and protection while still allowing for faster write speeds than synchronous replication. This makes it a good choice for workloads such as critical AI applications with high write loads and/or large amounts of data.
Overcoming data mobility challenges
AI works by learning from huge amounts of data, and generally the more the better. For this reason, the scale of data required for AI applications to work effectively is unlike anything most IT teams have ever had to deal with before. Even relatively simple applications can involve exabytes of raw data that must be carefully prepared for model training and subsequent inference. These data sets are often created at the edge and need to be transferred into a central data repository for processing. Furthermore, once an AI data lifecycle has come to an end, the data used needs to be archived for potential re-training in the future.
All of this creates completely new challenges for IT infrastructure and management, as these huge volumes of data need to be moved continuously. Lifting and shifting such large data sets is simply not feasible with current network technology and data-management solutions based on synchronous replication. To move AI data with limited processing power and bandwidth, asynchronous replication will have to be used instead. It replicates changes continuously at the block level using low bandwidth, avoiding the high peaks of data transfer that bulk moves produce.
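As a simplified illustration of why block-level replication keeps bandwidth flat, the sketch below ships only the blocks that have changed since the last cycle rather than the whole data set. It is a hypothetical delta-shipping example, not a description of any specific product.

```python
def changed_blocks(previous, current):
    """Yield only the blocks that differ from the last known state,
    so the network carries a steady trickle of deltas."""
    for block_id, data in current.items():
        if previous.get(block_id) != data:
            yield block_id, data

def replicate_cycle(previous, current, send):
    """One asynchronous replication cycle: ship the deltas, then
    remember the new state for the next comparison."""
    for block_id, data in changed_blocks(previous, current):
        send(block_id, data)  # many small transfers instead of one huge peak
    return dict(current)
```

Run continuously, each cycle transfers only what has changed, so an exabyte-scale data set never has to cross the network in one piece.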
We’ve only just started to scratch the surface of what AI is capable of, but the potential is already there for everyone to see. While stories about it being used to create hit songs and replicate the painting styles of the world’s greatest artists make good headlines, the true future of AI lies in helping humanity solve much bigger challenges, from eradicating diseases to detecting natural disasters. However, powerful applications require powerful protection, which is why a growing number of organisations are already turning to solutions like CDP to keep their AI workloads safe. Furthermore, the sheer scale of data involved is creating entirely new challenges that must be properly addressed using suitable data-mobility solutions before AI’s full potential can be realised.