The new enterprise data dilemma

Data analytics and ML in a multi-cloud enterprise. By Vinay Wagh, Director of Product at Databricks.


Over the past decade, enterprise data management has evolved from on-premises systems towards the cloud, and that shift is raising some familiar concerns: security, governance, vendor lock-in, and missing out on best-of-breed solutions, to name a few.

Today, most organizations are not standardizing on a single cloud. They select the best platform for each workload to optimize their business outcomes. As a result, we are now on the precipice of the next evolution, multi-cloud, and there are two types of enterprises emerging: those that are already multi-cloud and those that will be.

It’s important to note that multi-cloud is not about abstracting cloud providers so users can seamlessly run the same workload anywhere. Instead, multi-cloud is about making a choice between cloud providers based on the use case and making the migration of workloads from one cloud to another feasible.

Enterprises already face a growing proliferation of departments on different clouds, each optimizing for best-of-breed services from its provider, and then there is the entirely different beast of integrating cloud strategies inherited through mergers and acquisitions. One might immediately think this is a management nightmare in the making, and it can be if not handled well, but there are benefits too: greater business agility, flexibility, and scalability, while avoiding the dreaded vendor lock-in.

In such a complex multi-cloud landscape, a long-term data strategy that allows businesses to securely use quality data across multiple clouds, without having to worry about data migration, is a requirement. Equally important is a consistent, collaborative, open and unified data analytics platform that extends across clouds, so data teams can use the data and generate business value.

The role of open source technology

Utilizing open source as part of a data quality and integrity strategy that extends across multiple clouds is important for enterprises to consider. An open-source storage layer ensures data is treated and experienced consistently, and it underpins a working multi-cloud strategy. It is a critical part of maintaining quality and integrity, and having that layer in open source enables portability.

Why is it critical? Because only open-source technologies and data formats can truly deliver the benefits of multi-cloud. The ability to automate configurations, enforce security and governance policies, and replicate data in open formats across clouds gives you a true choice between cloud providers.
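As an illustration, the sketch below (not from the article) shows how data written in an open table format stays portable: the same write can target object storage on any provider, with only the storage URI and credentials changing. The paths, the dataset and the use of the open-source Delta Lake package (delta-spark) on Spark are assumptions for the example.

```python
# A minimal sketch, assuming Apache Spark with the open-source delta-spark
# package installed; the bucket and container paths are hypothetical
# placeholders, not real infrastructure.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("open-format-portability").getOrCreate()

orders = spark.read.json("/landing/orders/")  # hypothetical raw source

# Because the table format is open, the same write works against any cloud's
# object store; only the URI scheme and the configured credentials differ.
orders.write.format("delta").mode("overwrite").save(
    "s3a://acme-lake/orders")                                    # AWS S3
orders.write.format("delta").mode("overwrite").save(
    "abfss://lake@acmestorage.dfs.core.windows.net/orders")      # Azure ADLS
orders.write.format("delta").mode("overwrite").save(
    "gs://acme-lake/orders")                                     # Google Cloud Storage
```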

The data quality challenge

Today, investing in machine learning (ML) is one of the most important data priorities for cloud-powered organizations. However, ML models are only as good as the data they learn from. Therefore, maintaining a high standard of data quality and integrity in a multi-cloud environment is extremely important.

The solution to maintaining data quality starts with having the right data governance policies in place so you can manage who is responsible for ensuring the quality of a specified dataset, which groups/teams are allowed to access it, and what applications are using the dataset to make which business decisions. This allows you to measure quality issues and build accountability into the process.
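As a rough illustration of that kind of accountability, the hypothetical Python sketch below records, for a single dataset, the team that owns its quality, the groups allowed to read it, and the applications that depend on it. Every name and field here is illustrative rather than part of any specific product.

```python
# A hypothetical sketch of dataset-level governance metadata: an owner
# accountable for quality, the groups allowed to access the data, and the
# downstream applications that make decisions with it.
from dataclasses import dataclass

@dataclass
class DatasetGovernance:
    dataset: str
    quality_owner: str        # team accountable for data quality
    allowed_groups: tuple     # groups/teams permitted to access the dataset
    consuming_apps: tuple     # applications using the dataset for decisions

CUSTOMER_ORDERS = DatasetGovernance(
    dataset="sales.customer_orders",
    quality_owner="data-platform-team",
    allowed_groups=("analytics", "ml-engineering"),
    consuming_apps=("churn-model", "quarterly-revenue-dashboard"),
)
```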

Technology plays a critical role here. With the increasing quantity of data produced each day, how you store your data sets the stage for what you'll be able to do with it later. Every organization has its own mix of data lakes and data warehouses that best fits its use cases, and each comes with benefits and challenges across data management, flexibility, and usability. Focusing specifically on data lakes, a storage layer on top of the lake that provides transactional guarantees and schema enforcement in turn ensures high integrity and quality of data.
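For instance, here is a minimal sketch of that schema enforcement, assuming the open-source Delta Lake layer (delta-spark) on Spark and a hypothetical table path: a write whose schema matches the table commits as an ACID transaction, while a mismatched write is rejected rather than silently degrading the data that ML models will learn from.

```python
# A hedged sketch of transactional writes and schema enforcement on a data
# lake table, assuming delta-spark; paths and data are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Writes to the table are ACID transactions; readers never see partial data.
good = spark.createDataFrame([(1, "EMEA", 120.0)], ["order_id", "region", "amount"])
good.write.format("delta").mode("append").save("/lake/sales")

# A write whose schema doesn't match the table (amount as a string here)
# is rejected instead of silently corrupting downstream features.
bad = spark.createDataFrame([(2, "APAC", "n/a")], ["order_id", "region", "amount"])
try:
    bad.write.format("delta").mode("append").save("/lake/sales")
except Exception as err:
    print("Write rejected by schema enforcement:", err)
```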

What about security?

Often, organisations need to use some of their most proprietary and important data to build ML-powered applications, so security is extremely important. Correctly implementing and maintaining security policies on one cloud is hard enough; applying them across two or more can significantly compound the difficulty.

However, ensuring data security in a multi-cloud infrastructure isn't a fairy tale that requires some magical amulet to unite all security policies. Technically speaking, a good multi-cloud strategy will not try to abstract core security functions to be cloud-agnostic. Instead, it will embrace the cloud-native constructs and advantages each provider has built for its own cloud. The key to success in multi-cloud data security and governance is building a consistent framework on top of the providers' constructs that makes it easy to define policies and implement them across a wide range of users working on data analytics and ML.

This framework can also abstract out the cloud-specific implementation so that developers and data scientists don’t need to write cloud-specific code.

For example, in the case of data analytics and ML, having a unified platform could provide a policy framework that allows your admin to specify which users have access to PII data, create clusters to process data, share notebooks or run ETL jobs, or have restricted access to production workspaces.
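A hypothetical sketch of what such a framework might look like follows: policies are declared once and then translated into each provider's own constructs (for example IAM roles on AWS, Azure AD groups, or GCP IAM bindings), so developers and data scientists never write cloud-specific authorization code. The classes and the apply_policy helper are illustrative only, not any real platform's API.

```python
# A minimal, hypothetical sketch of a consistent policy framework layered on
# top of cloud-native controls; nothing here is a real library API.
from dataclasses import dataclass, field

@dataclass
class AccessPolicy:
    group: str
    can_read_pii: bool = False
    can_create_clusters: bool = False
    workspaces: list = field(default_factory=list)

POLICIES = [
    AccessPolicy("data-engineering", can_create_clusters=True,
                 workspaces=["dev", "staging"]),
    AccessPolicy("risk-analytics", can_read_pii=True, workspaces=["prod"]),
]

def apply_policy(policy: AccessPolicy, cloud: str) -> None:
    # In practice this would translate the policy into the provider's native
    # constructs (IAM roles, AD groups, IAM bindings); here we only print
    # what would be applied, since the backends are hypothetical.
    print(f"[{cloud}] {policy.group}: pii={policy.can_read_pii}, "
          f"clusters={policy.can_create_clusters}, workspaces={policy.workspaces}")

for cloud in ("aws", "azure", "gcp"):
    for policy in POLICIES:
        apply_policy(policy, cloud)
```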

The multi-cloud powered business is here to stay

Businesses realize the importance of data and are using it to make informed decisions via analytics and using machine learning to solve challenges, create new data products and revenue streams, improve operational efficiencies, and more. In many cases, thriving organizations are treating data as one of their most valuable assets.

The organizations that are redefining their space are the ones implementing data strategies that let them use their data at scale, which mandates high data quality standards alongside the right security policies. As the multi-cloud landscape grows, businesses need to ensure quickly that data quality is maintained across clouds with strong data governance and security; without these, organizations cannot use their data assets to their fullest potential. Private digital workspaces for accessing data are likely to be the future, and compliance and governance teams will function best when those workspaces are tied to both the cloud provider and the enterprise network.

Ultimately, a unified data analytics platform addresses these challenges by helping organisations bring all their users and data together in an open, simple, scalable and secure service that spans the entire data lifecycle and can leverage the native capabilities of multiple clouds.
