Powering real-time analytics

GridGain Systems has launched the GridGain Data Lake Accelerator, an in-memory solution for digital businesses that need to enrich operational data with historical data stored in data lakes to improve real-time analytics and decision automation. The GridGain Data Lake Accelerator is available for use with the GridGain Enterprise Edition and GridGain Ultimate Edition.

Wednesday, 12th June 2019. Posted in Big Data/Analytics-as-a-Service Data Analytics Infrastructure by Phil Alsop

The GridGain Data Lake Accelerator boosts data lake access by providing bi-directional integration with Apache™ Hadoop®. This integration brings the historical data into the same in-memory computing layer as the operational data, enabling real-time analytics and computing on the combined data to drive real-time business processes. It leverages the GridGain Unified API and native Apache Spark™ connector to power real-time HTAP (hybrid transactional/analytical processing) in which transactions and analytics are performed on the same operational dataset.

“Many of today’s digital transformation and IoT use cases require real-time analytics against a combination of data lake and operational data,” said Abe Kleinfeld, president and CEO of GridGain. “The GridGain Data Lake Accelerator addresses the requirements of today’s businesses to gain instant insight, capitalize on opportunities as they arise and automate decision making.”

“Many companies have created Hadoop-based data lakes with a view to consolidating data from multiple data sources and serving the processing and analytics needs of multiple use-cases, but have then struggled to generate the expected value,” said Matt Aslett, Research VP, Data, AI and Analytics, 451 Research. “By bringing its in-memory compute functionality to the data lake, GridGain is providing an option for accelerating access to historical and live data to support real-time decision-making.”

Typical use cases for the GridGain Data Lake Accelerator include using historical data to enrich real-time data streams, calculating thresholds for real-time operational triggers from historical trends, and displaying historical and real-time data together in operational dashboards.

For example, a transportation company might be collecting a continuous stream of data from its vehicle engines. The data is ingested, processed and analyzed and then stored in a data lake, with only the most recent data retained in the operational data store. When an anomalous reading in the live data triggers an alert for a particular engine, the system needs to analyze the engine data to identify the root cause of the problem.

An infrastructure powered by GridGain’s in-memory computing platform, Kafka, Spark and Hadoop makes this possible. Apache Kafka feeds the live streaming data to the GridGain in-memory computing platform and to the Hadoop data lake. Spark retrieves the required data from the data lake and delivers it to the in-memory computing platform. The GridGain in-memory computing platform maintains the combined data set in memory and runs real-time queries across the data set. The result is deep and immediate insight into the causes of the anomalous reading.
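To make the engine example more concrete, the following is a minimal Python sketch of one of the use cases named above: deriving a threshold from historical readings and using it to flag live readings. It is an illustration only and does not use GridGain, Kafka, Spark or Hadoop APIs. Plain in-process dictionaries and lists stand in for the data lake and the live stream, and the function names, engine IDs, sample values and the "mean plus three standard deviations" rule are all assumptions made for this sketch, not details from the announcement.

```python
from statistics import mean, stdev


def threshold_from_history(history, k=3.0):
    """Hypothetical rule: mean plus k sample standard deviations of past readings."""
    return mean(history) + k * stdev(history)


def enrich_and_flag(live_readings, history_by_engine, k=3.0):
    """Attach a per-engine historical threshold to each live reading and flag anomalies."""
    # Thresholds are computed once from the historical data (standing in for the data lake).
    thresholds = {
        engine_id: threshold_from_history(values, k)
        for engine_id, values in history_by_engine.items()
    }
    enriched = []
    for engine_id, value in live_readings:
        limit = thresholds[engine_id]
        enriched.append(
            {
                "engine_id": engine_id,
                "value": value,
                "threshold": limit,
                "anomalous": value > limit,
            }
        )
    return enriched


# Made-up sample data: historical readings per engine and two live readings.
history_by_engine = {
    "engine-1": [90.0, 92.0, 91.0, 89.0, 93.0],
    "engine-2": [70.0, 71.0, 69.0, 70.0, 70.0],
}
live_readings = [("engine-1", 92.5), ("engine-2", 85.0)]

results = enrich_and_flag(live_readings, history_by_engine)
for row in results:
    print(row["engine_id"], row["value"], round(row["threshold"], 2), row["anomalous"])
```

In this sketch the reading for engine-2 exceeds its historical threshold and is flagged, while the reading for engine-1 is not. In the architecture described above, the historical values would instead come from the data lake and the comparison would run against the combined data set held in memory.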
