Beam Summit 2021 - Fault Tolerant Integration of Apache Beam With Relational Database
In this session, Savitha and Piaw share a case study from Niantic Labs, where they used Postgres as a time-series database to store metrics from Apache Beam workflows.
Prometheus is a widely used monitoring system and time-series database, but it could not serve as the metrics reporting system for the output of a non-real-time Apache Beam pipeline: each metric denotes a player activity that occurred in the past, and Prometheus does not have a feature to accept a timestamp as input. Niantic's goal was to process batched data pipelines and produce timestamped player activity metrics without data loss and with millisecond accuracy. They solved the problem with a novel Apache Beam pipeline design pattern in which the workflow pipelines do most of the data processing, so that metrics can be ingested into Postgres at low cost, low latency, and large scale, with optimal execution time. This customized implementation is another application of divide-and-conquer to distributed data processing.
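The abstract does not include the pipeline code itself. As a rough illustration of the idea described above (aggregate in the pipeline, then write timestamped metrics so that a retried write cannot lose or double-count data), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `player_metrics` table, the metric names, and the `aggregate` and `write_batch` helpers are hypothetical, `aggregate` stands in for a Beam combine step, and the standard-library `sqlite3` module stands in for Postgres so the sketch runs without a server. The `ON CONFLICT ... DO UPDATE` clause has the same form in Postgres 9.5 and later.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: one row per (metric, event timestamp in milliseconds).
SCHEMA = """
CREATE TABLE IF NOT EXISTS player_metrics (
    metric TEXT    NOT NULL,
    ts_ms  INTEGER NOT NULL,
    value  INTEGER NOT NULL,
    PRIMARY KEY (metric, ts_ms)
)
"""

# The upsert overwrites the stored value instead of incrementing it,
# so replaying the same batch after a failure leaves the table unchanged.
UPSERT = """
INSERT INTO player_metrics (metric, ts_ms, value)
VALUES (?, ?, ?)
ON CONFLICT (metric, ts_ms) DO UPDATE SET value = excluded.value
"""


def aggregate(events):
    """Sum values per (metric, ts_ms); stands in for a pipeline combine step."""
    totals = defaultdict(int)
    for metric, ts_ms, value in events:
        totals[(metric, ts_ms)] += value
    return sorted((m, t, v) for (m, t), v in totals.items())


def write_batch(conn, rows):
    """Write one batch in a single transaction (rolled back on error)."""
    with conn:
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Hypothetical past events: (metric, event time in ms since epoch, count).
events = [
    ("logins", 1620000000123, 1),
    ("logins", 1620000000123, 2),
    ("sessions", 1620000000456, 1),
]

rows = aggregate(events)
write_batch(conn, rows)
write_batch(conn, rows)  # simulated retry of the same batch

stored = conn.execute(
    "SELECT metric, ts_ms, value FROM player_metrics ORDER BY metric, ts_ms"
).fetchall()
print(stored)
```

Because the database only receives pre-aggregated rows keyed by metric and event time, the heavy work stays in the pipeline and the write path stays small. With a Postgres driver such as psycopg2, the `?` placeholders would become `%s`.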
This talk was presented at Beam Summit 2021 by Savitha Jayasankar & Piaw Na.