Schema-aware PCollections and Beam SQL (Beam Summit Europe 2019)
Apache Beam doesn’t have any knowledge of the actual structure of the records in a PCollection, and has little understanding of what PTransforms do with them. In practice, most PCollections have a schema: Avro records, BigQuery rows, and even POJOs and case classes. Many operations are performed on structured records: filtering by a field, grouping by a specific field, and so on.
In this talk, we will learn about schema-aware PCollections and Beam SQL, and see how to leverage them and how they work with Scio, a Scala DSL for Apache Beam.
Speaker:
Gleb Kanterov - Staff Engineer @ Spotify
Beam Summit Europe 2019 was a two-day event held in Berlin at the KulturBrauerei, focused entirely on Apache Beam.
For more information about the Beam Summit, follow us on Twitter @BeamSummit or visit the website: https://beamsummit.org/