Sources, Sinks, and Operators: A Performance Deep Dive

At Splunk we have built a Flink streaming infrastructure and scaled it to petabytes of data per day and millions of events per second. Along the way, we've learned a lot about writing performant operations in the DataStream API and putting together high-throughput pipelines. This talk will cover our real-life experiences in scaling for this throughput and the best practices we've learned along the way. We'll talk about aggregation, built-in functions, user-defined functions, the async I/O API, as well as discoveries around GC, Java object management, state backends, and serialization.

0:00 Introduction
1:07 Environmental & Performance Testing
3:57 The basic challenges
10:17 Sources
21:22 Sinks
25:10 Putting it all together