Modern Spark DataFrame & Dataset | Apache Spark 2.0 Tutorial

68,866 listens
Adam Breindel, lead Spark instructor at NewCircle, talks about which APIs to use for modern Spark with a series of brief technical explanations and demos that highlight best practices, the latest APIs, and new features. (Topics Indexed Below)

We'll look at how Dataset and DataFrame behave in Spark 2.0, Whole-Stage Code Generation, and go through a simple example of Spark 2.0 Structured Streaming (Streaming with DataFrames) that you can run in your own free instance of Databricks.

00:00:40 - Intro: What is "Modern Spark"
00:01:26 - DataFrame
00:05:07 - Why not use RDD?
00:09:15 - Intro to DataFrame and Dataset
00:10:13 - DataFrame versus Dataset
00:14:42 - Dataset Queries and Dataset with Scala classes
00:19:07 - Spark Query Optimizer
00:23:26 - Whole-Stage Codegen
00:27:21 - Hive integration
00:29:28 - Wrapping Up DataFrame/Dataset Benefits
00:30:54 - One More Thing - Structured Streaming
00:36:47 - Conclusion

Try the Examples:
+ Databricks Community Edition: https://databricks.com/try
+ Get this Notebook: https://bit.ly/get-notebook

----------------------------------------------------------------------------------------------
SPARK 2.0 TRAINING | NewCircle | Onsite & Public Classes
----------------------------------------------------------------------------------------------
+ Programming for Spark 2.0 (3 days): http://bit.ly/spark-prog-newcircle
+ Spark 2.0 for Machine Learning & Data Science (3 days): http://bit.ly/spark-ml-newcircle