This Course Covers Complete Big Data Engineering Topics
▶️Part 1 - https://youtu.be/Tyg1FVNq40g
▶️Part 2 - https://youtu.be/k1LaWFNOa68
Resources
=========
Hadoop Installation Steps - https://github.com/atozknowledge/bigdata/wiki/Hadoop-Single-Node-Installation
Hadoop Multi Node Cluster Setup Installation Steps - https://bit.ly/3LRwgRi
Big Data Integration Book - https://bit.ly/3ipIlBx
Hive
Hive-site.xml - https://github.com/Gowthamsb12/hive/blob/main/hive-site.xml
Hive ACID commands - https://bit.ly/2V9W1qT
Apache Hive ORC vs TextFile Format - https://bit.ly/3cbIbNl
Hive UDF Code - https://codewithgowtham.blogspot.com/2021/09/hive-udf.html
Spark
Spark Submit Cluster [YARN] Mode Code link - https://github.com/atozknowledge/bigdata
Spark Kafka Cassandra | End to End Streaming Project Code and Steps - https://bit.ly/3LqXXRC
Kafka Installation Video - https://youtu.be/XCOIp-CqGkg
Sqoop Commands - https://codewithgowtham.blogspot.com/2021/03/sqoop-commands.html
Course Outline
00:00 Unboxing Spark RDD
11:03 Different Ways to Create [Spark RDD]
14:48 Spark Transformation Types and Actions
27:58 Spark Executor Core & Memory Explained
36:03 Spark [Executor & Driver] Memory Calculation
42:12 Spark Submit Cluster [YARN] Mode
55:03 Spark Lens Integration [Spark Application]
01:05:03 Spark [Hash Partition] Explained
01:22:01 Spark [Custom Partition] Implementation
01:28:36 Spark With JDBC (MySQL / Oracle)
01:42:02 Apache Spark UDF
01:50:23 Spark - Repartition Or Coalesce
02:00:08 Apache Spark SQL With Apache Hive
02:08:24 Spark reduceByKey Or groupByKey
02:20:12 Spark Dedup With Use Cases
02:28:59 Why Is PySpark UDF Slow
02:34:31 Spark - Calculate Moving Average
02:43:02 Spark Market Basket Algorithm
02:50:56 PySpark Calculate Frequency Table
02:55:10 PySpark Types of Drop Null Records
02:59:46 Print Spark Data Lineage
03:04:50 Spark Logical & Physical Plan
03:13:01 Hadoop Multi Node Cluster Setup
03:42:20 Spark Kafka Cassandra End to End Streaming Project
04:12:29 Big Data - On Premise Or Cloud
04:15:07 Do Data Engineers Need Cloud Computing
04:19:12 Big Data On Cloud AWS EMR
04:36:55 Google Dataproc BigData Managed Service
04:52:22 Google Dataproc and BigQuery Project
05:03:43 Apache Sqoop Data Migration POC
05:29:26 Top 10 Points to Explain Big Data Project In Interview
05:42:08 Top 10 Interview Q & A Explained
05:58:58 Big Data Engineering Resume Preparation
06:21:04 Big Data Project Explained
06:40:58 4 Ways For Data Engineers To Make Extra Income
06:45:45 Big Data Project Deployment Explained
06:58:25 How to get Latest Data Engineering News
𝐒𝐨𝐜𝐢𝐚𝐥𝐬
🎥𝐘𝐨𝐮𝐓𝐮𝐛𝐞 - https://www.youtube.com/@thedatatech
📸𝐈𝐧𝐬𝐭𝐚𝐠𝐫𝐚𝐦 - https://instagram.com/thedatatech.in
💼𝐋𝐢𝐧𝐤𝐞𝐝𝐈𝐧 - https://www.linkedin.com/in/sbgowtham/
🌐𝐖𝐞𝐛𝐬𝐢𝐭𝐞 - https://codewithgowtham.blogspot.com
💻𝐆𝐢𝐭𝐇𝐮𝐛 - http://github.com/Gowthamdataengineer
💬𝐖𝐡𝐚𝐭𝐬𝐀𝐩𝐩 - https://lnkd.in/g5JrHw8q
📧𝐄𝐦𝐚𝐢𝐥 -
[email protected]
📱𝐀𝐥𝐥 𝐌𝐲 𝐒𝐨𝐜𝐢𝐚𝐥𝐬 - https://lnkd.in/gf8k3aCH
Hash Tags
#DataEngineering #BigData #DataPipeline #ETL #DataProcessing #DataScience #DataAnalytics #DataWrangling #DataOps #DataArchitecture #DataIntegration #DataTransformation #DataStorage #DataManagement #DataPlatform #CloudDataEngineering #AWS #Azure #GCP #DataCloud #CloudComputing #CloudDataPipeline #DataStreaming #Kafka #Spark #Hadoop #NoSQL #DataModeling #DataGovernance #DataLake #DataWarehouse #Redshift #BigQuery #Snowflake #DataVisualization #MachineLearning #AI #APIs #DatabaseManagement #ServerlessComputing #DataMigration #DevOps #MLOps #DataOrchestration #DataAutomation #DataSecurity #CloudMigration #DataEngineeringCommunity #RealTimeData #DataMonitoring #DataEngineeringTools #DataInsights #DataDriven #DataQuality #DataEngineeringProjects #PythonForData #SQL #DataPipelinesSimplified #CloudETL #ModernDataStack #CloudDataOps #DataLakehouse #AnalyticsEngineering #DataFlow #CloudIntegration #DataTools #DataPipelineAutomation #DataModelingSimplified #ETLTools #DataProcessingPipeline #DataCloudExperts #ServerlessData #CloudComputingSolutions #BigDataAnalytics #AdvancedAnalytics #DataInnovation #CloudDataManagement #DataOpsFramework #ETLProcesses #StreamingDataPipeline #DataScienceWorkflow #CloudEngineering #DataEngineerLife #DataEngineerJobs #DataEngineeringForBeginners #CloudSolutions #TechForData #DataScienceCommunity #CloudFirst #DataStorageOptimization #CloudETLTools #DataProcessingFrameworks #RealTimeAnalytics