Course Details

Data Engineering Essentials using SQL, Python, and PySpark

(5146 reviews)

Release Date

2021-02-14

Description

As part of this course, you will learn the Data Engineering Essentials related to building Data Pipelines using SQL and Python, along with Hadoop, Hive, or Spark SQL and the PySpark Data Frame APIs. You will also understand the development and deployment lifecycle of Python applications using Docker, as well as of PySpark applications on multinode clusters. You will also gain basic knowledge of reviewing Spark Jobs using the Spark UI.

About Data Engineering

Data Engineering is the practice of processing data to meet downstream needs. As part of Data Engineering, we build different kinds of pipelines, such as Batch Pipelines and Streaming Pipelines. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, these roles are known as ETL Development, Data Warehouse Development, etc.

Here are some of the challenges learners face when acquiring key Data Engineering Skills such as Python, SQL, and PySpark:

  • Having an appropriate environment with Apache Hadoop, Apache Spark, Apache Hive, etc. working together.

  • Good quality content with proper support.

  • Enough tasks and exercises for practice.

This course is designed to address these key challenges so that professionals at all levels can acquire the required Data Engineering Skills (Python, SQL, and Apache Spark). It covers the following topics:

  • Setting up an environment to learn Data Engineering Essentials such as SQL (using Postgres), Python, etc.

  • Setting up the required tables in Postgres to practice SQL

  • Writing basic SQL Queries with practical examples using WHERE, JOIN, GROUP BY, HAVING, ORDER BY, etc.

  • Advanced SQL Queries with practical examples such as cumulative aggregations, ranking, etc.

  • Scenarios covering troubleshooting and debugging related to Databases.

  • Performance Tuning of SQL Queries

  • Exercises and Solutions for SQL Queries.

  • Basics of Programming using Python as the Programming Language

  • Python Collections for Data Engineering

  • Data Processing or Data Engineering using Pandas

  • 2 Real Time Python Projects with explanations (File Format Converter and Database Loader)

  • Scenarios covering troubleshooting and debugging in Python Applications

  • Performance Tuning Scenarios related to Data Engineering Applications using Python

  • Getting Started with Google Cloud Platform to set up a Spark Environment using Databricks

  • Writing Basic Spark SQL Queries with practical examples using WHERE, JOIN, GROUP BY, HAVING, ORDER BY, etc.

  • Creating Delta Tables in Spark SQL along with CRUD Operations such as INSERT, UPDATE, DELETE, MERGE, etc.

  • Advanced Spark SQL Queries with practical examples such as ranking

  • Integration of Spark SQL and PySpark

  • In-depth coverage of Apache Spark Catalyst Optimizer for Performance Tuning

  • Reading Explain Plans of Spark SQL Queries or PySpark Data Frame APIs

  • In-depth coverage of columnar file formats and Performance Tuning using Partitioning
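
To give a flavor of the basic and advanced SQL topics listed above, here is a minimal sketch, not taken from the course material. It assumes Python's built-in sqlite3 module as a lightweight stand-in for Postgres or Spark SQL (SQLite 3.25 or later is needed for window functions). The daily_revenue table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical sample data; the table and column names are illustrative only.
ROWS = [
    ("2021-01-01", "books", 120.0),
    ("2021-01-01", "games", 80.0),
    ("2021-01-02", "books", 60.0),
    ("2021-01-02", "games", 200.0),
    ("2021-01-03", "books", 90.0),
]


def run_queries():
    # In-memory SQLite database used as a stand-in for Postgres or Spark SQL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_revenue (order_date TEXT, category TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO daily_revenue VALUES (?, ?, ?)", ROWS)

    # Basic query: filter rows, aggregate per category, filter groups, sort.
    basic = conn.execute(
        """
        SELECT category, SUM(revenue) AS total_revenue
        FROM daily_revenue
        WHERE order_date >= '2021-01-01'
        GROUP BY category
        HAVING SUM(revenue) > 100
        ORDER BY total_revenue DESC
        """
    ).fetchall()

    # Advanced query: cumulative aggregation and ranking with window functions.
    advanced = conn.execute(
        """
        SELECT order_date,
               category,
               revenue,
               SUM(revenue) OVER (
                   PARTITION BY category ORDER BY order_date
               ) AS cumulative_revenue,
               RANK() OVER (
                   PARTITION BY category ORDER BY revenue DESC
               ) AS revenue_rank
        FROM daily_revenue
        ORDER BY category, order_date
        """
    ).fetchall()

    conn.close()
    return basic, advanced


if __name__ == "__main__":
    basic_rows, advanced_rows = run_queries()
    for row in basic_rows:
        print(row)
    for row in advanced_rows:
        print(row)
```

The same SELECT statements should carry over to Postgres or Spark SQL with little or no change, since WHERE, GROUP BY, HAVING, ORDER BY, and the SUM and RANK window functions are standard SQL. Only the connection code is specific to sqlite3.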

₹3,099