Spark: PySpark

Learn PySpark from the basics in this free online tutorial, taught hands-on by experts. Gain the skills to work with Spark MLlib, RDDs, DataFrames, and clustering, with case studies on structured and semi-structured data.

Ratings: 4.53 (average rating)
Level: Beginner
Learning hours: 3.75 Hrs
Learners: 12.9K+

About this Course

The PySpark course begins with an introduction to PySpark, illustrated with examples. You will then work with Spark libraries such as MLlib, learn to move from RDDs to the DataFrame API, and become familiar with clustering in PySpark. The course also includes a case study that gives you hands-on practice with the topics covered.

 

The Introduction to PySpark course is taught by an industry expert. A quiz at the end of the course tests what you have learned; complete it to earn a course completion certificate.

 

To go further in data science, consider Data Science certificate courses that offer specializations and electives to advance your career.

Why upskill with us?

700+ free courses
In-demand skills & tools
Free lifetime access

Course Outline

PySpark Introduction with an Example

This section gives an overview of the Spark framework and of how Spark fits alongside Hadoop. It explains PySpark through examples and code demonstrations.
 

Spark MLlib

This section discusses MLlib, the machine learning library supported by Spark. It then explains ML pipelines, Transformers, Estimators, and the library's architecture. You will also gain an understanding of k-means and TF-IDF through hands-on code demonstrations.
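To make the Transformer and Estimator roles concrete, here is a minimal plain-Python sketch (no Spark required) of a two-stage TF-IDF pipeline. The class names `Tokenizer`, `Idf`, and `IdfModel` and the sample sentences are hypothetical and only mimic the fit/transform pattern; they are not the MLlib classes themselves. The weighting uses the smoothed form log((N + 1) / (df + 1)), which is the form the MLlib documentation describes for IDF.

```python
import math
from collections import Counter


class Tokenizer:
    """Transformer: stateless, maps each document to a list of lowercase tokens."""

    def transform(self, docs):
        return [doc.lower().split() for doc in docs]


class IdfModel:
    """Transformer returned by fitting: rescales term counts by learned IDF weights."""

    def __init__(self, idf):
        self.idf = idf

    def transform(self, token_lists):
        return [
            {term: count * self.idf.get(term, 0.0)
             for term, count in Counter(tokens).items()}
            for tokens in token_lists
        ]


class Idf:
    """Estimator: fit() learns document frequencies and returns an IdfModel."""

    def fit(self, token_lists):
        n_docs = len(token_lists)
        # Count each term once per document it appears in.
        doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
        # Smoothed IDF: log((N + 1) / (df + 1)).
        idf = {term: math.log((n_docs + 1) / (df + 1))
               for term, df in doc_freq.items()}
        return IdfModel(idf)


# Hypothetical sample documents, used only for illustration.
docs = [
    "spark makes big data simple",
    "spark mllib clusters data",
    "music data case study",
]

tokens = Tokenizer().transform(docs)   # stage 1: Transformer
model = Idf().fit(tokens)              # stage 2: Estimator produces a model
weights = model.transform(tokens)      # the fitted model is itself a Transformer
```

In this sketch the word "data" occurs in every document, so its weight is zero, while a word that occurs in a single document (such as "mllib") receives the largest weight. In PySpark, a comparable chain would typically be assembled from `pyspark.ml.feature` stages inside a `pyspark.ml.Pipeline`.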

Moving from RDD to DataFrame API

You will learn about Spark DataFrames and Spark SQL. Through demonstrated code samples, you will see why Data Science and Big Data work tends to shift from RDDs to the DataFrame API.

Clustering with PySpark

This section explains k-means clustering in MLlib and TF-IDF, a term-weighting scheme most commonly used in text mining and information retrieval, with demonstrated code.
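For readers new to k-means, the following plain-Python sketch shows the two alternating steps of the standard Lloyd's algorithm: assign each point to its nearest center, then move each center to the mean of its members. The function name `kmeans`, the sample points, and the starting centers are hypothetical; MLlib's own implementation is distributed and chooses initial centers differently, so treat this only as an outline of the idea.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # a center with no members stays where it is.
        centers = [
            (
                sum(x for x, _ in members) / len(members),
                sum(y for _, y in members) / len(members),
            )
            if members
            else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical sample data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

With these inputs the assignments settle after the first pass, leaving three points in each cluster. In PySpark, the corresponding estimator is `pyspark.ml.clustering.KMeans`, which is fitted on a DataFrame containing a vector-valued features column.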
 

Music Data Case Studies

This section walks through a case study on a music dataset, giving you hands-on experience with the topics covered above.
 

Trusted by 10 Million+ Learners globally

What our learners say about the course

Find out how our platform helped our learners upskill in their careers.

Course Rating: 4.53
Rating breakdown: 72%, 19%, 6%, 2%, 1%
