Machine Learning with Apache Spark 3.0 using Scala

Machine Learning with Apache Spark 3.0 using Scala with Examples and 4 Projects

Rating: 4.3 | Students: 16,186 | Length: 8h total | Language: English
Price: $0 (regularly $49.99, 100% off)

Course Description

Do you want to master Machine Learning at scale using one of the most powerful Big Data frameworks in the world? This course will teach you Machine Learning with Apache Spark 3.0 and Scala, step by step, through real-world projects and hands-on coding examples.


Apache Spark is a widely adopted framework for processing and analyzing large datasets. Its MLlib (Machine Learning Library) provides distributed implementations of machine learning algorithms, making it possible to train, evaluate, and deploy models on large amounts of data efficiently. Working in Scala, the language Spark itself is written in, you’ll learn how to build and optimize end-to-end machine learning pipelines.
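
To make the idea of an end-to-end MLlib pipeline concrete, here is a minimal sketch, assuming Spark 3.0 with the spark-sql and spark-mllib modules on the classpath and a local-mode session. The weather columns (humidity, pressure, rainTomorrow) and their values are hypothetical and made up for illustration; this is not the course's own code or its Rain Prediction dataset. It chains a StringIndexer, a VectorAssembler, and a LogisticRegression into one Pipeline and fits it in a single call.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

object PipelineSketch {
  // Fits a three-stage pipeline on a tiny, hypothetical weather table
  // and returns the input columns plus the model's predictions.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical data: (humidity, pressure, rainTomorrow).
    val data = Seq(
      (85.0, 1005.0, "Yes"),
      (40.0, 1020.0, "No"),
      (90.0, 1002.0, "Yes"),
      (35.0, 1022.0, "No")
    ).toDF("humidity", "pressure", "rainTomorrow")

    // Stage 1: map the string label to a numeric "label" column.
    val indexer = new StringIndexer()
      .setInputCol("rainTomorrow")
      .setOutputCol("label")

    // Stage 2: pack the numeric columns into a single "features" vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("humidity", "pressure"))
      .setOutputCol("features")

    // Stage 3: a classifier that reads "features" and "label" by default.
    val lr = new LogisticRegression().setMaxIter(10)

    // Fitting the Pipeline runs each stage in order and yields a PipelineModel.
    val model = new Pipeline()
      .setStages(Array(indexer, assembler, lr))
      .fit(data)

    // The fitted model appends a "prediction" column (among others).
    model.transform(data).select("humidity", "pressure", "label", "prediction")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PipelineSketch")
      .master("local[*]")
      .getOrCreate()
    run(spark).show()
    spark.stop()
  }
}
```

Bundling the stages into one Pipeline means the same fitted PipelineModel can be applied to new data, so the indexing and vector assembly are repeated consistently at prediction time.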


This course is designed for beginner to intermediate learners who want to gain practical experience applying machine learning techniques in Spark. You’ll start with Big Data and Spark basics, then move on to core machine learning concepts, and finally apply them to real-world datasets through hands-on projects like rain prediction, ad click prediction, iris flower classification, and customer segmentation.


By the end of this course, you will have the skills and confidence to build scalable machine learning models using Spark 3.0 and Scala, skills that are in high demand in industries such as finance, e-commerce, telecom, and technology.


What You Will Learn


  • Introduction to Machine Learning & Spark MLlib

    • Basics of machine learning and its main types: supervised learning (classification, regression) and unsupervised learning (clustering).

    • What is Spark ML, and how does Spark MLlib simplify building ML models at scale?


  • Apache Spark Basics (Optional Section)

    • Get familiar with Spark fundamentals: RDDs, DataFrames, and Datasets.

    • Set up a Spark environment using Databricks.

    • Learn notebook basics, cluster provisioning, and working with Scala.


  • Data Handling & Preparation

    • Work with different data sources: CSV, JSON, LIBSVM, images, Avro, and Parquet.

    • Understand the Machine Learning data pipeline in Spark.

    • Practice feature extraction, transformation, and selection techniques.


  • Feature Engineering in Spark ML

    • Learn popular feature extractors such as TF-IDF, Word2Vec, CountVectorizer, and FeatureHasher.

    • Apply transformers such as Tokenizer, StopWordsRemover, NGram, PCA, StringIndexer, and OneHotEncoder.

    • Use feature selectors like RFormula and ChiSqSelector.

    • Build and connect them into end-to-end ML pipelines.


  • Machine Learning Models with Spark

    • Classification Models: Decision Trees, Logistic Regression, Naive Bayes (Iris Prediction), Random Forest, Gradient-Boosted Trees, Linear SVM, and One-vs-Rest.

    • Regression Models: Linear Regression (Ad Click Prediction project), Decision Tree Regression, Random Forest Regression, and Gradient-Boosted Tree Regression.

    • Clustering: KMeans (Customer Segmentation project).


  • Hands-On Projects

    • Rain Prediction in Australia (complete ML pipeline).

    • Iris Flower Classification using Naive Bayes.

    • Customer Segmentation using KMeans.

    • Ad Click Prediction using Linear Regression.

    • Multiple other classification and regression use cases with step-by-step Scala implementations.


  • Spark MLlib in Practice

    • Understand how to train, evaluate, and optimize ML models at scale.

    • Explore key concepts like shuffling, correlation, pipeline components, and evaluation metrics.
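
The train-and-evaluate cycle described in the list above can be sketched for the clustering case as follows. This is a minimal sketch, assuming Spark 3.0 with spark-sql and spark-mllib on the classpath; the customer columns (annualIncome, spendingScore) and values are hypothetical and are not taken from the course's Customer Segmentation project. It assembles features, fits KMeans, and scores the result with ClusteringEvaluator, whose default metric is the silhouette score.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object SegmentationSketch {
  // Fits k-means on hypothetical (annualIncome, spendingScore) pairs and
  // returns the silhouette score of the resulting clustering.
  def silhouette(spark: SparkSession, k: Int = 2): Double = {
    import spark.implicits._

    // Hypothetical customers forming two well-separated groups.
    val customers = Seq(
      (15.0, 80.0), (16.0, 77.0), (17.0, 82.0),
      (70.0, 20.0), (72.0, 18.0), (75.0, 25.0)
    ).toDF("annualIncome", "spendingScore")

    // Combine the numeric columns into the "features" vector KMeans expects.
    val features = new VectorAssembler()
      .setInputCols(Array("annualIncome", "spendingScore"))
      .setOutputCol("features")
      .transform(customers)

    // Train: a fixed seed makes the initialisation reproducible.
    val model = new KMeans().setK(k).setSeed(1L).fit(features)

    // Predict: adds a "prediction" column holding each row's cluster id.
    val predictions = model.transform(features)

    // Evaluate: silhouette (squared Euclidean by default); closer to 1 is better.
    new ClusteringEvaluator().evaluate(predictions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SegmentationSketch")
      .master("local[*]")
      .getOrCreate()
    println(s"Silhouette: ${silhouette(spark)}")
    spark.stop()
  }
}
```

The same fit, transform, evaluate pattern applies to the supervised models, for example pairing a classifier with MulticlassClassificationEvaluator instead of ClusteringEvaluator.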
