The Complete GCP Data Engineering Project - Retailer Domain
IT & Software

Industry-standard project in the retailer domain using GCP services such as GCS, BigQuery, Dataproc, Composer, GitHub, and CI/CD

4.7
178 students
6h total length
English
$0 (originally $44.99)
100% OFF

Course Description

  • This project focuses on building a data lake on Google Cloud Platform (GCP) for the retailer domain.

  • The goal is to centralize, clean, and transform data from multiple sources, enabling retailers, providers, and insurance companies to streamline billing, claims processing, and revenue tracking.

  • GCP Services Used:

    • Google Cloud Storage (GCS): Stores raw and processed data files.

    • BigQuery: Serves as the analytical engine for storing and querying structured data.

    • Dataproc: Used for large-scale data processing with Apache Spark.

    • Cloud Composer (Apache Airflow): Automates ETL pipelines and workflow orchestration.

    • Cloud SQL (MySQL): Stores transactional Electronic Medical Records (EMR) data.

    • GitHub & Cloud Build: Enables version control and CI/CD implementation.

    • CI/CD (Continuous Integration & Continuous Deployment): Automates deployment pipelines for data processing and ETL workflows.
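
How these services fit together can be sketched as a metadata-driven storage layout: one config entry per source table, with helpers that derive GCS landing paths and BigQuery table ids from it. All bucket, project, dataset, and table names below are hypothetical placeholders, not names from the course.

```python
from datetime import date

# One entry per source table; the pipeline iterates over this list
# instead of hard-coding each ingestion job.
SOURCES = [
    {"system": "retailer_db", "table": "orders"},
    {"system": "supplier_db", "table": "shipments"},
]

def gcs_raw_path(bucket: str, source: dict, run_date: date) -> str:
    """Build the GCS landing path for one source table and run date."""
    return (f"gs://{bucket}/raw/{source['system']}/{source['table']}/"
            f"dt={run_date.isoformat()}/")

def bq_table_id(project: str, layer: str, source: dict) -> str:
    """Build the BigQuery table id for a given layer (bronze/silver/gold)."""
    return f"{project}.{layer}_dataset.{source['system']}__{source['table']}"

path = gcs_raw_path("my-datalake-bucket", SOURCES[0], date(2024, 1, 15))
```

Adding a new source table then means appending one dict to `SOURCES` rather than writing a new job.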

  • Techniques involved:

    • Metadata-driven approach

    • SCD Type 2 implementation

    • CDM (Common Data Model)

    • Medallion Architecture

    • Logging and monitoring

    • Error handling

    • Optimizations

    • CI/CD implementation

    • Many more best practices
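
Of the techniques above, SCD Type 2 is the most mechanical; its core rule (expire the old version, append a new current one) can be sketched in plain Python. In the project this logic would live in a BigQuery MERGE or Spark job; the column names here are illustrative, not the course's schema.

```python
from datetime import date

def scd2_merge(dim_rows, incoming, key, tracked, today):
    """Close out changed rows and append new versions (SCD Type 2)."""
    current = {r[key]: r for r in dim_rows if r["is_current"]}
    out = list(dim_rows)
    for row in incoming:
        cur = current.get(row[key])
        changed = cur is not None and any(cur[c] != row[c] for c in tracked)
        if cur is None or changed:
            if changed:
                cur["is_current"] = False      # expire the old version
                cur["end_date"] = today
            out.append({**row, "is_current": True,
                        "start_date": today, "end_date": None})
    return out

dim = [{"customer_id": 1, "city": "Pune", "is_current": True,
        "start_date": date(2023, 1, 1), "end_date": None}]
incoming = [{"customer_id": 1, "city": "Mumbai"}]
history = scd2_merge(dim, incoming, "customer_id", ["city"], date(2024, 1, 1))
```

After the merge, `history` holds both versions: the expired Pune row and a new current Mumbai row, so point-in-time queries remain possible.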

  • Data Sources

    • MySQL Retailer Database

    • MySQL Supplier Database

    • API Reviews (api-reviews)
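
A minimal sketch of the extraction step for the database sources: dump rows as newline-delimited JSON, the format BigQuery loads natively. `sqlite3` stands in here for the Cloud SQL MySQL databases, and the table and columns are hypothetical.

```python
import io
import json
import sqlite3

def extract_to_ndjson(conn, table: str) -> str:
    """Dump one source table to an NDJSON string (one JSON object per row)."""
    cur = conn.execute(f"SELECT * FROM {table}")  # sketch only; parametrize in real code
    cols = [d[0] for d in cur.description]
    buf = io.StringIO()
    for row in cur:
        buf.write(json.dumps(dict(zip(cols, row))) + "\n")
    return buf.getvalue()

# Stand-in for the MySQL retailer database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 20.0)])
ndjson = extract_to_ndjson(conn, "orders")
```

The resulting string would be written to the raw zone in GCS and loaded into a bronze table with a BigQuery load job.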


  • Expected Outcomes

    • Efficient data pipeline: automated ingestion and transformation of the source data.

    • Structured data warehouse: gold-layer tables in BigQuery for analytical queries.

    • Looker dashboards and reports built on the gold-layer tables.

    • All processes (data extraction, loading into GCS, transformation in BigQuery) are managed with Apache Airflow, ensuring automation, scheduling, and monitoring.
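
The outcomes above hinge on Airflow ordering the steps: extract, land in GCS, transform in BigQuery. A dependency-free sketch of that orchestration idea (run named tasks in declared order, stop on the first failure) is below; in the project itself this would be a Composer DAG, and the task names here are invented.

```python
def run_pipeline(tasks):
    """Run named task callables in order; return the names that completed."""
    done = []
    for name, fn in tasks:
        fn()  # any exception halts the pipeline, like a failed Airflow task
        done.append(name)
    return done

log = []
tasks = [
    ("extract_mysql", lambda: log.append("extract")),
    ("load_to_gcs",   lambda: log.append("load")),
    ("transform_bq",  lambda: log.append("transform")),
]
order = run_pipeline(tasks)
```

Airflow adds what this sketch omits: scheduling, retries, backfills, and per-task monitoring.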

