
SQL Data Cleaning Portfolio Project: Data Engineering, Analytics & Data Science with Business Rules and KPIs for Dashboards
Course Description
This course is built to give you a publishable portfolio project as the end product — a complete SQL data-cleaning and KPI pipeline you can put on GitHub, link on LinkedIn, and confidently talk through in interviews.
It’s a real-world simulation built around one messy dataset and a business brief with a clear target: deliver ten KPIs that are trustworthy enough to go on a dashboard.
Most SQL “data cleaning” courses either stay at the level of syntax drills, or they use clean toy datasets where nothing breaks. That’s not what you face in real data teams.
In this course you’ll work through the same workflow you’d use on a real project:
Read the brief properly so you know what “correct” means
Explore the raw schema and spot the mess early (mixed date formats, typos in categories, missing values, duplicates)
Build a typed, safer silver layer where errors surface in a controlled way (see the SQL sketches after this list)
Enforce the business rules and deduplicate into one trusted clean_table
Compute and standardise all KPI outputs into a consistent results table
Validate results, understand tolerances/rounding, and debug mismatches like a professional
Finish by turning the whole pipeline into a portfolio-ready GitHub project, with a clean repo structure, a strong README, and proof of results
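To give you a feel for the silver-layer step, here is a minimal sketch of the kind of typed view involved. It is written in SQL Server's T-SQL (other engines have equivalents of TRY_CONVERT and TRY_CAST), and every table, column and category name in it (raw_orders, order_date_raw, category_raw, amount_raw) is illustrative rather than taken from the course dataset.

-- Illustrative silver layer: typed columns over an assumed raw_orders table.
-- Values that cannot be parsed become NULL here, so errors surface in a
-- controlled way instead of failing the whole load.
CREATE VIEW silver_orders AS
SELECT
    order_id,
    loaded_at,
    -- Mixed date formats: try ISO yyyy-mm-dd (style 23), then dd/mm/yyyy (style 103)
    COALESCE(
        TRY_CONVERT(date, order_date_raw, 23),
        TRY_CONVERT(date, order_date_raw, 103)
    ) AS order_date,
    -- Category typos: trim and uppercase, then map known variants to one canonical label
    CASE UPPER(LTRIM(RTRIM(category_raw)))
        WHEN 'ELECTRONICS' THEN 'Electronics'
        WHEN 'ELECTRONCS'  THEN 'Electronics'   -- example typo variant, not from the real data
        ELSE UPPER(LTRIM(RTRIM(category_raw)))
    END AS category,
    -- Amounts arrive as text; TRY_CAST returns NULL for anything non-numeric
    TRY_CAST(amount_raw AS decimal(12, 2)) AS amount
FROM raw_orders;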
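The deduplication step can be sketched in the same hedged spirit. Assuming the silver view carries a load timestamp, one standard pattern is ROW_NUMBER() over the business key, keeping the top-ranked row per key; the filter and the "keep the latest" tie-break below are placeholders rather than the actual rules from the brief, and only the clean_table name comes from the course.

-- Illustrative business rules + deduplication into clean_table
-- (SELECT ... INTO is T-SQL; use CREATE TABLE ... AS SELECT elsewhere).
WITH ranked AS (
    SELECT
        order_id, order_date, category, amount,
        ROW_NUMBER() OVER (
            PARTITION BY order_id          -- assumed business key
            ORDER BY loaded_at DESC        -- assumed "keep the most recent" rule
        ) AS rn
    FROM silver_orders
    WHERE amount IS NOT NULL
      AND amount >= 0                      -- placeholder business rule
)
SELECT order_id, order_date, category, amount
INTO clean_table
FROM ranked
WHERE rn = 1;

ROW_NUMBER() is used here instead of SELECT DISTINCT because it lets you state explicitly which of the duplicate rows survives, which is exactly the kind of rule the brief forces you to pin down.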
Course outline (high level):
Section 00: Course Introduction
Section 01: The Verulam Blue Mint Environment
Section 02: Understanding the Challenge Brief
Section 03: Exploring Source Data Schema
Section 04: Data Cleaning I – Sampling & Completeness
Section 05: Data Cleaning II – Silver Layer & Normalisation
Section 06: Data Cleaning III – Business Rules & Deduplication
Section 07: Understanding the KPIs
Section 08: Computing KPIs
Section 09: Results
Section 10: Portfolio project deployment (repo + README + LinkedIn-style project story)
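Sections 07 to 09 are where the KPIs get computed, standardised into one consistent results table, and validated against expected values within a rounding tolerance. As a rough, hedged illustration of that shape (the KPI names, the kpi_name/kpi_value columns and the expected_results table are all invented for this sketch):

-- Write every KPI in one consistent shape
INSERT INTO results (kpi_name, kpi_value)
SELECT 'total_revenue', ROUND(SUM(amount), 2) FROM clean_table
UNION ALL
SELECT 'order_count', COUNT(*) FROM clean_table;

-- Validate with a small tolerance instead of exact equality;
-- any row this returns is a mismatch to debug
SELECT r.kpi_name, r.kpi_value, e.expected_value
FROM results r
JOIN expected_results e ON e.kpi_name = r.kpi_name
WHERE ABS(r.kpi_value - e.expected_value) > 0.01;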
By the end, you won’t just know “how to clean data using SQL”. You’ll have an end-to-end portfolio project you can explain clearly: what was wrong with the data, what you changed, what rules you enforced, and why your KPIs can be trusted.