Welcome to the Data Warehouse & Analytics Project repository! 🚀
This project showcases an end-to-end data warehousing and analytics solution. It covers the entire workflow, from designing and building a data warehouse to performing analytics and generating actionable business insights. Developed as a portfolio project, it demonstrates best practices in data engineering, data modeling, and analytics, reflecting real-world industry standards.
The data architecture for this project follows the Medallion Architecture with three layers: Bronze, Silver, and Gold.
- Bronze Layer: Stores raw, unprocessed data directly from source systems. In this project, data is ingested from CSV files into a SQL Server database.
- Silver Layer: Performs data cleansing, standardization, and normalization to prepare the data for analysis.
- Gold Layer: Contains business-ready data structured in a star schema, optimized for reporting and advanced analytics.
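As a rough illustration of how data moves through the three layers, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so it can run anywhere; the table names, column names, and sample rows are hypothetical and not taken from the project's scripts.

```python
import csv
import io
import sqlite3

# Hypothetical CRM extract; column names and values are invented for illustration.
RAW_CSV = """cst_id,cst_firstname,cst_gndr
1,  Anna ,F
2,Ben,m
3,Cara,
"""

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database

# Bronze: land the rows exactly as they arrive, with every column stored as text.
conn.execute(
    "CREATE TABLE bronze_crm_cust (cst_id TEXT, cst_firstname TEXT, cst_gndr TEXT)"
)
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany(
    "INSERT INTO bronze_crm_cust VALUES (:cst_id, :cst_firstname, :cst_gndr)", rows
)

# Silver: cast types, trim whitespace, and standardize coded values.
conn.execute("""
    CREATE TABLE silver_crm_cust AS
    SELECT CAST(cst_id AS INTEGER) AS cst_id,
           TRIM(cst_firstname)     AS cst_firstname,
           CASE UPPER(TRIM(cst_gndr))
                WHEN 'F' THEN 'Female'
                WHEN 'M' THEN 'Male'
                ELSE 'n/a'
           END AS cst_gndr
    FROM bronze_crm_cust
""")

# Gold: expose business-friendly names on top of the cleaned data.
conn.execute("""
    CREATE VIEW gold_dim_customers AS
    SELECT cst_id AS customer_id, cst_firstname AS first_name, cst_gndr AS gender
    FROM silver_crm_cust
""")

gold_rows = conn.execute(
    "SELECT customer_id, first_name, gender FROM gold_dim_customers ORDER BY customer_id"
).fetchall()
print(gold_rows)
```

The raw values stay untouched in the bronze table, so any cleansing rule in the silver step can be changed and re-run without re-ingesting the CSV files.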
A detailed project workflow and step-by-step guide are available in workflow.pdf.
This project demonstrates a complete data warehousing and analytics workflow, including:
- Data Architecture: Designing a modern data warehouse using the Medallion Architecture with Bronze, Silver, and Gold layers.
- ETL Pipelines: Extracting, transforming, and loading data from source systems into the data warehouse.
- Data Modeling: Creating fact and dimension tables optimized for analytical queries.
- Analytics & Reporting: Developing SQL-based reports and dashboards to generate actionable business insights.
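The data modeling step can be pictured with a tiny star schema: one dimension table joined to one fact table through a surrogate key. This is only a sketch, again using `sqlite3` in place of SQL Server; `dim_products`, `fact_sales`, and all of their columns and rows are assumed names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database

# Hypothetical star schema: the fact table stores measures plus a key into the dimension.
conn.executescript("""
    CREATE TABLE dim_products (
        product_key  INTEGER PRIMARY KEY,  -- surrogate key
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE fact_sales (
        order_number TEXT,
        product_key  INTEGER REFERENCES dim_products(product_key),
        quantity     INTEGER,
        sales_amount INTEGER
    );
    INSERT INTO dim_products VALUES
        (1, 'Road Bike', 'Bikes'),
        (2, 'Helmet', 'Accessories');
    INSERT INTO fact_sales VALUES
        ('SO1', 1, 1, 1200),
        ('SO2', 2, 2, 60),
        ('SO3', 1, 1, 1200);
""")

# A typical analytical query: aggregate the fact table, grouped by a dimension attribute.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_products AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY total_sales DESC
""").fetchall()
print(sales_by_category)
```

Keeping descriptive attributes in the dimension and numeric measures in the fact table means most reports reduce to a join plus a `GROUP BY`.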
🎯 This repository serves as a valuable resource for professionals and students aiming to showcase expertise in:
- SQL Development
- Data Architecture
- Data Engineering
- ETL Pipeline Development
- Data Modeling
- Data Analytics
Everything is free!
- Datasets: Access to the project dataset (CSV files).
- SQL Server Express: Lightweight server for hosting your SQL database.
- SQL Server Management Studio (SSMS): GUI for managing and interacting with databases.
- Git Repository: Set up a GitHub account and repository to manage, version, and collaborate on your code efficiently.
- Whimsical: Design data architecture, models, flows, and diagrams.
- Notion: All-in-one tool for project management and organization.
- Notion Project Steps: Access to all project phases and tasks.
Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and data-driven decision-making.
- Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files.
- Data Quality: Perform data cleansing to address inconsistencies and ensure reliability for analysis.
- Integration: Merge both sources into a unified, user-friendly data model optimized for analytical queries.
- Scope: Focus on the latest dataset; historical data tracking is not required.
- Documentation: Deliver clear documentation of the data model to support both business stakeholders and analytics teams.
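The data quality, integration, and scope requirements above could translate into something like the following sketch. It assumes, purely for illustration, that the CRM extract contains duplicate customer records and that the ERP system formats the customer key differently; the tables, columns, and cleansing rules are hypothetical, and `sqlite3` stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database

# Hypothetical source tables with typical quality issues:
# duplicate CRM records, stray whitespace, mismatched keys, inconsistent country codes.
conn.executescript("""
    CREATE TABLE crm_cust (cst_key TEXT, first_name TEXT, create_date TEXT);
    CREATE TABLE erp_cust (cid TEXT, country TEXT);
    INSERT INTO crm_cust VALUES
        ('AW001', 'Anna',   '2025-01-05'),
        ('AW001', ' Anna ', '2025-03-10'),
        ('AW002', 'Ben',    '2025-02-01');
    INSERT INTO erp_cust VALUES ('AW-001', 'DE'), ('AW-002', 'usa');
""")

customers = conn.execute("""
    WITH latest_crm AS (
        -- Scope: keep only the most recent record per customer (no history tracking).
        SELECT cst_key,
               TRIM(first_name) AS first_name,
               ROW_NUMBER() OVER (
                   PARTITION BY cst_key ORDER BY create_date DESC
               ) AS rn
        FROM crm_cust
    )
    SELECT c.cst_key,
           c.first_name,
           -- Data quality: map inconsistent codes to one standard value.
           CASE UPPER(TRIM(e.country))
                WHEN 'DE'  THEN 'Germany'
                WHEN 'USA' THEN 'United States'
                ELSE 'n/a'
           END AS country
    FROM latest_crm AS c
    -- Integration: align the ERP key format with the CRM key before joining.
    LEFT JOIN erp_cust AS e ON REPLACE(e.cid, '-', '') = c.cst_key
    WHERE c.rn = 1
    ORDER BY c.cst_key
""").fetchall()
print(customers)
```

The `LEFT JOIN` keeps CRM customers even when the ERP system has no matching row; in that case the country falls through to `'n/a'`.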
Develop SQL-based analytics to provide actionable insights into:
- Customer Behavior
- Product Performance
- Sales Trends
These insights enable stakeholders to track key business metrics and make informed, strategic decisions.
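To give a feel for the kind of SQL-based analytics meant here, the sketch below computes a monthly sales trend and a product ranking over a small, made-up `fact_sales` table. The schema and rows are hypothetical, and `sqlite3` is used in place of SQL Server, so the month is extracted with `substr`; SQL Server has its own date functions for that step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database

# Hypothetical, denormalized sales rows for illustration only.
conn.executescript("""
    CREATE TABLE fact_sales (
        order_date   TEXT,
        customer_id  INTEGER,
        product      TEXT,
        sales_amount INTEGER
    );
    INSERT INTO fact_sales VALUES
        ('2025-01-15', 1, 'Road Bike', 1200),
        ('2025-01-20', 2, 'Helmet',      30),
        ('2025-02-03', 1, 'Helmet',      30),
        ('2025-02-10', 3, 'Road Bike', 1200),
        ('2025-02-11', 3, 'Gloves',      20);
""")

# Sales trends: revenue and number of distinct buying customers per month.
monthly_sales = conn.execute("""
    SELECT substr(order_date, 1, 7)    AS order_month,
           SUM(sales_amount)           AS total_sales,
           COUNT(DISTINCT customer_id) AS active_customers
    FROM fact_sales
    GROUP BY order_month
    ORDER BY order_month
""").fetchall()

# Product performance: products ordered by total revenue.
product_totals = conn.execute("""
    SELECT product, SUM(sales_amount) AS total_sales
    FROM fact_sales
    GROUP BY product
    ORDER BY total_sales DESC, product
""").fetchall()

print(monthly_sales)
print(product_totals)
```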
data-warehouse-project/
│
├── datasets/ # Raw datasets used (ERP and CRM data)
│
├── docs/ # Project documentation and architecture details
│ ├── data_architecture.png # Whimsical diagram of the project's architecture
│ ├── data_catalog.md # Catalog of datasets, including field descriptions and metadata
│ ├── data_flow.png # Whimsical diagram of the data flow
│ ├── data_models.png # Whimsical diagram of the data models (star schema)
│ ├── naming-conventions.md # Consistent naming guidelines for tables, columns, and files
│ ├── workflow.pdf # Notion workflow template for tracking project progress
│
├── scripts/ # SQL scripts for ETL and transformations
│ ├── bronze/ # Scripts for extracting and loading raw data
│ ├── silver/ # Scripts for cleaning and transforming data
│ ├── gold/ # Scripts for creating analytical models
│
├── tests/ # Test scripts and quality files
│
├── README.md # Project overview and instructions
├── LICENSE # License information for the repository
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
Hi, I’m Divyang Palshetkar. I have hands-on experience in designing and implementing modern data warehousing solutions, with expertise in ETL development, data modeling, SQL analytics, and creating actionable business insights from complex datasets.
I enjoy transforming raw data into meaningful, decision-driving insights and continuously exploring new technologies and best practices in data engineering and business intelligence.
