A comprehensive end-to-end automated BI pipeline processing 800K+ retail transactions across customers, products, and regions. Built with Python for data generation and EDA, MySQL for scalable database architecture, and Power BI for interactive multi-page dashboards โ€” combining data engineering, business intelligence, and advanced analytics.

๐Ÿ› ๏ธ Tools & Technologies Used

  • Python (Pandas, NumPy, Matplotlib, Seaborn) โ€“ Generated synthetic data, conducted comprehensive EDA, and built automation workflows.
  • MySQL โ€“ Designed star schema database with 8 tables and 11 analytical views for scalable data modeling.
  • Power BI & DAX โ€“ Created 4 multi-page interactive dashboards with custom measures for time intelligence and YoY tracking.
  • Generative AI (ChatGPT) โ€“ Used for workflow optimization, debugging SQL queries, DAX troubleshooting, and documentation enhancement.

๐Ÿ“ Dataset Description

This project uses synthetic retail data designed to simulate real-world business scenarios:

๐Ÿ”น 777,288 sales transactions spanning 3 years (2022-2024)
๐Ÿ”น 50,000 customers with segments (Budget, Standard, Premium)
๐Ÿ”น 1,000 products across 5 categories (Electronics, Clothing, Home & Garden, Sports, Toys)
๐Ÿ”น 5 geographic regions with 50+ cities

All datasets were generated using Python to replicate realistic sales patterns including seasonality, customer behavior, and regional variations.

๐ŸŽฏ Problem Statements / Goals

This project goes beyond basic dashboards โ€” itโ€™s about building an intelligent monitoring system that answers:

  • ๐Ÿ“Š How do we track executive KPIs (revenue, customers, transactions) with YoY comparisons?
  • ๐Ÿšจ Can we detect anomalies in customer behavior and product performance automatically?
  • ๐Ÿ‘ฅ Who are our high-value customer segments and whatโ€™s their lifetime value?
  • ๐Ÿ—บ๏ธ Which regions and products drive the most growth?
  • โš™๏ธ How do we build automated workflows that refresh data and dashboards daily?
  • ๐ŸŽฏ Can we reduce false alerts by 85% through adaptive thresholds?

๐Ÿ”„ Project Workflow

Phase 1-4: Python (Data Generation & EDA)

Data Generation:

  • Created 800K+ realistic transactions with temporal patterns, seasonality, and customer segmentation.
  • Simulated discount strategies, product lifecycles, and regional variations.

Data Cleaning & Validation:

  • Achieved 100% quality score through automated validation checks.
  • Handled missing values, outliers, and data type standardization.

Exploratory Data Analysis:

  • Temporal Analysis: Identified Nov-Dec-Jan peak months (140-160% above average).
  • Geographic Intelligence: Mapped regional performance and growth trends.
  • Customer Segmentation: RFM analysis uncovering 20% of customers driving 32% of revenue.
  • Product Portfolio: Category-level performance tracking and anomaly detection.

Phase 5: MySQL Database Engineering

Star Schema Architecture:

  • Fact Table: sales_transactions (777K+ records)
  • Dimension Tables: dim_customers, dim_products, dim_dates
  • 11 Analytical Views: Pre-aggregated KPIs for dashboard performance

Key Innovation โ€“ Adaptive Anomaly Detection:

  • Built percentile-based threshold system that adjusts dynamically.
  • Reduced false alerts from 10,049 to 1,524 (85% improvement).
  • Tracks customer behavioral anomalies, product performance drops, and revenue spikes.

Phase 6: Power BI Dashboards

Built 4 specialized dashboards with 34 interactive visuals:

๐Ÿ“„ Dashboard 1: Executive Overview

  • ๐Ÿ“Œ KPI cards with YoY growth indicators (โ–ฒ/โ–ผ)
  • ๐Ÿ“ˆ Revenue trends by month, quarter, year
  • ๐ŸŽฏ Top products and categories by performance
  • ๐Ÿ”„ Dynamic filters for time-based analysis

Executive Dashboard

๐Ÿ“„ Dashboard 2: Anomaly & Risk Monitor

  • ๐Ÿšจ Customer behavioral anomalies
  • ๐Ÿ“‰ Product performance alerts
  • โš ๏ธ Revenue spike/drop detection
  • ๐ŸŽฏ ML-ready framework for future integration

Anomaly Monitor

๐Ÿ“„ Dashboard 3: Customer Intelligence

  • ๐Ÿ‘ฅ Customer lifetime value (CLV) analysis
  • ๐Ÿ“Š RFM segmentation (Recency, Frequency, Monetary)
  • ๐Ÿ† High-value customer identification
  • ๐Ÿ“ˆ Cohort analysis and retention tracking

Customer Intelligence

๐Ÿ“„ Dashboard 4: Geographic & Product Performance

  • ๐Ÿ—บ๏ธ Regional performance heatmap
  • ๐Ÿ“ฆ Product portfolio matrix
  • ๐ŸŒ City-level revenue breakdown
  • ๐Ÿ“Š Category growth indicators

Geographic Performance

Advanced DAX Techniques:

  • Time intelligence functions (SAMEPERIODLASTYEAR, TOTALYTD)
  • HASONEVALUE filter awareness for โ€œAllโ€ selections
  • Conditional formatting with hex color codes
  • Dynamic YoY growth calculations with unicode arrows

๐Ÿ’ก Key Insights & Achievements

  • ๐Ÿ“Š 800K+ transactions processed through automated pipeline
  • ๐Ÿ—„๏ธ MySQL star schema with 8 tables optimized for BI queries
  • ๐Ÿ” 85% reduction in false alerts through adaptive anomaly detection
  • ๐Ÿ“ˆ Seasonality patterns identified: Nov-Dec-Jan peak at 140-160% above average
  • ๐Ÿ‘ฅ Customer segmentation: 20% high-value customers drive 32% of revenue
  • ๐ŸŒ Regional insights: 5 regions analyzed with city-level granularity
  • โš™๏ธ Automation-ready: Modular design supports scheduled workflows

๐Ÿš€ Things I Learned

  • How to design and implement a star schema database for BI workloads.
  • Building adaptive anomaly detection using statistical thresholds instead of ML.
  • Advanced DAX patterns including time intelligence and filter context handling.
  • Structuring projects for automation with modular, reusable code.
  • Database optimization through views and indexed relationships.
  • Creating multi-page dashboards that tell a complete business story.
  • Using AI tools to accelerate debugging, documentation, and design decisions.

๐Ÿ“ฆ How to Explore This Project

  1. ๐Ÿ“ฅ Clone the GitHub repository.
  2. ๐Ÿ Run the Jupyter notebooks to see data generation and EDA process.
  3. ๐Ÿ—„๏ธ Import SQL scripts into MySQL to build the database.
  4. ๐Ÿ“Š Open the Power BI .pbix file and connect to your MySQL instance.
  5. ๐Ÿงญ Use dashboard filters and slicers to explore interactive insights.

๐Ÿ”ฎ Future Enhancements (Phases 7-9)

  • ๐Ÿค– ML Anomaly Detection: Customer churn prediction and revenue forecasting models.
  • ๐Ÿ’ฌ AI Text Insights: Auto-generated executive summaries using OpenAI API.
  • โš™๏ธ Full Automation: Daily pipeline running at 6 AM with email alerts.

๐Ÿ™ THANK YOU