๐ Sales Health Monitor โ Automated BI Pipeline
A comprehensive end-to-end automated BI pipeline processing 800K+ retail transactions across customers, products, and regions. Built with Python for data generation and EDA, MySQL for scalable database architecture, and Power BI for interactive multi-page dashboards โ combining data engineering, business intelligence, and advanced analytics.
๐ ๏ธ Tools & Technologies Used
- Python (Pandas, NumPy, Matplotlib, Seaborn) โ Generated synthetic data, conducted comprehensive EDA, and built automation workflows.
- MySQL โ Designed star schema database with 8 tables and 11 analytical views for scalable data modeling.
- Power BI & DAX โ Created 4 multi-page interactive dashboards with custom measures for time intelligence and YoY tracking.
- Generative AI (ChatGPT) โ Used for workflow optimization, debugging SQL queries, DAX troubleshooting, and documentation enhancement.
๐ Links
๐ Dataset Description
This project uses synthetic retail data designed to simulate real-world business scenarios:
๐น 777,288 sales transactions spanning 3 years (2022-2024)
๐น 50,000 customers with segments (Budget, Standard, Premium)
๐น 1,000 products across 5 categories (Electronics, Clothing, Home & Garden, Sports, Toys)
๐น 5 geographic regions with 50+ cities
All datasets were generated using Python to replicate realistic sales patterns including seasonality, customer behavior, and regional variations.
๐ฏ Problem Statements / Goals
This project goes beyond basic dashboards โ itโs about building an intelligent monitoring system that answers:
- ๐ How do we track executive KPIs (revenue, customers, transactions) with YoY comparisons?
- ๐จ Can we detect anomalies in customer behavior and product performance automatically?
- ๐ฅ Who are our high-value customer segments and whatโs their lifetime value?
- ๐บ๏ธ Which regions and products drive the most growth?
- โ๏ธ How do we build automated workflows that refresh data and dashboards daily?
- ๐ฏ Can we reduce false alerts by 85% through adaptive thresholds?
๐ Project Workflow
Phase 1-4: Python (Data Generation & EDA)
Data Generation:
- Created 800K+ realistic transactions with temporal patterns, seasonality, and customer segmentation.
- Simulated discount strategies, product lifecycles, and regional variations.
Data Cleaning & Validation:
- Achieved 100% quality score through automated validation checks.
- Handled missing values, outliers, and data type standardization.
Exploratory Data Analysis:
- Temporal Analysis: Identified Nov-Dec-Jan peak months (140-160% above average).
- Geographic Intelligence: Mapped regional performance and growth trends.
- Customer Segmentation: RFM analysis uncovering 20% of customers driving 32% of revenue.
- Product Portfolio: Category-level performance tracking and anomaly detection.
Phase 5: MySQL Database Engineering
Star Schema Architecture:
- Fact Table:
sales_transactions(777K+ records) - Dimension Tables:
dim_customers,dim_products,dim_dates - 11 Analytical Views: Pre-aggregated KPIs for dashboard performance
Key Innovation โ Adaptive Anomaly Detection:
- Built percentile-based threshold system that adjusts dynamically.
- Reduced false alerts from 10,049 to 1,524 (85% improvement).
- Tracks customer behavioral anomalies, product performance drops, and revenue spikes.
Phase 6: Power BI Dashboards
Built 4 specialized dashboards with 34 interactive visuals:
๐ Dashboard 1: Executive Overview
- ๐ KPI cards with YoY growth indicators (โฒ/โผ)
- ๐ Revenue trends by month, quarter, year
- ๐ฏ Top products and categories by performance
- ๐ Dynamic filters for time-based analysis

๐ Dashboard 2: Anomaly & Risk Monitor
- ๐จ Customer behavioral anomalies
- ๐ Product performance alerts
- โ ๏ธ Revenue spike/drop detection
- ๐ฏ ML-ready framework for future integration

๐ Dashboard 3: Customer Intelligence
- ๐ฅ Customer lifetime value (CLV) analysis
- ๐ RFM segmentation (Recency, Frequency, Monetary)
- ๐ High-value customer identification
- ๐ Cohort analysis and retention tracking

๐ Dashboard 4: Geographic & Product Performance
- ๐บ๏ธ Regional performance heatmap
- ๐ฆ Product portfolio matrix
- ๐ City-level revenue breakdown
- ๐ Category growth indicators

Advanced DAX Techniques:
- Time intelligence functions (SAMEPERIODLASTYEAR, TOTALYTD)
- HASONEVALUE filter awareness for โAllโ selections
- Conditional formatting with hex color codes
- Dynamic YoY growth calculations with unicode arrows
๐ก Key Insights & Achievements
- ๐ 800K+ transactions processed through automated pipeline
- ๐๏ธ MySQL star schema with 8 tables optimized for BI queries
- ๐ 85% reduction in false alerts through adaptive anomaly detection
- ๐ Seasonality patterns identified: Nov-Dec-Jan peak at 140-160% above average
- ๐ฅ Customer segmentation: 20% high-value customers drive 32% of revenue
- ๐ Regional insights: 5 regions analyzed with city-level granularity
- โ๏ธ Automation-ready: Modular design supports scheduled workflows
๐ Things I Learned
- How to design and implement a star schema database for BI workloads.
- Building adaptive anomaly detection using statistical thresholds instead of ML.
- Advanced DAX patterns including time intelligence and filter context handling.
- Structuring projects for automation with modular, reusable code.
- Database optimization through views and indexed relationships.
- Creating multi-page dashboards that tell a complete business story.
- Using AI tools to accelerate debugging, documentation, and design decisions.
๐ฆ How to Explore This Project
- ๐ฅ Clone the GitHub repository.
- ๐ Run the Jupyter notebooks to see data generation and EDA process.
- ๐๏ธ Import SQL scripts into MySQL to build the database.
- ๐ Open the Power BI
.pbixfile and connect to your MySQL instance. - ๐งญ Use dashboard filters and slicers to explore interactive insights.
๐ฎ Future Enhancements (Phases 7-9)
- ๐ค ML Anomaly Detection: Customer churn prediction and revenue forecasting models.
- ๐ฌ AI Text Insights: Auto-generated executive summaries using OpenAI API.
- โ๏ธ Full Automation: Daily pipeline running at 6 AM with email alerts.