Project Spotlight

SupplyStream: Synthetic Retail Supply Chain Simulator + Analytics

A realistic, Python-driven simulator of an Indian retail network that generates analytics-ready CSVs, paired with a Power BI dashboard for exploring service, cost, inventory, and returns trade-offs end to end.

🧭 Executive summary

  • Problem: Analytics teams often need realistic supply chain data to prototype dashboards and ML models without relying on sensitive production systems.

  • Approach: Simulate a hub-and-spoke network with two-tier transport, generate clean facts/dims as CSVs, and wire them into a Power BI model for rapid analysis.

  • Outcome: A plug-and-play dataset and reference dashboard that surface fulfillment, on-time delivery, stockout risk, transport cost, and returns behavior.

Video demo of the Power BI dashboard

❗ Problem statement

Building supply chain dashboards and training models often stalls due to scarce, sensitive, or fragmented operational data, slowing iteration and increasing risk.

Early-career data professionals need a safe, configurable data source that mirrors real network behavior so they can prototype quickly and evaluate trade-offs before touching production.

🛠️ Decision-making approach

  • Model a hub‑and‑spoke network with ~100 stores, three regional hubs, and specialized suppliers to reflect realistic flows and constraints.
  • Simulate demand with seasonality and lead times, replenish via reorder point (ROP) logic, and generate shipments, inventory snapshots, and returns with plausible variability.
  • Export tidy facts/dimensions to CSV for immediate ingestion in BI tools using raw GitHub URLs, minimizing ETL friction.
  • Design the Power BI model to answer core questions first (service, cost, stock risk), then enable deeper drilldowns by hub, store, product, and time.
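
The demand and replenishment steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual simulator: the sinusoidal seasonality, the noise level, the function names, and every numeric parameter (base demand, safety stock, order quantity) are assumptions chosen for the example.

```python
import math
import random


def seasonal_demand(day, rng, base=20.0, amplitude=0.3, period=365):
    """Daily demand: base level scaled by a sinusoidal season factor, plus noise."""
    season = 1.0 + amplitude * math.sin(2 * math.pi * day / period)
    noisy = rng.gauss(base * season, base * 0.1)
    return max(0, round(noisy))


def reorder_point(avg_daily_demand, lead_time_days, safety_stock):
    """Textbook ROP: expected demand over the lead time plus a safety buffer."""
    return avg_daily_demand * lead_time_days + safety_stock


def simulate_store(days=60, on_hand=200, lead_time_days=5, order_qty=150, seed=42):
    """Run one store/SKU forward and return (rop, daily inventory snapshots)."""
    rng = random.Random(seed)
    rop = reorder_point(avg_daily_demand=20, lead_time_days=lead_time_days, safety_stock=40)
    pipeline = {}  # arrival day -> quantity already on order
    snapshots = []
    for day in range(days):
        on_hand += pipeline.pop(day, 0)        # receive inbound due today
        demand = seasonal_demand(day, rng)
        sold = min(demand, on_hand)            # unmet demand is lost, not backordered
        on_hand -= sold
        position = on_hand + sum(pipeline.values())  # on-hand plus on-order
        if position <= rop:                    # trigger replenishment at or below ROP
            arrival = day + lead_time_days
            pipeline[arrival] = pipeline.get(arrival, 0) + order_qty
        snapshots.append({"day": day, "demand": demand, "sold": sold, "on_hand": on_hand})
    return rop, snapshots


rop, snapshots = simulate_store()
```

Checking the inventory position (on-hand plus on-order) rather than on-hand alone avoids placing duplicate orders while a replenishment is still in transit.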

🏗️ Architecture in brief

Python simulator → SupplyChain_Data/*.csv → GitHub raw links → Power BI data model with DAX measures and relationships for analysis.
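
As a rough sketch of the first two hops in this pipeline, the snippet below serializes a small dimension table to CSV text and builds the raw GitHub URL a BI tool could read it from. The column names, the file name `dim_hubs.csv`, and the `example-user`/`supplystream` owner and repo are placeholders, not the project's real values; only the hub codes and the `SupplyChain_Data` folder come from the description.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def raw_github_url(owner, repo, branch, path):
    """Build the raw-content URL for a file in a public GitHub repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


# Hypothetical hub dimension; column names are assumptions for illustration.
dim_hubs = [
    {"hub_id": "DEL", "city": "Delhi"},
    {"hub_id": "BOM", "city": "Mumbai"},
    {"hub_id": "BLR", "city": "Bengaluru"},
]

csv_text = rows_to_csv(dim_hubs, ["hub_id", "city"])
url = raw_github_url("example-user", "supplystream", "main", "SupplyChain_Data/dim_hubs.csv")
```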

🚚 Logistics at a glance

  • Three regional hubs (DEL, BOM, BLR) route apparel, footwear, and accessories to stores.
  • Two‑tier transport: inter‑hub cross‑dock legs and final‑mile courier deliveries with realistic costs and distances.
  • Returns are probabilistic by line with reason codes to support reverse logistics analysis.
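
One way to picture line-level probabilistic returns is the sketch below. The reason codes, their weights, the 8% return probability, and the field names are all assumptions made for illustration; the project's own values may differ.

```python
import random

# Hypothetical reason codes and mix; not taken from the project.
REASONS = ["size_issue", "damaged", "changed_mind", "wrong_item"]
REASON_WEIGHTS = [0.4, 0.2, 0.3, 0.1]


def sample_returns(order_lines, return_prob=0.08, seed=7):
    """Decide independently for each order line whether it is returned."""
    rng = random.Random(seed)
    returns = []
    for line in order_lines:
        if rng.random() < return_prob:
            reason = rng.choices(REASONS, weights=REASON_WEIGHTS, k=1)[0]
            returns.append({
                "order_line_id": line["order_line_id"],
                "qty_returned": rng.randint(1, line["qty"]),  # partial returns allowed
                "reason_code": reason,
            })
    return returns


order_lines = [{"order_line_id": i, "qty": 3} for i in range(1000)]
returns = sample_returns(order_lines)
```

Seeding the generator keeps the output reproducible, which helps when dashboards built on the data need stable numbers between runs.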

📂 Outputs

  • Dimensions: hubs, stores, products, suppliers for clean joins and filters.
  • Facts: orders, shipments, inventory snapshots, inbound, and returns to trace the end‑to‑end journey.
  • Link table: shipment↔order mapping to attribute costs and service outcomes correctly.
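
To illustrate why the link table matters, here is a minimal sketch that spreads each shipment's transport cost across the orders it carried, in proportion to units. The proportional-by-units rule, the field names, and the sample figures are assumptions; the project may attribute cost differently.

```python
from collections import defaultdict

# Hypothetical sample data: shipment_id -> transport cost.
shipments = {"S1": 1200.0, "S2": 300.0}

# Shipment-to-order link rows (many-to-many).
link = [
    {"shipment_id": "S1", "order_id": "O1", "units": 30},
    {"shipment_id": "S1", "order_id": "O2", "units": 10},
    {"shipment_id": "S2", "order_id": "O2", "units": 5},
]


def allocate_cost(shipments, link):
    """Split each shipment's cost across its orders by share of units."""
    units_per_shipment = defaultdict(int)
    for row in link:
        units_per_shipment[row["shipment_id"]] += row["units"]

    cost_per_order = defaultdict(float)
    for row in link:
        share = row["units"] / units_per_shipment[row["shipment_id"]]
        cost_per_order[row["order_id"]] += shipments[row["shipment_id"]] * share
    return dict(cost_per_order)


order_costs = allocate_cost(shipments, link)
```

Because every shipment's cost is divided by that shipment's own unit total, the allocated amounts sum back to the total transport cost, so nothing is double counted when an order spans several shipments.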

📊 Analytics layer

  • Service: fulfillment rate and on‑time delivery computed from shipment legs and required dates.
  • Inventory: stockout risk flagged by comparing on‑hand against the ROP; average on‑hand tracked at SKU level.
  • Cost: total transport and cost per unit tied back through link tables for proper attribution.
  • Returns: rates and reasons to understand reverse logistics patterns by category and channel.
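
The service and inventory measures above live in DAX in the dashboard; the Python sketch below only mirrors the arithmetic so the definitions are easy to check against the CSVs. Field names and sample rows are hypothetical, and treating "on-hand at or below ROP" as the risk condition is an assumption.

```python
from datetime import date


def fulfillment_rate(order_lines):
    """Units shipped divided by units ordered."""
    ordered = sum(line["qty_ordered"] for line in order_lines)
    shipped = sum(line["qty_shipped"] for line in order_lines)
    return shipped / ordered if ordered else 0.0


def on_time_rate(deliveries):
    """Share of deliveries that arrived on or before the required date."""
    if not deliveries:
        return 0.0
    on_time = sum(1 for d in deliveries if d["delivered_date"] <= d["required_date"])
    return on_time / len(deliveries)


def stockout_risk(snapshots):
    """SKUs whose on-hand quantity is at or below their reorder point."""
    return [s["sku"] for s in snapshots if s["on_hand"] <= s["rop"]]


order_lines = [
    {"qty_ordered": 10, "qty_shipped": 10},
    {"qty_ordered": 8, "qty_shipped": 6},
    {"qty_ordered": 2, "qty_shipped": 2},
]
deliveries = [
    {"required_date": date(2024, 1, 10), "delivered_date": date(2024, 1, 9)},
    {"required_date": date(2024, 1, 12), "delivered_date": date(2024, 1, 14)},
    {"required_date": date(2024, 1, 15), "delivered_date": date(2024, 1, 15)},
]
snapshots = [
    {"sku": "SKU-1", "on_hand": 120, "rop": 80},
    {"sku": "SKU-2", "on_hand": 40, "rop": 60},
]
```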

💡 What this enables

  • Prototype dashboards safely and iterate UX fast, aligning to at‑a‑glance KPIs before live data is available.
  • Run “what‑if” analysis on lead times, reliability, and demand patterns to see service‑cost trade‑offs.
  • Train and benchmark ML features on consistent, labeled data that mirrors real supply behavior.

✅ Validation

  • Validity checks on order volumes, lead times, and cost curves to ensure realistic ranges and distributions.
  • Model integrity checks: relationship mappings and referential joins verified for facts, dims, and link tables.
  • Dashboard QA: KPIs align with measure definitions; layouts prioritize clarity and scanning patterns.
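
A referential check of the kind listed under model integrity could look like the sketch below: it reports foreign keys in a fact table that have no match in the corresponding dimension. The helper name, table contents, and key names are hypothetical.

```python
def orphan_keys(fact_rows, dim_rows, key):
    """Return sorted fact-side key values that are missing from the dimension."""
    dim_keys = {row[key] for row in dim_rows}
    fact_keys = {row[key] for row in fact_rows}
    return sorted(fact_keys - dim_keys)


# Hypothetical sample tables for illustration.
dim_stores = [{"store_id": "ST001"}, {"store_id": "ST002"}]
fact_orders = [
    {"order_id": "O1", "store_id": "ST001"},
    {"order_id": "O2", "store_id": "ST002"},
    {"order_id": "O3", "store_id": "ST999"},  # deliberately broken reference
]

missing = orphan_keys(fact_orders, dim_stores, "store_id")
```

An empty result for every fact-to-dimension pair is a reasonable precondition before loading the CSVs into the Power BI model, since orphaned keys would otherwise surface as blank rows in visuals.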