Open-source data engineering demo project using dbt, DuckDB, dlt, Dagster, and Metabase. Two storage modes for the Delta tables are supported: local and Microsoft Fabric OneLake.
Updated Jun 30, 2026 (Python)
SCD2 implementation using PySpark.
A modern banking data pipeline built with Dagster and dbt.
P&C insurance claims lakehouse: Azure ADLS + Databricks (PySpark/Delta) + Snowflake + dbt, real-time FNOL fraud signals via Kafka, Airflow-orchestrated, Terraform-provisioned, OIDC-secured, with data contracts, lineage, and ADRs throughout.
Advanced Healthcare Claims Pipeline using Snowflake, Snowpipe, Streams, Tasks, SCD Type 2, and AWS S3. Automates ingestion, CDC, dimensional modeling, and data quality checks for healthcare patient and claims data.
Fortune-500-grade banking analytics platform: OLTP -> medallion lakehouse -> Kimball star schema -> semantic layer -> 9-tab executive dashboard + 5 ML models (churn, fraud, segmentation, forecasting). Production-ready, governed, fully tested.
Modern data stack reference: dbt + BigQuery + Airflow (Cloud Composer) with medallion layering, SCD2 snapshots, exposures, freshness SLAs, and 45× cost reduction via partition + cluster + incremental tuning.
End-to-end Medicare data engineering pipeline: API ingestion, PostgreSQL 17, dbt, dimensional modeling (Kimball/SCD2), Apache Airflow orchestration, and Evidence.dev dashboard. Built on a QEMU/KVM Rocky Linux VM.
Production-grade parameterized ETL pipeline implementing SCD Type 2 for travel booking data using Databricks, Delta Lake, and ADLS — includes data quality checks, incremental fact table build, Z-Order optimization, and SQL reporting.
Three-layer MySQL ETL pipeline: staging, star schema with SCD Type 2, and analytical marts. Python orchestrator, 18 cross-layer consistency tests.
Batch retail data lakehouse on Databricks: Delta Live Tables (bronze → silver → gold), Unity Catalog, synthetic data generator, and an executive analytics dashboard.
An end-to-end analytics engineering pipeline that transforms raw API telemetry into actionable business metrics. Built with Python, dbt, and DuckDB to model usage, monitor latency, and calculate tiered billing.
This is a data engineering pipeline built on Databricks + Delta Lake + PySpark that ingests travel booking and customer master data, applies SCD Type 2 logic, and delivers analytics-ready tables. It includes data quality enforcement, dimension versioning, fact aggregation, and performance tuning.
Reference Snowflake ingestion patterns: streams and tasks, and dynamic tables with SCD2 and deduplication. Provisioned with Terraform, plus a dbt sandbox.
End-to-end BI platform for a fictional Peruvian piquillo pepper agro-exporter. SQL Server DW with SCD2, ETL with stored procedures, Power BI dashboard with RLS.