Built an enterprise-scale data processing system using Apache Spark on EMR clusters to manage Iceberg tables of 22M+ rows with 300K+ daily operations, implementing custom schema alignment and deduplication logic.
Developed production Spark applications in Scala with comprehensive ScalaTest/Mockito test suites, IAM-secured MySQL integration, and Airflow orchestration, reducing data pipeline runtime from hours to 15-minute cycles.
Architected a data lakehouse migration from deprecated infrastructure to Apache Iceberg format with Prometheus and Grafana monitoring, enabling hourly Parquet exports and eliminating 4-hour delays for downstream systems.