About this episode
Data orchestration is evolving rapidly, with dynamic workflows becoming the cornerstone of modern data engineering. In this episode, we are joined by Samyak Jain, Senior Software Engineer - Big Data at 99acres.com. Samyak shares insights from his journey with Apache Airflow, exploring how his team built a self-service platform that enables non-technical teams to launch data pipelines and marketing campaigns seamlessly.

Key Takeaways:

(02:02) Starting a career in data engineering by troubleshooting Airflow pipelines.
(04:27) Building self-service portals with Airflow as the backend engine.
(05:34) Utilizing API endpoints to trigger dynamic DAGs with parameterized templates.
(09:31) Managing a dynamic environment with over 1,400 active DAGs.
(11:14) Implementing fault tolerance by segmenting data workflows into distinct layers.
(14:15) Tracking and optimizing query costs in AWS Athena to save $7K monthly.
(16:22) Automating cost monitoring with real-time alerts for high-cost queries.
(17:15) Streamlining Airflow metadata cleanup to prevent performance bottlenecks.
(21:30) Efficiently handling one-time and recurring marketing campaigns using Airflow.
(24:18) Advocating for Airflow features that improve resource management and ownership tracking.

Resources Mentioned:

Samyak Jain -