About this episode
Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.

In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.

Key Takeaways:
00:00 Introduction.
02:13 Overview of the company’s operations and global presence.
04:00 The tech stack and structure of the data engineering team.
04:24 Running nearly 2,000 DAGs in production using Airflow.
05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.
07:05 Details on the Kubernetes-based Airflow setup using Helm charts.
09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.
14:11 Making every team member Airflow-literate through local installation.
17:56 Using custom libraries and plugins to extend Airflow functionality.

Resources Mentioned:
Sébastien Crocquevieille
https://www.linkedin.com/in/scroc/
Numberly | LinkedIn
https://www.linkedin.com/company/numberly/