Khushi Baby, Senior Data Engineer
About the portfolio organization
Khushi Baby is an 11-year-old digital health nonprofit with deep expertise in designing and deploying solutions for Ministries of Health across India. As a technical support partner to state governments in Rajasthan, Karnataka, and Maharashtra, Khushi Baby drives data-driven health systems strengthening at scale.
With a 120-member interdisciplinary team spanning public health, epidemiology, data science, software engineering, product design, and health policy, Khushi Baby reimagines how community health data is collected, integrated, and translated into meaningful action. The organization works to ensure that frontline data does not remain static but instead informs timely decision-making and improved service delivery at the last mile.
Khushi Baby’s work has secured over $20 million in government co-investment and is projected to reach 100 million people through 100,000 community health workers. By 2030, the organization aims to influence 15 percent of India’s public health decision-making processes through strengthened data-to-action systems.
At its core, Khushi Baby seeks to close the public health feedback loop at the last mile by connecting quality data, actionable insights, and effective ground-level action. The organization enables transformation across policy, program design, frontline practice, and partnerships, ensuring that digital systems translate into measurable improvements in health outcomes.
About the Fellowship role
The Senior Data Engineer Fellowship is designed for a technically strong, systems-oriented leader who will architect, build, and optimize scalable data systems that power public health analytics. This role sits at the core of Khushi Baby’s digital infrastructure and is responsible for designing robust data architectures, defining data workflows, and ensuring high-quality, secure, and compliant data pipelines that support experimentation, product development, and programmatic impact.
The Fellow will work closely with product, engineering, data science, implementation, and research teams to translate frontline health data into reliable, real-time insights that inform public health decision-making. In addition to technical leadership, the role includes mentoring data engineers and strengthening organizational data systems to support long-term scale.
Location: In Person - Jaipur, India
Employment: Full-time, one-year Fellowship
Starting Date: July 2026
Key responsibilities
Data Architecture & Infrastructure
- Design and implement scalable data architectures and modeling strategies
- Build and optimize ETL/ELT pipelines for structured and unstructured public health data
- Develop and manage real-time and batch processing systems (e.g., Kafka, Flink, RisingWave)
- Anticipate future infrastructure needs and ensure scalability and cost-effectiveness across cloud platforms (AWS, GCP, Azure)
Data Quality, Security & Compliance
- Lead the implementation of robust data quality assurance protocols
- Embed data integrity and standardization from the point of collection
- Ensure compliance with public health interoperability standards (FHIR, HL7, ICD-10)
- Implement secure access control, encryption, and regulatory-compliant data frameworks
Collaboration & Impact
- Partner with product, design, and field teams to define indicators and refine tools
- Translate technical architecture into actionable insights for cross-functional teams
- Support A/B testing, experimentation, and analytics initiatives
- Contribute to defining KPIs and success metrics using data-driven approaches
Team Leadership & Mentorship
- Mentor and manage data engineers and analysts
- Build technical depth within the data team
- Foster a culture of experimentation, documentation, and continuous learning
Technical Optimization & Documentation
- Conduct performance tuning of databases, queries, and pipelines
- Optimize cloud infrastructure for performance and cost efficiency
- Maintain thorough documentation of data architecture, workflows, lineage, and metadata
Requirements
Experience and education
- Bachelor’s degree in Computer Science, Data Engineering, or a related field
- 5+ years of experience in data engineering
- 2+ years in a leadership or team management role
Hard skills
- Advanced proficiency in SQL and Python
- Strong experience with data modeling and pipeline orchestration (e.g., Airflow, Mage AI)
- Experience with big data technologies such as Apache Iceberg or Delta Lake
- Knowledge of streaming and CDC tools (Kafka, Debezium, Redpanda)
- Experience with cloud platforms (AWS, GCP, Azure)
- Deep understanding of partitioning, indexing, caching, and compression techniques
- Experience building scalable real-time and batch processing systems
Soft skills
- Strong analytical and problem-solving abilities
- Logical reasoning and structured thinking
- Clear communication skills across technical and non-technical stakeholders
- Leadership and mentorship capability
- Systems thinking mindset with attention to detail
Must haves
- Minimum 5 years of hands-on data engineering experience
- Proven leadership experience managing or mentoring technical teams
- Strong proficiency in SQL and Python
- Experience designing and deploying scalable cloud-based data systems
- Ability to work in-person in Jaipur
- Legal authorization to work in India
- Strong written and verbal English communication skills
