Work Schedule: Other

Environmental Conditions: Office

Job Description

Summarized Purpose:

We are seeking a Lead Data Engineer to own the complete lifecycle of enterprise data pipelines, from development to production. The role covers roadmap planning, scalable ETL architecture, AWS data services, secure PHI/PII handling, healthcare data standards, AI-assisted mapping automation, data quality and transformation, catalog standards, and RAG-enabled data solutions.

Education/Experience:

  • Bachelor's degree or equivalent in Computer Science, Information Technology, Data Engineering, or a related field
  • 7+ years of experience in data engineering, ETL development, cloud data platforms, healthcare or regulated data environments, and production data pipeline delivery

Major Job Responsibilities:

  • Design, develop, deploy, and operate scalable ETL and data pipelines using PySpark, Python, advanced SQL, and AWS data services
  • Own the data pipeline lifecycle, from requirements and mapping through development, testing, deployment, monitoring, production support, and release management to future roadmap planning
  • Build ingestion and transformation pipelines for flat files, relational databases, APIs, data warehouses, healthcare data sources, and enterprise data platforms
  • Implement mapping automation, preferably using AI, along with LLM-assisted data cleaning, transformation, data quality checks, and RAG use cases
  • Implement secure handling of PHI/PII data including encryption, access controls, auditability, retention, masking, de-identification, governance, and operational readiness

Knowledge, Skills, and Abilities:

  • Advanced expertise in PySpark, Python, SQL, ETL best practices, data modeling, and large-scale data processing
  • Strong hands-on experience with AWS services including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, Redshift, RDS/PostgreSQL, and related data services
  • Experience with PostgreSQL, SQL Server, Redshift, flat files, complex source-to-target mappings, HL7, claims data, EMR extracts, and clinical trial data
  • Knowledge of data cataloging, metadata management, transformation standards, orchestration, monitoring, data quality, CI/CD, automated testing, and production support practices
  • Ability to lead technical design, mentor engineers, guide delivery decisions, troubleshoot complex issues, and communicate with cross-functional teams

Must Have Skills:

  • Advanced PySpark, Python, SQL, ETL design, and data pipeline engineering expertise
  • AWS data services experience including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, and Redshift, plus PostgreSQL and SQL Server integration
  • Secure PHI/PII handling, flat-file ingestion, source-to-target mapping, transformation, data catalog, governance, and healthcare data standards experience
  • CI/CD, GitHub workflows, automated testing, release management for data pipelines and database changes, and dev-to-prod pipeline ownership

Good to Have Skills:

  • AI-assisted mapping automation and use of LLMs for data cleaning, data quality checks, transformation logic, documentation, and patient de-identification support
  • Experience with RAG patterns, embeddings, vector databases, semantic search, or AI-enabled data discovery solutions
  • Familiarity with infrastructure as code such as Terraform or CloudFormation, plus streaming, Databricks, Snowflake, observability, and DevOps practices

Working Hours:

  • India: 05:30 PM to 02:30 AM IST
  • Philippines: 08:00 PM to 05:00 AM PHT
