We are looking for a Data Engineer to join our AI/LLM Delivery Unit and build the scalable data pipelines and infrastructure that power AI and machine learning solutions.
This role plays a critical part in enabling LLM-based applications, data workflows, and AI model lifecycle management. The ideal candidate has strong experience in data engineering, cloud platforms, and pipeline automation, with exposure to AI/ML environments.
Key Responsibilities
1. Data Pipeline Development
2. AI / LLM Data Infrastructure
o Text corpora and unstructured datasets
o Embeddings and vector databases
o Retrieval-Augmented Generation (RAG) systems
3. Data Processing & Automation
4. Data Quality & Governance
5. Collaboration & Delivery
Qualifications
Education
Experience
Technical Skills
Core Skills
· Strong programming skills in Python and/or Scala
· Expertise in SQL and database design
· Experience building ETL pipelines (Airflow, Dagster, or similar)
Data & Platform Skills
· Experience with:
o Data warehouses (Snowflake, BigQuery, Redshift)
o Distributed data processing (Spark)
o APIs and data integration
· Familiarity with streaming tools (Kafka, Kinesis) is a plus
AI/LLM-Related Skills
· Experience working with unstructured data pipelines (text, NLP datasets)
· Familiarity with:
o Vector databases (Pinecone, FAISS, Weaviate)
o Embedding pipelines
o RAG architectures
Cloud & DevOps
· Hands-on experience with AWS, Azure, or GCP
· Knowledge of:
o Docker / containerization
o CI/CD pipelines
o Infrastructure-as-Code (Terraform is a plus)
---
Core Competencies
· Strong data modeling and system design skills
· Attention to detail and data quality
· Problem-solving and analytical thinking
· Effective communication with both technical and non-technical stakeholders
· Ability to work in fast-paced, delivery-oriented environments
---
Nice-to-Have
· Experience in AI/LLM or Generative AI projects
· Familiarity with annotation pipelines or data labeling workflows
· Exposure to MLOps frameworks
· Experience in high-scale or enterprise data environments
---
What Success Looks Like
· Builds robust, scalable data pipelines supporting AI/LLM projects
· Improves efficiency and reliability of data workflows
· Enables faster model development through high-quality datasets
· Supports successful delivery of client-facing AI solutions