Data Engineer (Databricks & AWS)

Remote (Latin America / Europe) | 9 AM - 5 PM EST | Full-time

At Cloud Geometry, we partner with industry leaders like AWS, Google, and Databricks to deliver cutting-edge cloud-native solutions. We are looking for a Senior Data Engineer to join our flagship project: a modern Data Platform for the life sciences industry, supporting global leaders like Pfizer, Moderna, and Novartis in developing innovative RNA-based solutions using cloud computing and advanced AI.

If you are an experienced data engineer who thrives in high-impact environments, prefers building on modern stacks free of legacy systems, and wants to play a key technical leadership role in building scalable lakehouse architectures, let's talk!

Key Responsibilities

Pipeline Engineering: Design, develop, and optimize high-performance ETL pipelines within Databricks to connect analytics-ready data back to operational services.
Architecture Leadership: Lead technical architecture discussions with engineering, product managers, and data scientists to implement advanced analytics.
Workflow Optimization: Build, fine-tune, and monitor Databricks workflows to ensure system reliability, performance, and data integrity.
Data Quality & Security: Collaborate with ML teams to ensure secure, rigorous, and accurate data ingestion across all processing stages.
Agile Execution: Actively participate in daily Scrum ceremonies within a globally distributed engineering team.

Technical Requirements & Stack

1. Core Data Engineering

Databricks Ecosystem: 2+ years of hands-on experience (Delta tables/Iceberg, Spark jobs, MLflow, Unity Catalog, Model Registry).
Architecture: Expert-level understanding of modern Lakehouse architectural design principles.
Languages: Expert-level Python (for data processing/ETL) and TypeScript / Node.js (for backend services using Hapi, Zod, and Jest).

2. Cloud Infrastructure & DevOps (AWS)

Compute & Storage: ECS (Fargate/EC2), Lambda, S3, and Athena.
Messaging & Orchestration: SQS/SNS and Airflow.
DevOps & CI/CD: GitHub Actions, CodeBuild, Docker, and repository templates via Cruft.

3. Data Stores & MLOps

Databases: PostgreSQL (ACID/Migrations), DynamoDB (High-scale Key-Value), and Redis (Caching/Rate limiting).
Search: OpenSearch / Elasticsearch for full-text search and aggregations.
GenAI: Practical knowledge of LLMs, agents, function calling, and RAG architectures.

Qualifications

Experience: 5+ years in software development with a strong focus on data engineering/analytics teams.
Senior Autonomy: Proven ability to challenge decisions, propose architectural improvements, and deliver complex features end-to-end.
Communication: Exceptional English skills (written and spoken) to articulate complex data ideas to global stakeholders.
Availability: Required online presence from 9 AM to 5 PM EST.

Nice to Have:

Professional Databricks or AWS certifications.
Experience building internal SDKs or developer experience tooling.
Experience working directly alongside Data Scientists and ML Developers.

What We Offer (Our Commitment to You)

Comprehensive compensation and benefits package.
Zero legacy systems – work exclusively with cutting-edge technologies.
Continuous Learning: Extensive training, certifications, hackathons, and Udemy access.
Premium Tooling: Developer Pro access to Claude Code, Codex, and Antigravity.
Top-tier Culture: A collaborative, supportive environment with global experts.

Ready to build the future of Life Sciences?

Click Apply or send us your resume. Let's build something massive together!

#DataEngineering #Databricks #AWS #Python #TypeScript #RemoteJobs #Lakehouse

