We are seeking an experienced Senior Data Engineer with expert-level skills in PySpark and hands‑on experience building ETL pipelines, data lake architectures, and data feed integrations on AWS to join our team. You will work with both structured and unstructured data, ingesting it into AWS from multiple on‑premises and enterprise data sources such as SAP, Intelex, SQL databases, and OSI PI. This role offers the opportunity to contribute to large‑scale data solutions and collaborate with cross‑functional teams in a dynamic environment.


Responsibilities
  • Design, develop, and optimize ETL pipelines using PySpark and AWS Glue Jobs to process large volumes of structured and unstructured data
  • Orchestrate data workflows with Apache Airflow, ensuring reliable scheduling, dependency management, and robust error handling
  • Build and maintain data feeds from on‑premises and enterprise systems into AWS data lake environments
  • Integrate with enterprise data sources including SAP for ERP and operational data, Intelex for environmental, health, safety, and quality data, SQL databases for relational data, and OSI PI for real‑time industrial and process historian data
  • Develop and manage API interactions to extract data from on‑premises services into AWS
  • Handle data extraction, transformation, and loading across various formats and protocols
  • Support the design and maintenance of AWS data lake architectures using Amazon S3, AWS Glue, and Lake Formation
  • Ensure data is cataloged, partitioned, and optimized for analytics and reporting
  • Implement data quality checks, validation, and lineage tracking across all pipelines

Requirements
  • Minimum 3 years of experience in data engineering roles
  • Advanced proficiency in Python and PySpark for data processing and pipeline development
  • Strong background in Extract, Transform, Load (ETL) processes
  • Experience orchestrating workflows with Apache Airflow
  • Proven track record building production‑grade data pipelines on AWS
  • Hands‑on experience with AWS Glue Jobs for ETL processing
  • Familiarity with Amazon S3, data lake patterns, and data cataloging techniques
  • Experience using AWS‑native monitoring and operational tools
  • Skilled in integrating with enterprise systems via APIs, JDBC, or native connectors, including SAP, Intelex, SQL databases, and OSI PI
  • Ability to work with both structured and unstructured data formats
  • Excellent documentation, communication, and collaboration skills
  • English communication skills at B2+ level or higher, both written and spoken

Nice to have
  • Familiarity with energy, oil & gas, or industrial data environments
  • Understanding of Drilling and Completions data flows and terminology

We offer
  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award‑winning culture recognized by Glassdoor, Newsweek and LinkedIn

Senior Data Engineer (Python & AWS)
