This is us

At Avenga, we believe that human creativity empowers technology that matters. Operating globally, our specialists provide a full spectrum of services, including business and tech advisory, enterprise solutions, CX, UX and UI design, managed services, product development, and software development.

This is the job

In Mexico (CDMX), within the Data & Analytics industry, we are actively seeking a Senior Data Engineer to strengthen our team dedicated to building and optimizing end-to-end data pipelines under a Medallion architecture. Your mission will be to enable data flow for both traditional analytics (BI/ML) and advanced search architectures.

This is a hybrid position; candidates must be based in CDMX.

This is you

- Bachelor's degree in Systems Engineering, Computer Engineering, Software Engineering, or a related field.
- 5+ years of experience as a Data Engineer, with at least 4 years working in Cloudera/Hadoop environments.
- Expert-level proficiency in Spark (PySpark/Scala) for distributed processing.
- Solid experience in relational and dimensional data modeling.
- Hands-on experience with Kerberos for users and services.
- Proficiency in the Cloudera ecosystem: Hive, Impala, HBase, HDFS, Kafka, Oozie, and Hue.
- Advanced SQL skills (complex joins, window functions) and experience with Change Data Capture (CDC).
- Native Spanish speaker.

Nice-to-have skills:

- Experience in administration and troubleshooting at the Cloudera Manager level.
- Knowledge of Search & Indexing tools: Solr, vector databases, and Apache Iceberg.
- Experience with Security & Governance tools: Apache Ranger and Apache Atlas.
- Familiarity with Orchestration & DevOps: Apache Airflow and Docker.

This is your role

- Build robust multi-source ingestion pipelines from RDBMS, web services (APIs), and flat files into HDFS.
- Implement Medallion architecture layers (Bronze, Silver, Gold) for BI and Data Science consumption.
- Design data flows for semantic search, integrating data into vector and indexed databases.
- Manage complex orchestration workflows and container deployments.
- Configure access controls and data lineage, and ensure security and authentication in Kerberos-protected environments.
- Tune Spark and Impala processes to optimize efficiency in distributed processing.