Responsibilities
Lead and Architect end-to-end data solutions, providing technical direction and architectural oversight for complex data pipelines and platforms, ensuring robust performance, scalability, data quality, security, and compliance.
Drive Strategic Initiatives within small, co-located squads (4-7 person teams), fostering an environment of high communication, minimal coordination overhead, and collective ownership to deliver impactful data products.
Act as a Player/Coach, leading by example in hands‑on development while actively mentoring and elevating the technical capabilities of junior and mid‑level engineers, cultivating a culture of technical excellence and innovation.
Design, Develop, and Optimize highly efficient and resilient data ingestion, processing, and transformation pipelines using advanced Python and PySpark techniques for petabyte‑scale datasets.
Architect and Implement sophisticated data storage solutions leveraging a diverse set of big data technologies including Hive, distributed file systems (HDFS, S3), and enterprise‑grade NoSQL databases (Cassandra, MongoDB).
Champion Data Modeling and Governance, designing scalable data models and schemas that support advanced analytics, machine learning, and critical reporting needs, ensuring data integrity, accessibility, and discoverability.
Strategically Engage with data consumers, data scientists, and business stakeholders to deeply understand their requirements, translating them into robust data solutions and providing expert guidance on data utilization and interpretation.
Lead the Implementation of real‑time data streaming and complex event‑driven architectures using technologies like Apache Kafka, ensuring low‑latency data availability for critical business functions.
Enforce and Evolve Best Practices in data engineering and software development, spearheading rigorous code reviews, comprehensive automated testing strategies, and robust CI/CD pipelines within a DevOps culture.
Exhibit High Autonomy and Agency, taking full ownership of technical challenges, making well‑reasoned architectural decisions, and proactively identifying and implementing continuous improvements across the data landscape.
Innovate with AI‑Powered Development, actively leveraging, integrating, and contributing to AI coding tools (internal Citi AI tools, Copilot, Claude Code, Codex, Antigravity) to set new benchmarks for productivity, code quality, and development velocity, and inspiring others to do the same.
Shape the Future of Our Data Stack, actively participating in technical discussions, evaluating new technologies, and making strategic recommendations that align with business objectives and architectural vision.
Expertly Troubleshoot and Resolve the most challenging technical issues within complex, distributed big data environments, applying advanced analytical and problem‑solving methodologies.
Required Skills & Experience
Experience:
6+ years of progressive, hands‑on experience as a Senior/Lead Data Engineer, with a proven track record of architecting and delivering complex, large‑scale data solutions, and operating effectively as a player/coach.
Programming Languages:
Expert‑level proficiency in Python, with deep expertise in developing highly optimized, scalable, and production‑grade PySpark applications for mission‑critical data processing.
Big Data Frameworks/Technologies:
Deep architectural understanding and extensive hands‑on experience with the entire Apache Spark ecosystem (Spark Core, Spark SQL, Spark Streaming, Spark MLlib). Advanced proficiency with Hive for enterprise data warehousing, including optimization techniques for large and complex datasets. Solid knowledge of distributed computing fundamentals, HDFS, and other components of the Hadoop ecosystem.
Data Storage & Management:
Master‑level proficiency in SQL, complex query optimization, and advanced data warehousing concepts (dimensional modeling, data vault, data lakes). Extensive experience with various data storage formats (Parquet, ORC, Avro) and leading data lake solutions (Delta Lake, Iceberg). Proven experience with enterprise‑grade NoSQL databases (Cassandra, MongoDB, HBase) and understanding of their architectural trade‑offs.
Messaging & Event Streaming:
Expert‑level experience with Apache Kafka, including design and implementation of high‑throughput, low‑latency real‑time data pipelines and event‑driven microservices architectures.
Cloud Platforms:
Extensive experience with big data services on major cloud platforms, including cloud‑native architectural patterns: AWS (EMR, Glue, Redshift, Kinesis), Azure (Databricks, Data Factory, Synapse, Event Hubs), and GCP (Dataflow, Dataproc, BigQuery, Pub/Sub).
AI‑Powered Development & Productivity:
Mandatory: Demonstrated mastery and innovative application of AI coding tools (Claude Code, Codex, Antigravity) to significantly enhance the development lifecycle. A proactive, "AI‑first thinker" mindset, with a proven ability to evaluate, integrate, and evangelize new AI tools and methodologies within the team to drive continuous improvement and innovation.
Domain Understanding:
Expert ability to articulate the intricacies of the functional domain, proactively identifying business challenges and opportunities, and translating them into impactful, data‑driven solutions.
Leadership & Mentoring:
Proven ability to lead technical discussions, mentor team members, and foster a collaborative and high‑performing engineering culture.
Other Essential Skills:
Advanced understanding of software engineering principles, design patterns, data structures, algorithms, and performance engineering for distributed systems. Extensive experience with RESTful API design, development, and integration for data services. Expertise in containerization technologies (Docker, Kubernetes) and orchestration for deploying and managing scalable data applications. Proficiency with version control systems, especially Git, including advanced branching, merging, and code review strategies. Exceptional problem‑solving, analytical, and debugging skills applied to highly complex, distributed big data ecosystems. Superior communication, presentation, and interpersonal skills, with the ability to articulate complex technical concepts to diverse audiences and influence strategic decisions. Demonstrated high autonomy and agency in driving strategic initiatives and delivering impactful, innovative data solutions.
Education
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field is required. A Master's degree is strongly preferred. Equivalent advanced practical experience with a demonstrable track record of architecting and leading major data initiatives will also be considered.