Senior Data Architect
We are looking for a technical lead to design, build, and maintain the data pipeline that creates training datasets for our AI research engineers. The role is also responsible for automating the large-scale dataset creation process. The ideal candidate has 6-10 years of industry experience as a Data Engineer or in a related specialty (e.g., Software Engineer, Business Intelligence/Data Warehouse Engineer, Data Scientist) and 1-2 years of experience leading a team.
Responsibilities:
1. Lead data pipeline setup, operation, and maintenance.
2. Assemble large, complex data sets that are analysis- and training-ready for machine learning engineers and researchers.
3. Design and build scalable, reliable data pipelines that collect, transform, load, and curate data from internal systems. Ensure high data quality in the pipelines you build and make them auditable. Support the design and deployment of a distributed data store that will serve as the central source of truth across the group.
4. Develop, customize, and configure automation scripts and tools that help engineers extract and analyze data from our internal data store. Develop reporting and data visualization solutions, with a view to building these out into a dynamic platform.
5. Evaluate new technologies and build prototypes for continuous improvement in data engineering. Create new capabilities and modules in our data pipeline. Develop and maintain expertise in advanced and emerging data management and analytics technologies such as data warehouses, data lakes, and Big Data platforms.
6. Build data connections to the company's internal IT systems.
7. Design, implement, and continuously optimize the group’s data strategy. Provide thought leadership and lead efforts to design data integrations and implement extract, transform, and load (ETL) jobs and processes, detailed data warehouse models, and data mappings. Advise internal team members on best practices and standards.
8. Tune and optimize the performance of new and existing data warehouse implementations.
Requirements:
1. 5+ years of hands-on industry experience with a track record of manipulating, processing, and extracting value from large data sets.
2. Demonstrated ability in building data pipelines, data modeling, and ETL development, with familiarity with design principles. Experience building data products incrementally and integrating and managing data sets from multiple sources. Experience with a data warehouse technology (Redshift, SQL Server, etc.) and relevant data modeling best practices. Experience processing large amounts of data in various formats, in both batch and streaming modes.
3. Excellent SQL skills. Proficiency in a scripting language (Python, Ruby, Perl, etc.) and/or a major programming language (C++, Java, etc.). Knowledge of R is a plus.
4. Experience working with Spark, Hadoop, and/or other distributed computing frameworks is required.
5. Experience working in a multi-layered distributed architecture is essential, as is experience with scalable service architecture and design.
6. Exposure to and knowledge of data security and governance, including best practices for protecting data and processes from unauthorized access.
7. Direct experience with business intelligence reporting tools (Tableau, Power BI, etc.) is a plus.
8. Understanding of data science, machine learning, and AI is a plus.
9. Strong analytical and problem-solving skills (data analysis and requirements documentation).
10. Excellent project management skills and the ability to prioritize issues.
11. Excellent oral and written communication, organizational, and client-facing skills.
Academic Qualification Profile:
B.E. / B. Tech in Computer Science
Certification or Master's degree in Big Data Science