We are seeking a Staff AI/ML Solution Lead to drive the architecture, design, and delivery of high-performance, enterprise-grade applications. This hybrid role combines deep hands-on software engineering and DevOps with high-level architectural decision-making. You will work across frontend, backend, cloud infrastructure, database selection, and integration layers, ensuring our systems are secure, scalable, and maintainable while enabling long-term technical growth and the delivery of robust, innovative AI systems.

Key Responsibilities:

  • Architecture Leadership – Define system architecture, integration patterns, and technology standards for large-scale web and enterprise applications.

  • Full Stack Development – Build and maintain robust, responsive applications using modern frontend frameworks (React, Vue, Streamlit, or Angular) and backend services in Python, Go, or Rust.

  • Cloud & Infrastructure – Architect cloud-native solutions leveraging AWS with a focus on scalability, security, and performance. Implement containerized services with Docker and orchestrate deployments using Kubernetes (K8s).

  • API & Service Design – Develop RESTful and GraphQL APIs for internal and external integrations.

  • DevOps & CI/CD – Establish best practices for deployment pipelines, automated testing, and infrastructure-as-code (Terraform, Pulumi).

  • Performance Optimization – Drive system performance tuning, load balancing, and efficient code design.

  • Technical Mentorship – Coach and mentor engineers, conduct design/code reviews, and uphold engineering best practices.

  • Cross-Functional Collaboration – Partner with product, design, and business teams to deliver impactful solutions aligned with company objectives.

  • Databases – Select and deploy databases (strong DevOps experience required).

  • ML – Design both ML and LLM stacks (model hubs, vector DBs, embedding pipelines). The role requires the knowledge to deploy end-to-end architectures for ML applications, both traditional and RAG, and to design MLOps architectures on Databricks, AWS, and Google Cloud.
    MLOps: Strong understanding of agentic AI, its frameworks, and best practices.
    Clouds: Databricks and AWS are mandatory.

  • End-to-end, production-level AI/ML product deployment experience is required.

Qualifications:

    Required Qualifications:

  • A bachelor's degree or higher in Computer Science is mandatory.

  • + years of enterprise-grade cloud deployment experience and + years of software development experience

  • + years of experience with Databricks and AWS MLOps deployment

  • This role is primarily a software lead and developer position that requires strong cloud experience for building infrastructure software.

  • Architect end-to-end agentic pipelines and tools that others on the team can contribute to.

  • The role requires the knowledge to deploy end-to-end architectures for ML applications, both traditional and RAG.

  • Architect end-to-end AI/ML systems from data ingestion to model deployment.

  • Define best practices for model serving, data pipelines, and MLOps strategies.

  • engineering, including hands-on model development and architectural design.

  • Expertise in traditional ML, deep learning, LLMs, embeddings, and RAG frameworks.

  • Strong software engineering skills: Python, API development, microservices, database design, and version control (Git).

  • Experience with cloud platforms (AWS, Databricks, Google Cloud) and containerized deployments (Docker, Kubernetes).

  • Knowledge of MLOps, CI/CD for AI, and production model monitoring.

  • Strong understanding of software architecture patterns, distributed systems, and scalable data pipelines.

    Preferred:

  • Experience with event-driven architectures and messaging systems (NATS, Kafka, RabbitMQ).

  • Familiarity with authentication and authorization frameworks (OAuth, JWT, SSO).

  • Knowledge of observability and monitoring tools (Prometheus, Grafana, OpenTelemetry).

  • Background in designing large-scale enterprise or SaaS platforms.

  • Python, Golang and Rust development experience is preferred

  • Experience in manufacturing and predictive maintenance is a plus

  • Background in controls engineering is a plus

Soft Skills:

  • Strong decision-making and problem-solving skills in high-stakes technical environments.

  • Ability to lead and influence architectural direction across teams.

  • Excellent communication with both technical and non-technical stakeholders.

