We are developing the world’s first enterprise-level Platform-as-a-Service (PaaS) for robots, creating a rare opportunity for an experienced, product-focused engineering professional. The PaaS aims to offer innovative features that handle every part of the product life cycle required to deliver and support consumer-facing connected machines and services.
Site Reliability Engineering combines the skills of software and systems engineering. Your key responsibility is to optimize existing systems, build infrastructure, and eliminate manual work through automation, making those systems more reliable and ensuring the highest possible uptime for all users and developers on the rapyuta platform.
Your responsibilities will include, but are not limited to, the following:
Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and postmortems.
Build and evolve the operations handbook.
At least 3 years of relevant work experience.
Mastery of one or more programming languages, including but not limited to Python, Golang, Ruby, and Java.
Strong understanding of Linux/Unix fundamentals.
Solid understanding of software architectures and distributed systems.
Strong analytical and debugging skills.
Good understanding of algorithms and data structures.
Ability to build and deliver hands-on technology, proofs of concept, and demonstrations.
Familiarity with configuration management, Docker, Infrastructure as a Service, Platform as a Service, continuous delivery, continuous integration, and DevOps.
Experience with Golang or Python.
Experience with cloud platforms such as Google Cloud Platform, Amazon Web Services, or Azure.
Experience with Docker or Kubernetes.
Open source contributions and projects.
Experience with SQL and NoSQL databases, as well as queuing systems.
DevOps and continuous deployment practice is a plus.
Familiarity with Robot Operating System and/or familiarity with robot software architecture is a bonus.
Understanding of the risks involved in a startup (previous startup experience preferred).
Bleeding-edge technology
Working with exceptionally talented engineers
Pet-friendly workspace
Insurance coverage for employees and their families from the day of joining