OverIT is a global Software-as-a-Service (SaaS) company with a strong presence in North America and Europe. We empower organizations in the power, utility, telco, and transportation industries to manage their mission-critical infrastructures efficiently and safely through cutting-edge Field Service Management software solutions.

At OverIT, we leverage advanced technologies like ML (Machine Learning), AR (Augmented Reality), IoT (Internet of Things), and GIS (Geographic Information System) to help ensure the infrastructures essential to our daily lives are always on.

If you want to be part of a top technology brand, join us!

What you’ll do

- Act as a Site Reliability Engineer, operating and improving the reliability, availability, observability, and performance of the OverIT SaaS platform running on AWS cloud environments.
- Drive observability initiatives across the company, managing and evolving monitoring and alerting platforms with a strong focus on Dynatrace, dashboards, anomaly detection, and operational visibility.
- Support production operations activities, troubleshooting issues across compute, networking, storage, Kubernetes workloads, databases, IAM, and managed cloud services.
- Perform root cause analysis (RCA) on incidents and actively contribute to blameless post-mortems.
- Apply SRE principles such as Error Budgets, service reliability objectives, and operational excellence practices to balance platform stability and delivery velocity.
- Identify and eliminate operational toil through automation, tooling, scripting, and process optimization initiatives.
- Design, improve, and maintain backup and disaster recovery procedures, ensuring compliance with SLA/SLO targets.
- Operate within a security-first and compliance-driven environment aligned with major industry certifications and security best practices.

What you’ll need

- 3–5 years of experience in Site Reliability Engineering or Cloud Operations roles, with strong hands-on experience with AWS environments including EC2, EKS, RDS, S3, IAM, VPC, networking, and CloudWatch.
- Strong knowledge of Dynatrace and modern observability practices in distributed cloud environments.
- Experience with incident management, alert handling, troubleshooting, and production support processes.
- Strong focus, ownership, and attention to detail in backup and disaster recovery strategies, including backup validation, restore procedures, retention policies, and operational resilience practices.
- Good understanding of AWS networking concepts, including VPC peering, Transit Gateways, security groups, routing strategies, and load balancing (ALB/NLB) within multi-region SaaS architectures.
- Security-first mindset with practical experience applying AWS security best practices, including IAM roles and policies, KMS encryption, Secrets Manager, and least-privilege access principles.
- Strong problem-solving attitude, ownership mindset, and ability to operate effectively during production incidents.

What’s nice to have

- Knowledge of Kubernetes ecosystem components, including service meshes (e.g., Istio, Linkerd).
- Good knowledge of scripting and automation, preferably using Python.
- Good understanding of Infrastructure as Code (IaC) principles, preferably Terraform.
- Experience supporting enterprise-grade or mission-critical SaaS platforms.
- Experience with IaC, CI/CD, and DevOps tooling.
- Cloud or Kubernetes certifications, such as AWS Certified Solutions Architect Associate, AWS Certified SysOps Administrator, or Certified Kubernetes Administrator (CKA).

What we offer

- OverIT is a unique transformation project in the SaaS space, full of ambition to scale and grow globally.
- International culture and environment with the opportunity to partner with an outstanding group of people and professionals who joined the company to scale and succeed.
- A career-defining opportunity with full exposure to two leading private equity firms.

At OverIT we value diversity and are committed to equal employment opportunities regardless of religion, age, disability, sexual orientation, gender perception or identity, ethnicity, or place of origin.


Site Reliability Engineer, Cloud Operations
