Principal DevOps Site Reliability Engineer

Scotiabank (Toronto, ON, Canada)
Remote Friendly DevOps & Sysadmin North America Full-time

Requisition ID: 98764
Join the Global Community of Scotiabankers to help customers become better off.

The team:

Scotiabank’s Global Technology Services (GTS) Technology Operations & Site Reliability Engineering (SRE) group is responsible for the operations engineering required to provide highly available and resilient systems. In GTS Enterprise Data Warehouse (EDW) & Reference Data Management (RDM) TechOps & SRE, we provide critical data platform services following SRE and data governance best practices. We also consult and coordinate with the Bank’s technology teams to meet business expectations.

The role:

  • You will build and manage the EDW & RDM SRE function. You will be responsible for the availability and performance of the ecosystem through automation, proactive alerting, and a strong set of data analysis tools used to identify areas of improvement and prevent problem recurrence.
  • You will partner with Engineering and Operations stakeholders to ensure a high-quality product is developed, contributing to the overall solution design and providing engineering, operations, and release best practices within the SRE framework.
  • You will identify opportunities to improve data availability and ensure that solution designs optimize data restoration as well as component and site recovery capabilities.
  • You will identify opportunities for quality improvements and design automation to prevent problem recurrence and reduce toil.
  • You will implement comprehensive service monitoring, including system and application performance monitoring and dashboards, to ensure uptime and performance.
  • You will define, measure, and meet key Service Level Objectives and Indicators covering availability, performance, resilience, incidents and chronic problems.
  • You will work closely with Engineering and Operations to build and maintain Standard Operating Procedures and Run Books.
  • You will lead a team of Site Reliability Engineers and mentor them in SRE practices.
  • After-hours support is required for critical incidents and events.

Is this role right for you?

  • You are a skilled software engineer with a passion for operations. You self-identify as a “hacker” and a “jack of all trades,” and you also possess deep knowledge in multiple areas: software development, Linux/AIX, networking, security, databases, and distributed systems.
  • You thrive on the challenge of supporting critical systems requiring a high level of trust, resilience, and availability.
  • You enjoy looking for opportunities to be proactive, automate, and solve problems before they happen.
  • You want to be challenged with complex problem solving in time-sensitive situations to reduce system downtime and customer impact, and to carry those lessons forward as continuous improvements.
  • You understand how the Bank’s risk appetite and risk culture should be considered in day-to-day activities and decisions.

Do you have the skills that will enable you to succeed in this role?

  • You have strong communication (verbal/written) and good interpersonal skills to build relationships with internal and external business partners and vendors.
  • You have a track record as a strong team player with a proven ability to lead an engineering operations/SRE team, set standards, and provide solution designs for highly available and resilient systems.
  • You have at least 10 years of hands-on technical experience designing solutions for and supporting highly available systems, including high availability, disaster recovery, and data restoration solutions.
  • You can demonstrate advanced technical knowledge of, and exposure to, the collection, parsing, and analysis of data related to system performance, availability, and resilience.
  • You have experience implementing, monitoring, and reporting on Service Level Objectives and Indicators.
  • You have hands-on technical experience as a software engineer. Experience with ETL tools such as Informatica and SAS is an asset.
  • You have a solid understanding of and experience with security, firewalls, and network protocols.
  • You have hands-on experience setting standards for and authoring Standard Operating Procedures and Run Books.
  • You possess superior problem-solving and decision-making skills to resolve work issues, with the ability to work under pressure in a dynamic environment.
  • You are experienced at leading major incidents that cross application and technology boundaries.
  • You have completed a post-secondary education in computer science, engineering or in a related technology field.

What's in it for you?

  • You will be part of a Site Reliability Engineering team that will help you grow, and you will have the opportunity to make valuable, long-lasting contributions to the Bank.
  • We are technology partners who help the business transform how our employees around the world work.
  • You'll get to work with and learn from diverse industry leaders who come from top technology companies around the world.
  • We have an inclusive and collaborative working environment that encourages creativity and curiosity and celebrates success! We also foster an environment of innovation and continuous learning.
  • We care about our people, allowing them to design how they work to deliver amazing results.
  • We offer a competitive total rewards package, including a performance bonus, company matching programs (pension & Employee Share Ownership), generous vacation, health/medical/wellness benefits, and employee banking privileges.
  • While we currently work remotely from home, when it is deemed safe to return physically to work, our primary location in downtown Toronto offers:
    • A design focused on enabling collaboration through both environment and technology.
    • Located in the heart of Toronto’s financial district, the work site is located right above the TTC’s Line 1 King subway station. This location has access to The PATH & is located minutes from GO Transit/VIA Rail hub at Union Station; as well as the TTC’s King 504 streetcar line.
    • Minutes from the Gardiner Expressway & the DVP.
    • Located next door is The Commons, a dining space for employees where breakfast & lunch are served. The Bean serves hot/cold beverages & snacks, with plenty of room to lounge & recharge. The PATH also offers many meal/snack options plus shopping & services for your everyday needs without venturing outside.

Location(s): Canada : Ontario : Toronto

As Canada's International Bank, we are a diverse and global team. We speak more than 100 languages with backgrounds from more than 120 countries. Our employees are committed to a superior customer experience and use the Bank’s six guiding sales practice principles to ensure they act with honesty and integrity.
At Scotiabank, we value the unique skills and experiences each individual brings to the Bank, and we are committed to creating and maintaining an inclusive and accessible environment for everyone. If you require accommodation (including, but not limited to, an accessible interview site, alternate format documents, ASL Interpreter, or Assistive Technology) during the recruitment and selection process, please let our Recruitment team know. If you require technical assistance, please click here. Candidates must apply directly online to be considered for this role. We thank all applicants for their interest in a career at Scotiabank; however, only those candidates who are selected for an interview will be contacted.
