As a member of our Production Support team, you’ll immediately put your love of technology into action. Each day, you and your team will be responsible for making sure our platforms, servers and networks are online and secure. You’ll work together, evaluating, selecting, implementing, integrating, maintaining, upgrading, documenting and designing our infrastructure. You’ll find new and creative solutions to troubleshoot and resolve issues. Communication is the key, both in problem solving with your supervisors and collaborating with your coworkers, as well as other teams in the network.
We are looking for a highly motivated individual who can use their software engineering skills to automate or eliminate operational tasks. The candidate will build and implement creative solutions to operational problems, including optimizing existing systems, building infrastructure, managing capacity and resilience, and eliminating work through automation. The candidate will partner with various cross-functional teams across the globe. The candidate will be responsible for maintaining product SLIs/SLOs, availability, reliability, tooling and visualization for business, development, and operational teams to consume.
This position is anticipated to require the use of one or more High Security Access (HSA) systems. Users of these systems are subject to enhanced screening which includes both criminal and credit background checks, and/or other enhanced screening at the time of accepting the position and on an annual basis thereafter. The enhanced screening will need to be successfully completed prior to commencing employment or assignment.
Develop tools and visualizations to understand our customers' experience and how they interact with the product
Run, maintain and improve the service against established Service Level Objectives by applying software engineering principles
Develop solutions to automate manual deployments and operational tasks
Take responsibility for the availability, performance, change management, telemetry, and capacity management of your services
Engage with the development team throughout the life cycle to help build for reliability
Take part in root cause analysis and post-mortems to identify and eliminate gaps and improve the service
Analyze usage and telemetry data to identify patterns to predict and prevent failure
Constantly evaluate and test products, especially before and after any change
Manage the split of effort between manual operational work and engineering work
Take part in 24x7x365 support coverage
Experience with system administration in Windows, Unix, or Linux platforms
Experience with a high-level programming language such as Java, Python or C#, and with shell scripting
Strong experience with CI/CD pipelines and testing frameworks
Experience with Incident, Change and Problem management processes in a large-scale operation
Experience with integrating solutions in a multi-vendor environment, including SaaS environments
Experience in performance engineering and monitoring using tools such as AppDynamics, Splunk, Apica, JMeter and Dynatrace
Experience with automation and configuration tools such as Ansible, Puppet, Chef or Evolven
Experience with Agile and full software development life cycle disciplines
Experience with Capacity and Resilience management practices and procedures is beneficial
Knowledge of networking protocols is beneficial
Industry-recognized certifications (security, networking, etc.) – strongly preferred
Experience with Splunk in one of the following areas: IT operations, compliance, DevOps, network security, system security, or support of security event management tools (SIEMs)
Working knowledge of the Splunk Cloud offering – not required but preferred
Good working knowledge of cloud engineering. An understanding of private cloud principles and exposure to public cloud offerings such as AWS, Azure, Cloud Foundry or similar technology is preferred
Our CTC Production Support Organization is filled with innovators who love technology as much as you do. Together, you’ll use a disciplined, innovative and cost-effective approach to deliver a wide variety of high-quality products and services. You’ll work in a stable, resilient and secure operating environment where you—and the products you deliver—will thrive.