L2 Support Engineer (SRE Chaos Engineering)

4 - 8 years

5.0 - 15.0 Lacs P.A.

Chennai, Hyderabad

Posted: 1 month ago | Platform: Naukri

Skills Required

VMware, Chaos Engineering, Linux, Ticketing Tools, Bare Metal, Harness, ChaosBlade, Chaos Monkey, Litmus, Gremlin

Work Mode

Work from Office

Job Type

Full Time

Job Description

L2 Support Engineer (SRE Chaos Engineering)

Area: Private cloud (VMware, OpenStack, Kubernetes), Linux, Monitoring, Reliability Engineering

Defining and implementing practices in Resiliency Engineering, Automation, Observability, and Chaos Testing, while ingraining a proactive chaos culture that puts reliability first in design.

Scope of work
• Supervise a team of SREs, ensuring that the production applications the team supports are stable, reliable, and well documented.
• Own end-to-end availability and performance of mission-critical services.
• Contribute to the design and architecture of the system.
• Analyze system architectures to identify single points of failure and other resiliency deficiencies.
• Develop software to automate chaos and resiliency test cases that simulate failures in a system that performs financial data processing.
• Integrate chaos engineering with the CI/CD process.
• Establish a process to define a hypothesis around a steady state and to simulate real-world events.
• Execute Game Days on mission-critical applications.
• Identify top errors and reliability issues, and drive root cause analysis to prevent repeat incidents.
• Ability to analyze and debug complex issues across tiers, from frontend to mid-tier to infrastructure.
• Hands-on experience with any chaos tool (Harness, Litmus, Gremlin, Chaos Monkey, or ChaosBlade).
• Mindset to identify and explore chaotic situations and conduct formalized experiments.
• Experience with monitoring and logging tools (e.g. Datadog, ELK, Prometheus, Grafana).
• Experience with Kubernetes and Docker.
• Deep understanding of SRE concepts such as SLAs, SLOs, SLIs, and error budgets.
• Experience working on cross-department efforts, communicating and negotiating with multiple teams to accomplish goals.
• Expertise in troubleshooting issues and bugs.
• Programming experience (Python/Go/shell).
• Experience in the financial domain (desirable).
• Prior SRE/DevOps experience (desirable).
Skill Set
• Experience with OS platforms (Windows, Linux, CentOS, Ubuntu, etc.).
• A highly skilled Site Reliability Engineer who will join our Technology team and work as part of a cross-functional product team to create elegant solutions to highly complex and intricate business challenges.
• Ability to prioritize and multitask.
• Excellent communication and interpersonal skills.

Information Technology and Services
Chennai
