Site Reliability Engineer - Configuration Management Tools

8.0 - 13.0 years

8.0 - 12.0 Lacs P.A.

Mumbai, Chennai

Posted: 1 week ago | Platform: Naukri

Skills Required

Site Reliability Engineering, Configuration Management, Ansible, Bitbucket, Bash, Puppet, Python

Work Mode

Work from Office

Job Type

Full Time

Job Description

Notice period: Immediate to 30 days max

Responsibilities of Senior SRE:
- The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.
- They work with cross-functional teams to design, build and maintain systems, and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.
- They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
- They deploy and manage monitoring tools to gain insight into system health and performance.
- They analyze performance, identify bottlenecks and implement solutions to improve a system's scalability and latency.
- They develop scripts and implement tools and automation frameworks to reduce manual effort in deployment, monitoring and scaling.
- They work with development teams to design and develop observability practices such as logging, metrics and tracing. They aim to diagnose and troubleshoot issues proactively.
- They create actionable alerts on monitoring systems to ensure rapid response to potential production incidents.
- They forecast resource needs and provision adequately for current and future demand.
- They design and execute "chaos experiments" to test a system's resilience to failure.
- They own, define and implement the Disaster Recovery (DR) processes for systems. They also conduct planned and unplanned mock DR drills to test response preparedness for production incidents.
- They ensure that security best practices are followed and implemented during the design and operation of systems.
- They own and maintain documentation of processes, playbooks and systems.
- They publish KPI reports and other system health updates to the business on a regular basis.

Requirements:
- Must-have: Bachelor's degree, preferably in CS or a related field, or equivalent experience.
- Must-have: 12+ years of overall IT experience.
- Must-have: 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position.
- Must-have: 5+ years of AWS Cloud experience, with an AWS certification such as DevOps Engineer, SysOps or Security.
- Must-have: 3+ years of experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security.
- Must-have: 2+ years of experience with CDN and/or cache systems like Fastly, Akamai, CloudFront, etc.
- Proven understanding of and strong experience with cloud deployments (AWS/Docker/Kubernetes).
- Knowledge of provisioning with IaC tools like Terraform, Chef, Ansible, Shell, Groovy, Python, etc.
- Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk and the ELK stack.
- Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, PrivateLink, DNS, ACLs, firewalls and C2S access points.
- Platform or application engineering and operational knowledge of CI/CD tooling like GitHub Actions, Jenkins, etc.
- Experience with other tooling like JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus and Nexus IQ.
- Experience with configuration automation tools like Puppet/Ansible/Chef/Salt.
- Scripting skills: Strong scripting (e.g. Bash and Python) and automation skills.
- Operating systems: Windows and Linux system administration.
- Problem solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues.
- Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have:
- Experience with Terraform/Ansible/Chef/Puppet
- Experience with GitHub Actions
- Experience with CloudFront, Fastly
