Posted: 3 months ago
Work from Office
Full Time
Lead NOC/Site Reliability Engineer
Job ID: Lea-ETP-Pun-980
Location: Pune

Responsibilities
6+ years of experience in SRE, DevOps, or infrastructure management.
Lead the NOC/SRE team from the front, ensuring a culture of proactive monitoring, rapid response, and continuous improvement.
Act as the primary escalation point for major incidents, providing technical guidance and decision-making.
Collaborate with DevOps, Engineering, and Product teams to enhance system reliability.
Define best practices, incident response protocols, and runbooks for the team.
Lead log tracing and deep troubleshooting for infrastructure, network, and application issues.
Reduce MTTR (Mean Time to Resolution) and improve incident management processes.
Expertise in troubleshooting complex infrastructure and application issues.
Strong knowledge of log tracing, distributed tracing, and observability tools (e.g., ELK, Splunk, Grafana, Prometheus, OpenTelemetry).
Deep understanding of SLAs, SLOs, and error budgets.
Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker).
Good knowledge of Terraform, Kubernetes, Docker, and cloud architectures.
Proficiency in monitoring and observability tools (New Relic, Prometheus, Datadog, etc.).
Understanding of CI/CD pipelines, automation, and infrastructure as code (IaC).
Basic scripting skills in Python, Go, Shell, or similar.
Strong troubleshooting skills for complex distributed systems.
Ability to mentor junior engineers and drive SRE best practices.
Willingness to work in a 24/7 shift rotation and participate in on-call responsibilities.
Strong problem-solving skills and ability to work in a fast-paced environment.
Strong incident management, troubleshooting, and RCA skills.

Qualifications
6+ years of experience in Site Reliability Engineering (SRE) / NOC / DevOps roles.
Proven leadership experience, managing or mentoring a team.
Hands-on experience with Terraform for Infrastructure as Code (IaC).
Experience in Python for automation and scripting.
Expertise in troubleshooting complex infrastructure and application issues.
Strong knowledge of log tracing, distributed tracing, and observability tools (e.g., ELK, Splunk, Grafana, Prometheus, OpenTelemetry).
Deep understanding of SLAs, SLOs, and error budgets.
Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker).
Familiarity with CI/CD pipelines and GitOps practices.
Strong problem-solving skills and the ability to make quick, data-driven decisions under pressure.