Sr. Engineering Manager - SRE

10 - 15 years

40.0 - 50.0 Lacs P.A.

Bengaluru

Posted: 2 months ago | Platform: Naukri

Skills Required

Hospitality, Operational support, Team management, Networking, Performance management, Incident management, Microsoft, Analytics, Financial services, Capacity planning

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are looking for a Sr. Engineering Manager - SRE to oversee the stability, scalability, and delivery of our production environment, applying software engineering principles and automation to improve cloud infrastructure management and reduce operational costs. This role will play a key part in the transition from manual processes to automated solutions by leading our two current DevOps teams:

Cloud Infra Lifecycle Management Team: Focused on automated provisioning, capacity planning, and maintenance across all cloud platforms for production applications.
Cloud Infra Support Team: Responsible for supporting internal users with production and development environment requests, with a long-term goal of eliminating manual intervention through automation.

This role is ideal for a leader with a deep understanding of Azure cloud environments, SRE best practices, and a strong background in building automation-first operational models.

Key Responsibilities:

Stability, Scalability, and Availability: Lead the design and implementation of strategies to ensure high availability, reliability, and performance of production systems. Apply lifecycle management techniques, including monitoring, capacity planning, and automated scaling, to cloud environments. Establish Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical applications.

Cloud Lifecycle Management: Oversee the Cloud Infra Lifecycle Management Team to build scalable, automated cloud provisioning workflows and optimize capacity. Implement infrastructure-as-code (IaC) practices using tools like Terraform, PowerShell, and Azure Resource Manager (ARM) templates. Ensure efficient cloud resource utilization and cost management.

Cloud Support Operations: Manage the Cloud Infra Support Team, which handles internal user requests related to production and development environments. Develop efficient workflows for incident response and request resolution, with automation as the default approach. Work towards eliminating the need for manual support teams by creating self-service solutions for internal users.

Automation Transformation: Lead the transition from manual processes to cloud automation through training, upskilling, and process reengineering. Champion the use of automation to handle repetitive operational tasks, including monitoring, remediation, and deployments. Foster a "first principles thinking" culture focused on engineering excellence and process simplification.

Monitoring and Incident Response: Build robust monitoring systems using Azure Monitor, Log Analytics, and Application Insights for proactive performance management. Oversee incident response processes, ensuring rapid recovery and root cause analysis for production disruptions. Implement disaster recovery and high-availability strategies across environments.

Security and Compliance: Ensure all environments follow cloud security best practices, regulatory compliance requirements, and corporate governance policies. Manage identity and access controls, network security, and risk mitigation strategies.

Continuous Improvement: Drive ongoing improvements in system resilience, operational efficiency, and service quality through automation and best practices. Conduct regular performance reviews and capacity planning exercises to maintain optimal system health.

Team Leadership and Development: Provide coaching and mentorship to the SRE team, fostering a culture of continuous learning and technical excellence. Lead efforts to upskill the team in cloud scripting, automation development, and site reliability best practices.

Reporting and Metrics: Maintain detailed operational documentation and generate regular reports on system performance, reliability improvements, and cost efficiency efforts.

Basic Qualifications:

10+ years of experience in cloud operations or SRE, with a strong focus on Azure environments.
Extensive experience in managing and optimizing Azure services like Virtual Machines, App Services, SQL Database, Networking, and Storage.
Hands-on expertise with cloud automation and IaC tools (Terraform, PowerShell, ARM templates, or Azure Automation).
Strong understanding of SRE principles, including error budgets, SLOs, SLIs, and incident management practices.
Proficiency with Azure DevOps and CI/CD pipeline management.
Expertise in cloud cost management and optimization.
Familiarity with monitoring, logging, and observability tools (e.g., Azure Monitor, Log Analytics, Security Centre).
Knowledge of Azure security practices, including identity and access management, firewalls, and compliance requirements.

Preferred Qualifications:

Microsoft Certified: Azure Solutions Architect Expert or Azure Administrator Associate.
Experience managing hybrid or multi-cloud environments.
Experience implementing self-service workflows and internal user support automation.

Soft Skills:

Strong leadership and team management abilities.
Excellent communication and client engagement skills.
Analytical mindset with a proactive approach to problem-solving.
Ability to handle high-pressure situations with professionalism.

Digital Services
Mumbai
