DevOps Lead

15.0 - 20.0 years

9.0 - 13.0 Lacs P.A.

Thane

Posted: 1 week ago | Platform: Naukri

Skills Required

Automation, Data management, Performance management, SOC, Disaster recovery, ISO 27001, Distribution system, Monitoring, SQL, Python

Work Mode

Work from Office

Job Type

Full Time

Job Description

Hiring a Senior DevOps Leader for a High-Scale, Multi-Cloud Environment

Finding the right Senior DevOps Leader for your organization, especially one with over 15 years of experience and a background in high-scale operations leveraging GitLab, Kubernetes, GCP, and AWS, is a critical undertaking. This role demands a unique blend of deep technical expertise, strategic thinking, and proven leadership capabilities. Here's a comprehensive guide to what you should be looking for:

Key Responsibilities to Expect:

A Senior DevOps Leader in this context will be responsible for more than just managing infrastructure; they will be a strategic partner driving efficiency, innovation, and reliability across the organization.

* Strategic Leadership & Vision:
  * Defining and executing a long-term DevOps strategy aligned with business objectives, particularly for high-scale and resilient systems.
  * Driving the adoption of DevOps best practices, tools, and culture across engineering and operations teams.
  * Leading architectural decisions for CI/CD, containerization, cloud infrastructure, and automation, ensuring scalability, security, and cost-effectiveness.
  * Evaluating and integrating new and emerging technologies (e.g., AI in DevOps, advanced monitoring solutions) to enhance operational efficiency and system performance.
* Team Leadership & Development:
  * Building, mentoring, and leading a high-performing team of DevOps engineers.
  * Fostering a collaborative, innovative, and continuous improvement culture within the DevOps team and its interactions with other departments.
  * Managing resource allocation, project prioritization, and performance management for the DevOps team.
* Technical Oversight & Execution:
  * Overseeing the design, implementation, and management of robust CI/CD pipelines using GitLab CI.
  * Leading the strategy and governance for Kubernetes deployments at scale, including cluster management, networking, security, and resource optimization across GCP (GKE) and AWS (EKS).
  * Architecting and managing multi-cloud infrastructure (GCP and AWS), focusing on high availability, disaster recovery, security, and cost optimization.
  * Championing Infrastructure as Code (IaC) practices using tools like Terraform or CloudFormation.
  * Implementing and refining comprehensive monitoring, logging, and alerting strategies (e.g., using Prometheus, Grafana, ELK Stack, CloudWatch, Google Cloud's operations suite) to ensure system health and proactive issue resolution.
  * Driving automation initiatives across all stages of the software development lifecycle.
* Collaboration & Communication:
  * Working closely with development, operations, security, and product teams to streamline workflows and ensure seamless delivery of software.
  * Communicating effectively with executive leadership, stakeholders, and technical teams regarding DevOps strategy, project status, risks, and performance metrics.
  * Championing and enforcing security best practices (DevSecOps) throughout the development lifecycle.
* Operational Excellence & Governance:
  * Establishing and tracking key DevOps metrics (e.g., deployment frequency, lead time for changes, mean time to recovery (MTTR), change failure rate).
  * Ensuring compliance with industry standards and internal policies.
  * Managing budgets and vendor relationships related to DevOps tools and cloud services.

Essential Technical Leadership Skills:

Beyond hands-on proficiency, a leader must demonstrate strategic application and governance of these technologies.

* GitLab:
  * Strategic Implementation: Deep understanding of GitLab's full suite (beyond just CI/CD) for source code management, pipeline orchestration, security scanning, and package management in a large enterprise.
  * Scalability & Performance: Experience in scaling GitLab infrastructure and optimizing its performance for a large number of users and projects.
  * Automation & Integration: Proven ability to automate complex workflows and integrate GitLab with other development and operations tools.
* Kubernetes (K8s):
  * Large-Scale Cluster Management: Expertise in designing, deploying, and managing multiple large-scale Kubernetes clusters on both GCP (GKE) and AWS (EKS). This includes experience with cluster upgrades, multi-tenancy, and resource quotas.
  * Advanced Networking & Security: In-depth knowledge of Kubernetes networking (e.g., CNI, service mesh like Istio or Linkerd) and security best practices (e.g., pod security policies, network policies, secrets management, RBAC) in a high-scale, multi-cloud environment.
  * Ecosystem & Tooling: Familiarity with the broader Kubernetes ecosystem, including Helm for package management, Prometheus/Grafana for monitoring, and tools for logging and tracing.
  * GitOps: Experience implementing GitOps principles for managing Kubernetes configurations and applications.
* Google Cloud Platform (GCP) & Amazon Web Services (AWS):
  * Multi-Cloud Strategy & Governance: Proven experience in developing and implementing multi-cloud strategies, including workload placement, data management, and consistent governance across GCP and AWS.
  * Core Services Expertise: Deep understanding and experience with core compute, storage, networking, database, and security services on both platforms (e.g., AWS EC2, S3, VPC, RDS; GCP Compute Engine, Cloud Storage, VPC, Cloud SQL).
  * Infrastructure as Code (IaC): Mastery of IaC tools like Terraform (preferred for multi-cloud) or CloudFormation (AWS-specific) for provisioning and managing infrastructure in both clouds.
  * Cost Optimization & Management: Demonstrable experience in implementing cost optimization strategies and managing budgets effectively across both GCP and AWS at scale.
  * Security & Compliance: Expertise in designing and implementing secure cloud architectures, adhering to compliance standards (e.g., SOC 2, ISO 27001, HIPAA if applicable) on both platforms.
  * Migration Experience: Experience leading large-scale migrations to or between cloud platforms is highly desirable.
* General DevOps & SRE Principles:
  * Automation: A strong automation mindset with proficiency in scripting languages (e.g., Python, Bash, PowerShell).
  * Monitoring, Logging, and Observability: Experience designing and implementing comprehensive observability solutions for large-scale distributed systems.
  * Site Reliability Engineering (SRE): Understanding and application of SRE principles for availability, reliability, performance, and incident response.

Advertising Services
Mumbai Maharashtra +27
