Posted: 1 week ago
Work from Office
Full Time
Hiring a Senior DevOps Leader for a High-Scale, Multi-Cloud Environment

Finding the right Senior DevOps Leader for your organization, especially one with over 15 years of experience and a background in high-scale operations leveraging GitLab, Kubernetes, GCP, and AWS, is a critical undertaking. This role demands a unique blend of deep technical expertise, strategic thinking, and proven leadership capabilities. Here's a comprehensive guide to what you should be looking for:

Key Responsibilities to Expect:

A Senior DevOps Leader in this context will be responsible for more than just managing infrastructure; they will be a strategic partner driving efficiency, innovation, and reliability across the organization.

* Strategic Leadership & Vision:
  * Defining and executing a long-term DevOps strategy aligned with business objectives, particularly for high-scale and resilient systems.
  * Driving the adoption of DevOps best practices, tools, and culture across engineering and operations teams.
  * Leading architectural decisions for CI/CD, containerization, cloud infrastructure, and automation, ensuring scalability, security, and cost-effectiveness.
  * Evaluating and integrating new and emerging technologies (e.g., AI in DevOps, advanced monitoring solutions) to enhance operational efficiency and system performance.
* Team Leadership & Development:
  * Building, mentoring, and leading a high-performing team of DevOps engineers.
  * Fostering a collaborative, innovative, and continuous improvement culture within the DevOps team and its interactions with other departments.
  * Managing resource allocation, project prioritization, and performance management for the DevOps team.
* Technical Oversight & Execution:
  * Overseeing the design, implementation, and management of robust CI/CD pipelines using GitLab CI.
  * Leading the strategy and governance for Kubernetes deployments at scale, including cluster management, networking, security, and resource optimization across GCP (GKE) and AWS (EKS).
  * Architecting and managing multi-cloud infrastructure (GCP and AWS), focusing on high availability, disaster recovery, security, and cost optimization.
  * Championing Infrastructure as Code (IaC) practices using tools like Terraform or CloudFormation.
  * Implementing and refining comprehensive monitoring, logging, and alerting strategies (e.g., using Prometheus, Grafana, ELK Stack, CloudWatch, Google Cloud's operations suite) to ensure system health and proactive issue resolution.
  * Driving automation initiatives across all stages of the software development lifecycle.
* Collaboration & Communication:
  * Working closely with development, operations, security, and product teams to streamline workflows and ensure seamless delivery of software.
  * Communicating effectively with executive leadership, stakeholders, and technical teams regarding DevOps strategy, project status, risks, and performance metrics.
  * Championing and enforcing security best practices (DevSecOps) throughout the development lifecycle.
* Operational Excellence & Governance:
  * Establishing and tracking key DevOps metrics (e.g., deployment frequency, lead time for changes, mean time to recovery (MTTR), change failure rate).
  * Ensuring compliance with industry standards and internal policies.
  * Managing budgets and vendor relationships related to DevOps tools and cloud services.

Essential Technical Leadership Skills:

Beyond hands-on proficiency, a leader must demonstrate strategic application and governance of these technologies.

* GitLab:
  * Strategic Implementation: Deep understanding of GitLab's full suite (beyond just CI/CD) for source code management, pipeline orchestration, security scanning, and package management in a large enterprise.
  * Scalability & Performance: Experience in scaling GitLab infrastructure and optimizing its performance for a large number of users and projects.
  * Automation & Integration: Proven ability to automate complex workflows and integrate GitLab with other development and operations tools.
* Kubernetes (K8s):
  * Large-Scale Cluster Management: Expertise in designing, deploying, and managing multiple large-scale Kubernetes clusters on both GCP (GKE) and AWS (EKS). This includes experience with cluster upgrades, multi-tenancy, and resource quotas.
  * Advanced Networking & Security: In-depth knowledge of Kubernetes networking (e.g., CNI, service mesh like Istio or Linkerd) and security best practices (e.g., pod security policies, network policies, secrets management, RBAC) in a high-scale, multi-cloud environment.
  * Ecosystem & Tooling: Familiarity with the broader Kubernetes ecosystem, including Helm for package management, Prometheus/Grafana for monitoring, and tools for logging and tracing.
  * GitOps: Experience implementing GitOps principles for managing Kubernetes configurations and applications.
* Google Cloud Platform (GCP) & Amazon Web Services (AWS):
  * Multi-Cloud Strategy & Governance: Proven experience in developing and implementing multi-cloud strategies, including workload placement, data management, and consistent governance across GCP and AWS.
  * Core Services Expertise: Deep understanding and experience with core compute, storage, networking, database, and security services on both platforms (e.g., AWS EC2, S3, VPC, RDS; GCP Compute Engine, Cloud Storage, VPC, Cloud SQL).
  * Infrastructure as Code (IaC): Mastery of IaC tools like Terraform (preferred for multi-cloud) or CloudFormation (AWS-specific) for provisioning and managing infrastructure in both clouds.
  * Cost Optimization & Management: Demonstrable experience in implementing cost optimization strategies and managing budgets effectively across both GCP and AWS at scale.
  * Security & Compliance: Expertise in designing and implementing secure cloud architectures, adhering to compliance standards (e.g., SOC 2, ISO 27001, HIPAA if applicable) on both platforms.
  * Migration Experience: Experience leading large-scale migrations to or between cloud platforms is highly desirable.
* General DevOps & SRE Principles:
  * Automation: A strong automation mindset with proficiency in scripting languages (e.g., Python, Bash, PowerShell).
  * Monitoring, Logging, and Observability: Experience designing and implementing comprehensive observability solutions for large-scale distributed systems.
  * Site Reliability Engineering (SRE): Understanding and application of SRE principles for availability, reliability, performance, and incident response.
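
To make the key DevOps metrics named under Operational Excellence & Governance concrete (deployment frequency, lead time for changes, MTTR, change failure rate), here is a minimal Python sketch of how they might be computed from deployment records. The `Deployment` record, its field names, and the `devops_metrics` function are hypothetical and not part of any tool listed in this posting. The sketch also assumes lead time runs from commit to production deploy, that recovery time runs from the failed deploy to restoration, and that simple means are acceptable (many teams prefer medians).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # True if it caused a production failure
    restored_at: Optional[datetime] = None  # when service was restored (failures only)


def devops_metrics(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Compute the four metrics over a reporting window of `window_days` days."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    total = len(deployments)
    # Lead time for changes: commit -> production, in hours (assumed definition).
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    # Time to recovery: failed deploy -> service restored, in hours (assumed definition).
    recovery_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployment_frequency_per_day": total / window_days,
        "mean_lead_time_hours": sum(lead_hours) / total,
        "change_failure_rate": len(failures) / total,
        # Reported as 0.0 when there were no recovered failures in the window.
        "mttr_hours": sum(recovery_hours) / len(recovery_hours) if recovery_hours else 0.0,
    }


# Illustrative data only: four deployments in one week, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(
        start + timedelta(days=1),
        start + timedelta(days=1, hours=4),
        failed=True,
        restored_at=start + timedelta(days=1, hours=5),
    ),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=6)),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=8)),
]
metrics = devops_metrics(sample, window_days=7)
print(metrics)
```

For this sample data the mean lead time is 5.0 hours, the change failure rate is 0.25, and MTTR is 1.0 hour. A candidate for this role should be able to explain how such figures would be sourced in practice (for example, from pipeline and incident records) and which definitions their teams have used.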