Posted: 2 months ago
Work from Office
Full Time
The Site Reliability Engineer (SRE) ensures the availability, reliability, performance, and security of applications and infrastructure at the State Data Center (SDC). This role involves proactive monitoring, incident response, system optimization, and process improvements to maintain high service levels and compliance with security standards. The SRE will work closely with IT teams to enhance system resilience and efficiency.

Roles and Responsibilities
Implement infrastructure monitoring (CPU, Memory, Disk, Network) using Zabbix, Prometheus, Grafana, or ELK Stack.
Monitor database performance (PostgreSQL, MySQL, Oracle DB) and recommend optimizations.
Establish log aggregation and alerting mechanisms to detect anomalies.
Generate uptime and SLA compliance reports for management review.
Diagnose system and network issues, escalate as required, and track resolution.
Maintain a ticketing system for issue documentation and trend analysis.
Conduct root cause analysis (RCA) and implement preventive measures.
Perform post-incident reviews (PIRs) to improve system resilience.
Ensure high availability and failover readiness for critical services.
Optimize database indexing, query performance, and backup strategies.
Perform capacity planning to ensure systems can handle peak loads.
Implement automated scaling and load balancing for performance optimization.
Enforce access control policies, including firewalls, SSH restrictions, and IAM.
Ensure timely patching and hardening of OS, middleware, and databases.
Monitor for security vulnerabilities and implement necessary mitigations.
Ensure compliance with government security policies (CERT-In, ISO 27001).
Ensure real-time replication of databases to the disaster recovery (DR) site.
Conduct regular failover testing to validate DR readiness.
Maintain documentation and runbooks for disaster recovery scenarios.
Maintain incident reports, troubleshooting guides, and standard operating procedures (SOPs).
Track service-level agreements (SLAs) and prepare compliance reports.
Develop training sessions for internal teams on monitoring tools and processes.

Desired Skills/Background
5+ years of experience in SRE, IT Operations, or System Administration.
Strong Linux (Ubuntu, RHEL, CentOS) and Windows Server knowledge.
Experience with monitoring tools (Zabbix, Prometheus, Grafana, ELK, Splunk).
Knowledge of networking, VPNs, firewalls, and load balancers.
Familiarity with cloud services and on-premises infrastructure.
Experience in database administration (PostgreSQL, MySQL, Oracle).
Strong troubleshooting and incident management skills.
AWS Certified SysOps Administrator, RHCE, ITIL, or Zabbix Certified Specialist.
Experience working with State Data Centers (SDCs) and government IT projects.