Site Reliability Engineer (SRE) – AWS / Kubernetes / DevOps
Techdome · Remote
Experience: 3-7 years
Techdome is hiring a Site Reliability Engineer (SRE) in Hyderabad / Indore to ensure the availability, reliability, scalability, and performance of cloud-based production systems across our payments and platform products. This SRE / DevOps role focuses on automation, observability, CI/CD, incident management, and using AI tooling to reduce operational toil.

Key Responsibilities
- Ensure high availability, performance, and scalability of production systems.
- Build automation for deployment, monitoring, and incident response.
- Implement observability: metrics, logging, tracing, and alerting (Prometheus, Grafana, ELK, Datadog).
- Define and manage SLIs, SLOs, and error budgets.
- Build and maintain CI/CD pipelines and infrastructure as code (Terraform, Ansible).
- Lead incident response, root cause analysis, and post-incident reviews.
- Perform capacity planning and cloud cost optimization.
- Participate in an on-call rotation for production support.

Required Skills & Qualifications
- 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or Platform Engineer.
- Cloud experience: AWS, GCP, or Azure.
- Containers and orchestration: Docker, Kubernetes.
- Infrastructure as code: Terraform, Ansible.
- Scripting / programming: Python, Go, or Bash.
- Strong Linux, networking, and distributed-systems fundamentals.
- CI/CD pipeline experience (Jenkins, GitHub Actions, GitLab CI, or similar).

Preferred Skills
- Experience using or building AI / LLM-powered tooling for ops automation, incident summaries, and alert triage.
- Payments / fintech production experience.
- Experience with SLO-driven reliability and on-call process improvement.

Why Techdome
A technology-driven company with 5+ years of building products across industries, including payments and fintech. You’ll get genuine ownership, fast growth, and a collaborative team where your ideas count.

Hiring Process
Fast and transparent. We use JIA, our in-house AI hiring platform, to review every application consistently and respond within a working day.