IT Infrastructure Support Site Reliability Engineer job opportunity at Astreya.



Date posted: 30+ days ago
Astreya IT Infrastructure Support Site Reliability Engineer
Experience: 6+ years
Employment type: Full-time

Degree: OND
Location: Hyderabad, India

About the Job

We are seeking an experienced Site Reliability Engineer to join our IT Infrastructure Support team. The team is responsible for the reliability, scalability, and performance of critical physical security infrastructure and its supporting systems. In this role, you will combine software engineering expertise with operations knowledge to build and maintain automation tools, monitoring systems, and processes that support enterprise-grade server, network, and security device management. You will work closely with cross-functional teams to define and enforce service level objectives, reduce operational toil through automation, and drive continuous improvement in system resilience. This position requires 24x5 availability with an on-call rotation to ensure uninterrupted support for mission-critical infrastructure.

Key Responsibilities

- Partner with leadership to establish, monitor, and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for infrastructure tooling, including configuration compliance rates, patch success rates, and deployment latency metrics.
- Provide Level 3 expertise for tooling-specific incidents, focusing on automating incident remediation workflows and reducing Mean Time To Repair (MTTR) through intelligent automation and runbook development.
- Identify and automate repetitive manual tasks across managed infrastructure, targeting measurable reductions in operational overhead (e.g., a 50% reduction in manual server build time) through scripting and workflow automation.
- Conduct thorough root cause analysis and lead blameless postmortems for all major service-impacting incidents, driving systemic improvements in tooling reliability and infrastructure resilience.
- Engineer and maintain automated processes and scripts to populate, update, and synchronize asset management platforms (e.g., NetBox), configuration management databases, and monitoring systems for internal and external stakeholders.
- Design, develop, and deploy full-stack applications, custom plugins, and automation scripts that extend the functionality of management and monitoring systems, enabling direct device interaction for configuration management.
- Develop and maintain fully automated Infrastructure-as-Code configurations for Windows and Linux server roles using tools such as Ansible, Terraform, or Puppet, including drift detection and auto-remediation capabilities.
- Build end-to-end automation pipelines for vulnerability patching, security baseline enforcement (CIS benchmarks), and continuous compliance auditing of physical security devices against internal and regulatory standards.
- Develop API-driven tools for network configuration management, automated firmware updates, pre/post-change validation, and real-time network health monitoring across the device fleet.
- Deploy and standardize monitoring agents, centralized log collection systems, and custom dashboards with alerts based on critical SLIs (latency, error rate, saturation, traffic) for servers and edge devices.
- Build automation scripts for intelligent ticket handling, problem validation, and escalation workflows within enterprise ticketing systems, ensuring that 2-hour initial response SLAs are consistently met.
- Participate in a 24x5 on-call rotation to provide timely support for infrastructure systems, security devices, and related tooling, ensuring service continuity and rapid incident response.

Required Skills

- 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Strong proficiency in Python, Bash, and PowerShell for automation scripting, with experience in Go for building high-performance backend services and APIs.
- Hands-on experience with Infrastructure-as-Code tools (Terraform, Ansible, Chef, or Puppet) and configuration management practices, including drift detection, version control, and automated remediation.
- Advanced knowledge of Linux and Windows server environments, including Tier 3 troubleshooting, system hardening, and enterprise-scale server management.
- Solid understanding of enterprise networking concepts, Cisco device administration, and network automation protocols (NETCONF/RESTCONF), plus experience with network monitoring and flow analysis tools.
- Experience implementing and managing monitoring solutions (Prometheus, Grafana, Datadog) and centralized logging platforms (ELK Stack), including creating custom dashboards and alerting rules.
- Proficiency in implementing CI/CD pipelines, automated testing frameworks, and deployment strategies using modern DevOps tooling, with a strong emphasis on code quality, security, and maintainability.
