
General Information

Location: Hyderabad, Telangana, India
Role ID
211515
Worker Type
Regular Employee
Studio/Department
CT - IT
Flexible Work Arrangement
Hybrid

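Description & Requirements

Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.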

Senior SE I / Site Reliability Engineer (SRE) 


Job Description

We are seeking an accomplished Senior Site Reliability Engineer (SRE) with 12–15 years of experience to lead the reliability, scalability, and performance engineering of our critical infrastructure and production systems. As a Senior SRE, you will play a strategic and technical leadership role — driving reliability practices, mentoring SRE teams, and influencing the adoption of automation, observability, and resilience engineering across the organization.

You will act as a technical thought leader and hands-on engineer, collaborating with infrastructure, application, and operations teams to build, automate, and scale reliable systems that support global business operations. This role requires deep expertise in cloud platforms, automation, monitoring, incident management, and system design for large-scale distributed environments.




Roles & Responsibilities

1. Reliability Engineering & Automation

  • Architect, implement, and manage resilient, scalable, and highly available infrastructure systems.

  • Lead initiatives to automate manual operations, deployment, and monitoring processes to improve reliability and reduce toil.

  • Drive the creation of observability solutions and dashboards to proactively detect and remediate potential issues.

2. Incident & Problem Management

  • Lead critical incident response, ensuring swift mitigation and clear communication to stakeholders.

  • Conduct detailed root cause analysis (RCA) and drive permanent corrective actions to prevent recurrence.

  • Implement and mature incident management frameworks, including runbooks, playbooks, and post-incident reviews.

3. Infrastructure Operations & Performance Optimization

  • Oversee system performance, capacity planning, and scalability of infrastructure across hybrid and cloud environments (AWS, Azure, GCP).

  • Optimize system resource utilization, latency, and reliability through performance tuning and automation.

  • Work closely with architecture and platform teams to accommodate growth, change, and modernization initiatives.

4. Leadership & Mentorship

  • Provide technical leadership and mentorship to SRE teams and cross-functional engineering groups.

  • Promote an SRE culture across teams — championing principles of reliability, automation, observability, and continuous improvement.

  • Drive collaboration between development, QA, DevOps, and release teams to embed reliability into the software development lifecycle (SDLC).

5. Service Level Management

  • Define, track, and continuously improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

  • Apply the Four Golden Signals of SRE monitoring — Latency, Traffic, Errors, and Saturation — to guide system health and performance strategies.

6. Documentation & Knowledge Sharing

  • Establish and maintain comprehensive documentation of systems, operational procedures, and best practices.

  • Facilitate learning through technical sessions, blameless postmortems, and cross-team knowledge sharing.

7. Strategic Technology & Continuous Improvement

  • Contribute to defining the long-term SRE strategy, tooling roadmap, and automation frameworks.

  • Evaluate and adopt emerging technologies, tools, and methodologies to enhance system reliability and efficiency.

  • Partner with business and technical leaders to ensure alignment of SRE objectives with organizational goals.

8. Security & Compliance

  • Collaborate with security and compliance teams to ensure infrastructure, systems, and operations meet organizational and regulatory standards.

  • Implement secure configuration baselines, vulnerability remediation, and access control policies.

  • Integrate security practices into CI/CD pipelines to ensure DevSecOps alignment.

9. Strategic Leadership & Stakeholder Management

  • Partner with executive and business stakeholders to align SRE initiatives with enterprise objectives and risk frameworks.

  • Provide data-driven insights on reliability, capacity, and operational performance to influence strategic decision-making.

  • Represent SRE functions in technical governance forums, audits, and architecture reviews to drive reliability-focused outcomes.



Qualifications

  • Education: Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.

  • Experience: 12–15 years of total IT experience, including at least 8 years in SRE, DevOps, or large-scale systems engineering.

  • Technical Expertise:

    • Strong proficiency in Linux/Unix system administration and internals.

    • Proven experience in cloud platforms — AWS, Azure, or GCP.

    • Advanced scripting and automation skills using Python, Go, PowerShell, or Bash.

    • Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes) and expertise in service mesh technologies such as Istio.

    • Deep understanding of monitoring and observability stacks (Prometheus, Grafana, ELK, Datadog, Splunk, Zabbix, Nagios).

    • Expertise in configuration management and IaC tools (Ansible, Terraform, Chef, Puppet).

    • Strong knowledge of networking, load balancing, databases, and distributed systems.

  • Operational Excellence:

    • Hands-on experience in incident response, problem management, and capacity planning at enterprise scale.

    • Proven ability to design for reliability, redundancy, and disaster recovery.

  • Soft Skills:

    • Excellent analytical, communication, and leadership abilities.

    • Proven track record of mentoring and developing high-performing engineering teams.

    • Strong stakeholder management and cross-functional collaboration skills.




Nice to Have

  • Experience defining and implementing SRE frameworks or centers of excellence in global organizations.

  • Familiarity with REST API development, integration, and database query optimization.

  • Strong understanding of governance, risk, and compliance frameworks.

  • Experience with AIOps, self-healing systems, or machine learning-driven monitoring.

  • Demonstrated experience in driving organizational culture change toward reliability and automation.

  • Active participation in industry forums or open-source contributions related to DevOps or SRE practices.



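About Electronic Arts
We are proud to have an extensive portfolio of games and experiences, locations around the world, and opportunities across EA. We value adaptability, resilience, creativity, and curiosity. From leadership that brings out your potential to creating space for learning and experimenting, we empower you to do great work and pursue opportunities for growth.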

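We support a balanced life through benefits programs that emphasize physical, emotional, financial, career, and community wellness. Our packages are tailored to meet local needs and may include healthcare coverage, mental well-being support, retirement savings, paid time off, family leave, complimentary games, and more. We nurture environments where our teams can always bring their best to what they do.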

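Electronic Arts is an equal opportunity employer. All employment decisions are made without regard to race, color, national origin, ancestry, sex, gender identity or expression, sexual orientation, age, genetic information, religion, disability, medical condition, pregnancy, marital status, family status, veteran status, or any other characteristic protected by law. We will also consider qualified applicants with criminal records for employment in accordance with applicable law. EA also makes workplace accommodations for qualified individuals with disabilities as required by applicable law.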