- Location: Austin
- State: Texas
- Country: United States of America
Description & Requirements
Software Engineer II – Site Reliability Engineering
Electronic Arts | Austin, USA
The Challenge Ahead
The IT Player Experience Engineering team builds and operates platforms that support millions of players worldwide. As a Software Engineer II – SRE, you will focus on improving the reliability, scalability, and operational excellence of Java-based, microservices-driven systems that power player experiences. This role is critical to delivering FY26 goals by embedding SRE best practices across design, development, and operations.
What You’ll Do
Reliability & Operations
- Drive SRE initiatives to improve system availability, performance, and resilience across Java microservices
- Define and track SLOs, SLIs, and error budgets for critical services
- Lead incident response, root cause analysis (RCA), and postmortems to prevent recurrence
- Automate operational tasks to reduce toil and improve system reliability
Observability
- Design and implement monitoring, alerting, and logging strategies using industry-standard tools
- Build end-to-end observability with metrics, distributed tracing, and logs for microservices
- Tune alerts to reduce noise and ensure actionable signal during incidents
Engineering & Platform Enablement
- Collaborate with development teams to build reliability into Java/Spring Boot services from design through production
- Review service architecture for scalability, fault tolerance, and operability
- Improve CI/CD pipelines with reliability, testing, and deployment safety checks
- Support cloud-native deployments on AWS and containerized platforms (Docker/Kubernetes)
Best Practices & Enablement
- Champion SRE best practices including automation, capacity planning, and resiliency testing
- Contribute to runbooks, operational documentation, and knowledge sharing
- Partner with engineers, product managers, and leadership to balance feature velocity with system reliability
What We’re Looking For
Core Skills
- Strong experience with Java, Spring Boot, and microservices architectures
- Hands-on experience with monitoring, alerting, logging, and distributed tracing
- Experience supporting production systems with high availability and scale requirements
Cloud & Infrastructure
- Experience with AWS services and cloud-native architectures
- Familiarity with Docker, Kubernetes, and CI/CD pipelines
Reliability Mindset
- Experience with incident management, on-call rotations, and post-incident analysis
- Strong troubleshooting skills across application, infrastructure, and network layers
Collaboration
- Ability to work closely with application engineers to influence design for reliability
- Clear communication skills to explain operational risks and trade-offs