- Location: Austin
- State: Texas
- Country: United States of America
Description & Requirements
The Challenge Ahead
The EA IT Player and Creator Experience team develops platforms and services that support player-facing experiences at scale. Our systems assist players throughout their support journey, help creators to share content that promotes our games, and help maintain safe online communities. We build low-latency, cloud-native, and AI-enabled applications that enhance Fan Care and Creator Engagement for a global audience.
As a Software Engineer - SRE, you will report to the Sr Manager of Engineering. Your work will focus on improving the reliability, scalability, and operational excellence of the micro frontend and microservices-driven systems that power player experiences. You will guide SRE programs to improve availability, performance, and resilience across these systems. You will partner closely with engineers, product managers, and leadership to balance feature velocity with system reliability. This role is critical to delivering FY26 goals by advocating SRE best practices across design and operations.
The role is Hybrid in Austin (3 days in office).
Main Responsibilities:
- You define and track SLOs, SLIs, and error budgets for critical services
- You bring expertise to design, implement and optimize monitoring, alerting, logging, and distributed tracing solutions using industry-standard tools
- You bring experience with incident response, root cause analysis (RCA), and postmortem processes to prevent recurrence and improve system reliability
- You build end-to-end observability with metrics, distributed tracing, and logs for microservices
- You are skilled in automating operational tasks to reduce toil and enhance system performance, reliability and observability
- You are proficient in tuning alerts to minimize noise and surface actionable signals during incidents
- You will review service architecture for scalability, fault tolerance, and operability
- You will help improve CI/CD pipelines with reliability, testing, and deployment safety checks
- You will support cloud-native deployments on AWS and containerized platforms (Docker/Kubernetes)
- You collaborate with development teams to build reliability into frontend and Java/Spring Boot services from design through production
- You contribute to runbooks, operational documentation, and knowledge sharing
Core skills:
- Demonstrated expertise in Java, Spring Boot, and microservices architectures (3+ years)
- Hands-on experience with AWS services and cloud-native architectures (2+ years); familiarity with Docker and Kubernetes
- Experience improving CI/CD pipelines with reliability, testing, and deployment safety checks
- Ability to review service architecture for scalability, fault tolerance, and operability (3+ years)
- Reliability mindset: 2+ years of experience with incident management, on-call rotations, and post-incident analysis
- Experience communicating updates, RCAs, and resolutions to customers and other partners, including explaining operational risks and trade-offs