Description & Requirements
Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity excels, new perspectives are invited, and ideas matter. A team where everyone makes play happen.
Electronic Arts (EA) is looking for a Site Reliability Engineer (SRE) to join our GameKit Operations team. You will be part of a newly formed SRE function and help shape the future of how EA builds and operates its development platforms and services. If you're passionate about automation, observability, and improving service reliability at scale, we'd love to hear from you. You will report into a Senior Manager.
This is a hybrid role, with 3 days per week spent in our Bucharest office.
Job Requirements/Role
What You'll Do
In your first 60 days, gain an understanding of the GameKit environment and assess existing monitoring and observability systems.
By 90 days, begin implementing the observability roadmap, contribute to incident response, and identify opportunities to improve automation and reliability.
By 120 days, take ownership of main SRE plans, guide cross-team collaboration, and influence EA's approach to operational excellence.
Beyond 180 days, lead long-term strategies to improve reliability, mentor engineers, and champion sustainable and scalable engineering practices.
Main Responsibilities
Build scalable monitoring and observability systems using Prometheus/Grafana, Datadog, ELK, or similar.
Build infrastructure and tooling using technologies like Terraform, Ansible, AWS CloudFormation, and CI/CD pipelines (GitLab CI/CD).
Automate operational processes using Python and Bash to reduce manual toil and improve deployment reliability.
Operate and improve containerized applications using Kubernetes platforms (EKS, AKS, GKE).
Contribute to incident response processes and post-mortems, helping teams learn and improve from every incident.
What We're Looking For
Experience operating cloud platforms, especially AWS and Azure.
Expertise in monitoring, observability, and incident response at scale.
Hands-on experience with Infrastructure-as-Code and automation.
A desire to improve processes and team capabilities.
Comfortable working in dynamic environments and solving problems collaboratively.
3+ years of experience building SRE practices from the ground up.
Experience leading on-call rotations or reliability-focused projects.
Experience mentoring junior engineers and influencing engineering culture through documentation and collaboration.