Description and Requirements
SEIII/SRE Engineer
Responsibilities:
You will build and operate distributed, large-scale, cloud-based infrastructure using modern open-source software solutions.
You will help build and operate a unified platform across EA, extract and process massive amounts of data from 20+ game studios, and use the insights to serve online requests at massive scale.
You will use automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection (MTTD) and mean time to resolution (MTTR), and repair services.
You will perform root cause analysis and post-mortems with an eye towards future prevention.
You will design and build CI/CD pipelines.
You will create monitoring, alerting and dashboarding solutions that improve visibility into EA's application performance and business metrics.
You will produce documentation and support tooling for online support teams.
You will develop reporting systems that report on important metrics, detect anomalies, and forecast future results.
You will develop and operate both SQL and NoSQL solutions.
You will build complex queries to solve data mining problems.
You will develop a large-scale online platform to personalize the player experience and provide reporting and feedback.
You will help interview and hire the best candidates for the team.
You will mentor team members and help them grow their skill sets.
You will drive growth and modernization efforts and projects for the team.
Qualifications:
8 years of experience with virtualization, containerization, cloud computing (AWS preferred), VMware ecosystems, Kubernetes, or Docker.
8 years of experience supporting high-availability, production-grade data infrastructure and applications with defined SLIs and SLOs.
Systems Administration or Cloud experience, including a strong understanding of Linux / Unix.
Network experience, including an understanding of standard protocols/components.
Automation and orchestration experience, including Terraform, Helm, Chef, and Packer.
Experience writing code in Python, Golang, or Java.
Experience with a monitoring tech stack such as Prometheus, Grafana, Loki, and Alertmanager.
Experience with distributed systems that serve massive numbers of concurrent requests.
Experience working with large-scale systems and data platforms/warehouses.