Description and Requirements
Welcome to Maxis, home to tinkerers, craftspeople, & storytellers. Our drive is to inspire everyone to make a better world through creative play. As the developer of The Sims, we create games and experiences for millions of people around the world and are always looking for new ways to inspire our players. Maxis is a place where you can do what you love and help others grow while doing it; a place where your unique voice can be heard and seen. We put creativity and inclusivity at the core of our work and don't settle for seeing the world only as it is, but strive to build a world as it could be.
The Sims is one of the top-selling PC game franchises of all time, and The Sims 4 is by far our most successful Sims game. We have more than 70 million registered players of the base game. To date, The Sims 4 has shipped over 60 content packs and countless base-game feature and quality-of-life updates.
The Site Reliability Engineer will report to the Development Director.
As a Site Reliability Engineer at Maxis, you will focus on designing, deploying, and operating resilient, secure, and globally scalable services in Google Cloud Platform (GCP), with Node.js, TypeScript, Kubernetes, Helm, and GitLab CI/CD.
Responsibilities:
Design, operate, and evolve cloud infrastructure to support millions of players worldwide.
Ensure security, governance, and cost efficiency across our GCP environments.
Develop and support scalable services using Node.js and TypeScript, deployed with Kubernetes and Helm.
Improve best practices for CI/CD pipelines, observability, incident response, and live service operation.
Troubleshoot and improve live services for performance, scalability, and reliability.
Automate deployment, monitoring, and recovery strategies to maximise uptime.
Participate in on-call rotation, incident response, and root-cause analysis.
Collaborate with development teams during the game prototyping and iteration process.
Contribute to a culture of continuous improvement and knowledge sharing.
Qualifications:
Bachelor's/Master's degree in Computer Science, Software Engineering, or equivalent experience.
8+ years in SRE, DevOps, or cloud architecture, with hands-on experience running production services at scale.
Deep expertise with public cloud environments (GCP preferred, AWS/Azure also valuable).
Experience with Kubernetes and Helm for container orchestration and deployment.
Proficiency with CI/CD pipelines using GitLab CI.
Solid scripting/automation skills (TypeScript/Node.js, Python, or Bash).
Knowledge of observability tools (Prometheus, Grafana, Loki, or Jaeger/Tempo).
Experience designing for high availability, reliability, and disaster recovery.