Data Science Intern
Team: EA - Data & Insights, AI Analytics Engineering
Type: Internship (Full-time during term)
About the Team
EA is a global leader in digital interactive entertainment. The EA - Data & Insights, AI Analytics Engineering team plans, builds, and ships enterprise-grade data platforms, integrations, and analytics that power faster decision-making, unlock revenue opportunities, and improve business performance. We partner closely with product, engineering, and analytics teams across EA to deliver trusted, actionable insights.
Role Overview
You’ll join a hands-on, fast-moving team to build data and ML solutions with an emphasis on Python and software craftsmanship. You’ll write clean, well-tested code; wrangle large datasets; engineer features; train/evaluate models; and help move prototypes toward production in collaboration with senior engineers and architects.
What You’ll Do
- Build robust Python modules and notebooks for data ingestion, feature engineering, and model training, primarily with pandas, NumPy, and scikit-learn (see the sketch after this list).
- Author clear, maintainable code using OOP, type hints, docstrings, and unit/integration tests; participate in code reviews and follow Git-based workflows.
- Explore datasets to define problem statements, formulate hypotheses, and conduct EDA with appropriate visualizations and summary statistics.
- Implement and evaluate baseline and advanced ML models; select metrics, design experiments, and apply cross-validation.
- Apply solid SQL to extract/transform data; collaborate on building reliable data pipelines to support analytics and reporting use cases.
- Communicate results with crisp narratives, dashboards/plots, and reproducible notebooks; translate findings into product and business recommendations.
- Contribute to best practices in the team’s development lifecycle (automation, CI, documentation) and proactively suggest improvements.
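To make the day-to-day concrete, here is a minimal sketch of the ingest, feature-engineering, and train/evaluate loop described above, in the pandas/scikit-learn style this role emphasizes. Every name in it (sessions.csv, minutes_played, matches, churned, session_date) is a hypothetical placeholder for illustration, not EA data or an EA codebase:

```python
# Illustrative sketch only; file name and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def load_sessions(path: str) -> pd.DataFrame:
    """Ingest raw session data, dropping rows that lack a label."""
    df = pd.read_csv(path, parse_dates=["session_date"])
    return df.dropna(subset=["churned"])


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Engineer simple per-session features from the raw columns."""
    out = df.copy()
    out["minutes_per_match"] = out["minutes_played"] / out["matches"].clip(lower=1)
    out["weekend"] = out["session_date"].dt.dayofweek >= 5
    return out


if __name__ == "__main__":
    frame = add_features(load_sessions("sessions.csv"))
    X_train, X_test, y_train, y_test = train_test_split(
        frame[["minutes_per_match", "weekend", "matches"]],
        frame["churned"],
        test_size=0.2,
        random_state=42,
    )
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("Holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

In a real module, ingestion, feature engineering, and training would live in separately tested units with docstrings and type hints, per the craftsmanship bullets above.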
Must‑Have Skills (Core Hiring Bar)
- Python mastery for data work: pandas, NumPy, scikit‑learn; writing reusable functions/classes; debugging and profiling; packaging basics.
- Strong coding fundamentals: data structures & algorithms, OOP, modular design, unit testing (pytest or similar), version control with Git (GitHub/GitLab workflows; branching, pushing, and merging), and code reviews.
- ML & DS foundations: supervised learning (linear/logistic regression, trees/ensembles), regularization, bias/variance, cross‑validation, feature scaling/encoding, and model evaluation (AUC/ROC, F1, RMSE/MAE, calibration); see the evaluation sketch after this list.
- Statistics for data analysis: sampling, hypothesis testing, confidence intervals, distributions; ability to choose appropriate tests and interpret results.
- Solid SQL for data extraction, joins, and aggregations, plus a working knowledge of query-optimization basics.
- Data wrangling & EDA: handling missing/outliers, joins/pivots, time‑series/tabular transforms, clear visualizations (matplotlib/plotly) and narrative summaries.
- Problem solving & ownership: ability to define the problem, design experiments, deliver incremental value, and document decisions.
- Communication: concise written docs/notebooks and clear verbal explanations tailored to technical/non‑technical partners.
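As one concrete reading of this bar (feature scaling, cross‑validation, AUC/F1, unit testing), here is a minimal sketch on synthetic data; the model and hyperparameters are stand-ins, not a prescribed setup:

```python
# Illustrative only: synthetic data, stand-in hyperparameters.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Scaling lives inside the pipeline so each fold fits it on its own
# training split only, which avoids leakage into the validation folds.
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))

scores = cross_validate(pipe, X, y, cv=5, scoring=["roc_auc", "f1"])
print(f"AUC: {scores['test_roc_auc'].mean():.3f} +/- {scores['test_roc_auc'].std():.3f}")
print(f"F1:  {scores['test_f1'].mean():.3f} +/- {scores['test_f1'].std():.3f}")


def test_beats_chance() -> None:
    """A pytest-style sanity check of the kind the hiring bar expects."""
    assert scores["test_roc_auc"].mean() > 0.5
```

Keeping preprocessing inside the pipeline is the design choice the "coding fundamentals" bullet points at: it makes the experiment reproducible and testable as one unit.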
Good‑to‑Have Skills (Differentiators)
- Cloud & data platforms: exposure to Snowflake/BigQuery/Redshift; familiarity with AWS or Azure basics (e.g., S3/Blob, compute, IAM concepts).
- Pipelines & orchestration: experience with Airflow/Prefect or similar; understanding of batch vs. streaming concepts.
- Software craftsmanship extras: Makefiles/poetry/pip-tools, pre‑commit, linters/formatters, logging & observability, simple CLI tools.
- MLOps/productionization: model persistence (joblib/ONNX), reproducibility (seeds/environments), lightweight API serving (FastAPI/Flask), and tracking (MLflow/Weights & Biases); see the serving sketch after this list.
- Advanced ML: gradient boosting (XGBoost/LightGBM/CatBoost), time‑series forecasting basics, recommendation systems, and neural network/NLP fundamentals.
- Big data: PySpark or Spark SQL for distributed transforms; understanding of partitioning and performance trade‑offs.
- Visualization & storytelling: dashboards in Plotly Dash/Streamlit; crafting stakeholder‑ready summaries.
- Competitive programming/problem-solving practice: experience with LeetCode, CodeChef, or similar platforms to strengthen algorithmic and coding proficiency.
- Other languages: basic R or additional SQL dialects; familiarity with Java, Scala, or C++ is a plus.
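Finally, a minimal sketch of the "lightweight API serving" differentiator: a joblib-persisted model behind a FastAPI endpoint. The route, request schema, and model.joblib path are illustrative assumptions:

```python
# Illustrative only: route, schema, and file path are assumptions.
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Assumes a model was persisted earlier via joblib.dump(model, "model.joblib").
model = joblib.load("model.joblib")


class Features(BaseModel):
    # Hypothetical schema; a real service mirrors the training columns.
    minutes_per_match: float
    matches: int


@app.post("/predict")
def predict(features: Features) -> dict:
    row = np.array([[features.minutes_per_match, features.matches]])
    return {"churn_probability": float(model.predict_proba(row)[0, 1])}
```

Run locally with `uvicorn serve:app --reload`, assuming the file is saved as serve.py.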