The Student Forecaster Model
This was my final group project for the Le Wagon Bootcamp.
Proven Hypothesis
A student's academic performance depends not only on their academic capabilities but also on their socio-economic status.
Imagine this...
You're a headteacher managing a school of thousands of students. How do you create an ideal environment where every student can thrive?
Project Goal
To build a machine learning model that forecasts academic outcomes using a dataset of Portuguese students.
The objective was to uncover socio-economic drivers influencing final grades (G3) and offer a predictive dashboard for education leaders.
Try the app: The Student Forecaster Model
Data Exploration
We started with a raw dataset and trimmed irrelevant or misleading variables like:
- Home address
- Parent jobs (too many "other" values)
- Nursery attendance
- G1/G2 (earlier period grades; only G3 was used, as the target)
We then grouped related features:
- Alcohol intake (weekday/weekend)
- Family dynamics
- Time management
- Educational support
- Reason for school choice
- Parental education
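The trimming and grouping steps above can be sketched with pandas. This is only an illustration, not the project's actual code: the column names (`address`, `Mjob`, `Fjob`, `nursery`, `Dalc`, `Walc`, `Medu`, `Fedu`) are assumed to follow the public UCI "Student Performance" dataset, the rows are made up, and averaging the two alcohol columns with illustrative bin edges is just one possible way to group them.

```python
import pandas as pd

# Toy rows with made-up values; column names follow the public UCI
# "Student Performance" dataset and are an assumption about the project's data.
df = pd.DataFrame({
    "address": ["U", "R", "U"],
    "Mjob": ["other", "teacher", "other"],
    "Fjob": ["other", "other", "services"],
    "nursery": ["yes", "no", "yes"],
    "Dalc": [1, 2, 5],   # weekday alcohol intake, 1 (very low) to 5 (very high)
    "Walc": [1, 3, 5],   # weekend alcohol intake, same scale
    "Medu": [4, 2, 1],   # mother's education, 0 to 4
    "Fedu": [3, 2, 1],   # father's education, 0 to 4
    "G1": [12, 9, 7],
    "G2": [13, 10, 6],
    "G3": [14, 10, 5],
})

# Drop the variables judged irrelevant or misleading.
dropped = ["address", "Mjob", "Fjob", "nursery", "G1", "G2"]
df = df.drop(columns=dropped)

# Group related features: one alcohol score from the weekday/weekend pair.
df["alcohol"] = df[["Dalc", "Walc"]].mean(axis=1)
df = df.drop(columns=["Dalc", "Walc"])

# Bin the combined score into coarse groups (bin edges are illustrative).
df["alcohol_group"] = pd.cut(
    df["alcohol"], bins=[0, 2, 3.5, 5], labels=["low", "medium", "high"]
)
print(df)
```

The same pattern (combine, then bin) would apply to the other feature groups, such as parental education.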
Tools used:
- Heatmaps
- Boxplots
- Group binning
- Value counts
- Correlation analysis
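As a rough sketch of the correlation analysis and value counts listed above, the snippet below ranks a few numeric features by their correlation with the final grade. The numbers are invented for illustration and the column names (`Medu`, `Fedu`, `studytime`) are assumptions; it does not reproduce the project's findings.

```python
import pandas as pd

# Made-up numbers purely for illustration; they are not the project's data.
df = pd.DataFrame({
    "Medu": [4, 3, 2, 1, 0, 4],
    "Fedu": [2, 3, 1, 3, 1, 2],
    "studytime": [3, 2, 2, 1, 1, 4],
    "G3": [15, 13, 10, 8, 6, 16],
})

# Pearson correlation of every numeric feature with the final grade.
corr_with_g3 = df.corr()["G3"].drop("G3").sort_values(ascending=False)
print(corr_with_g3)

# Value counts show how evenly the values of a feature are distributed.
print(df["studytime"].value_counts())
```

Passing the full matrix to seaborn, for example `sns.heatmap(df.corr(), annot=True)`, would give the heatmap view of the same numbers.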
Model Training
After feature engineering and encoding:
- Split data into train/test sets
- Used Gradient Boosting Classifier
- Preprocessing: scaling + one-hot encoding
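A minimal sketch of these steps with scikit-learn, under stated assumptions: the data is synthetic, the feature names (`studytime`, `failures`, `schoolsup`, `reason`) are assumed, and the pass mark of `G3 >= 10` is a guess, since the threshold the team used is not stated here. It shows the shape of the workflow rather than the project's exact pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the real project used a Portuguese student dataset.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "studytime": rng.integers(1, 5, n),
    "failures": rng.integers(0, 4, n),
    "schoolsup": rng.choice(["yes", "no"], n),
    "reason": rng.choice(["home", "course", "reputation", "other"], n),
})
g3 = rng.integers(0, 21, n)
# Assumed pass mark: the threshold actually used is not stated in the write-up.
y = (g3 >= 10).astype(int)

numeric = ["studytime", "failures"]
categorical = ["schoolsup", "reason"]

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), numeric),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Hold out 20% of the rows for testing, keeping the pass/fail ratio stable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model.fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```

Wrapping preprocessing and the classifier in one `Pipeline` means the scaler and encoder are fitted on the training split only, which avoids leaking test information.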
We tested multiple models for comparison:
Model | Precision | Test vs Train | Overfitting |
---|---|---|---|
Logistic Regression | 0.75 | 0.72 vs 0.72 | No |
KNN | 0.76 | 0.67 vs 0.83 | Yes |
Random Forest | 0.81 | 0.70 vs 0.98 | Yes |
XGBoost | 0.82 | 0.68 vs 0.95 | Yes |
Gradient Boosting | 0.76 | 0.70 vs 0.78 | Slight |
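
One way to read the Test vs Train column is to compute the gap between the two scores for each model. The short sketch below does that with the values from the table; treating a larger gap as a sign of more overfitting is a common rule of thumb, not a formal test.

```python
# Test and train scores copied from the comparison table above.
scores = {
    "Logistic Regression": (0.72, 0.72),
    "KNN": (0.67, 0.83),
    "Random Forest": (0.70, 0.98),
    "XGBoost": (0.68, 0.95),
    "Gradient Boosting": (0.70, 0.78),
}

# Gap = train score minus test score; a larger gap suggests more overfitting.
gaps = {name: round(train - test, 2) for name, (test, train) in scores.items()}

# Print models from the smallest gap to the largest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.2f}")
```

Random Forest and XGBoost show the largest gaps (0.28 and 0.27), while Gradient Boosting stays at 0.08.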
Final Metrics (on test data)
- Accuracy: 0.96
- Precision: 0.95
- Recall: 0.99
- F1 Score: 0.97
These metrics show strong performance on the binary classification task: predicting whether a student would pass or fail based on inputs.
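For reference, here is how these four metrics relate to the counts in a binary confusion matrix. The helper name `binary_metrics` and the example counts are hypothetical; the last lines check that the reported precision and recall are consistent with the reported F1 score.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts, only to show the call; not the project's results.
print(binary_metrics(tp=8, fp=2, fn=2, tn=8))

# F1 is the harmonic mean of precision and recall, so the reported
# precision (0.95) and recall (0.99) should reproduce the reported F1.
reported_f1 = 2 * 0.95 * 0.99 / (0.95 + 0.99)
print(round(reported_f1, 2))  # 0.97
```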
Insights
- Students who have failed before are more likely to receive support and improve.
- Mother's education had a stronger correlation with final grades than father's.
- Study time and school choice were significant drivers.
- Most students want to pursue higher education, which causes class imbalance.
Tools Used
- Python: pandas, seaborn, scikit-learn, joblib
- Web app: Streamlit
- Deployment: Streamlit Cloud
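To illustrate where joblib fits in this stack, the sketch below saves a fitted model to disk and loads it back, which is the usual hand-off between a training notebook and a web app. The `DummyClassifier` is only a stand-in and the file name `model.joblib` is hypothetical; the project's own model and paths may differ.

```python
import os
import tempfile

import joblib
from sklearn.dummy import DummyClassifier

# Stand-in model; a trained pipeline would be saved the same way.
model = DummyClassifier(strategy="most_frequent").fit([[0], [1], [2]], [1, 1, 0])

# Persist the fitted model to disk, then reload it as an app would at startup.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical file name
joblib.dump(model, path)
loaded = joblib.load(path)

print(loaded.predict([[5]]))
```

A Streamlit app could load the file once at startup and call `predict` on the values a user enters in the form.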
App Link
student-forecaster.streamlit.app
Additional Resources
For a detailed explanation:
- Data Analysis & Insights: View the presentation slides
- Model Training & Evaluation: Explore the Jupyter Notebook
- Codebase & App: Visit the GitHub repository
Final Thoughts
This project brought together all the fundamentals of data science, from data wrangling and EDA to machine learning and user deployment, to solve a real-world education problem.
Let's build data tools that truly support educators and learners alike.