This project builds a machine learning model that predicts whether an employee is eligible for promotion, based on performance, achievements, and other attributes. It is designed to help HR departments make data-driven decisions while ensuring fairness and efficiency.
- Develop a robust ML model to predict employee promotions.
- Handle class imbalance so that predictions stay accurate for the underrepresented class.
- Provide a detailed EDA (Exploratory Data Analysis) to understand key patterns and relationships in the data.
- Train models with optimized features for better interpretability and performance.
- Programming Language: Python 🐍
- Libraries:
  - Data Processing: `pandas`, `numpy`
  - Visualization: `matplotlib`, `seaborn`
  - Machine Learning: `scikit-learn`, `xgboost`
  - Oversampling: `imbalanced-learn`
  - Model Persistence: `joblib`
EDA-and-predictor-for-Employee-Eligible-for-Promotion/
├── data/ # Contains datasets
│ ├── HRData.csv # Raw dataset
│ ├── HRData_cleaned.csv # Cleaned and preprocessed dataset
│ ├── hrdatatest.csv # Test dataset
│ └── predictions_hrdatatest.csv # Predictions output
├── models/ # Trained models and scalers
│ ├── model.py
│ ├── xgboost_model_smote.pkl
│ └── scaler.pkl
├── scripts/ # Scripts for each step of the pipeline
│ ├── data_preprocessing.py # Data preprocessing and cleaning
│ ├── Exploratory_Data_Analysis.py # Exploratory Data Analysis
│ └── predict.py # Predictions on new data
├── README.md # Project documentation
└── requirements.txt # Python dependencies
- Visualizations included:
- Distribution of promotions.
- Correlation matrix.
- Boxplots (e.g., age vs promotion).
- Histograms (e.g., training scores).
- Promotion breakdown by department.
- Insights: `avg_training_score`, `KPIs_met >80%`, and `previous_year_rating` are highly correlated with promotions.
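One way to check relationships like these is to rank each numeric feature by its correlation with the target. The sketch below runs on a small synthetic frame, not the project's data; the target column name `is_promoted` and the helper `correlation_with_target` are assumptions made for illustration.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Return Pearson correlations of numeric columns with `target`, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic rows for illustration only; the column names are assumed.
toy = pd.DataFrame(
    {
        "avg_training_score": [55, 60, 72, 85, 90, 95],
        "previous_year_rating": [2, 3, 3, 4, 5, 5],
        "age": [30, 45, 28, 39, 33, 41],
        "is_promoted": [0, 0, 0, 1, 1, 1],
    }
)
print(correlation_with_target(toy, "is_promoted"))
```

Correlation only captures linear association, so it is a starting point for the plots listed above rather than a replacement for them.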
- Handled missing values using `SimpleImputer`.
- Converted categorical variables to numerical using `LabelEncoder`.
- Standardized features using `StandardScaler`.
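The three preprocessing steps above can be sketched as follows. This is a minimal illustration on toy data, not the contents of `scripts/data_preprocessing.py`; the column names and the median imputation strategy are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data for illustration; column names are assumed, not read from HRData.csv.
raw = pd.DataFrame(
    {
        "department": ["Sales", "HR", "Sales", "Tech"],
        "previous_year_rating": [3.0, np.nan, 5.0, 4.0],
        "avg_training_score": [60.0, 75.0, 88.0, np.nan],
    }
)
categorical_cols = ["department"]
numeric_cols = ["previous_year_rating", "avg_training_score"]
clean = raw.copy()

# 1. Impute missing numeric values (the median strategy is an assumption).
imputer = SimpleImputer(strategy="median")
clean[numeric_cols] = imputer.fit_transform(raw[numeric_cols])

# 2. Encode each categorical column as integer codes, keeping the encoders
#    so the same mapping can be applied to new data later.
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    clean[col] = encoders[col].fit_transform(clean[col])

# 3. Standardize every feature to zero mean and unit variance.
scaler = StandardScaler()
X = scaler.fit_transform(clean[categorical_cols + numeric_cols])
print(clean)
print(X.round(3))
```

The fitted scaler is the kind of object that would be persisted as `models/scaler.pkl` and reused at prediction time.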
- Used XGBoost for classification.
- Addressed class imbalance with SMOTE.
- Trained the model on the full feature set; balancing the classes with SMOTE improved its precision and recall.
- Deployed a script to predict promotions on new datasets.
- Outputs predictions in `predictions_hrdatatest.csv`.
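A prediction step of this kind usually loads the persisted model and scaler, scores the new rows, and writes a CSV. The sketch below is not the project's `scripts/predict.py`: the helper `predict_to_csv`, the output column name `predicted_promotion`, and the stand-in `LogisticRegression` model are assumptions, used so the example runs without the real artifacts.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def predict_to_csv(model_path, scaler_path, features: pd.DataFrame, out_path) -> pd.DataFrame:
    """Load a persisted model and scaler, score `features`, and write a CSV."""
    model = joblib.load(model_path)
    scaler = joblib.load(scaler_path)
    # Reuse the scaler fitted at training time; it is not refitted on new data.
    scaled = scaler.transform(features)
    result = features.copy()
    result["predicted_promotion"] = model.predict(scaled)  # assumed column name
    result.to_csv(out_path, index=False)
    return result


# Demo with stand-in artifacts in a temporary directory. LogisticRegression
# replaces the XGBoost model only to keep the example small.
train = pd.DataFrame(
    {"avg_training_score": [50, 60, 85, 95], "previous_year_rating": [2, 3, 4, 5]}
)
labels = [0, 0, 1, 1]
new_rows = pd.DataFrame({"avg_training_score": [55, 92], "previous_year_rating": [2, 5]})

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    fitted_scaler = StandardScaler().fit(train)
    joblib.dump(fitted_scaler, tmp_dir / "scaler.pkl")
    joblib.dump(
        LogisticRegression().fit(fitted_scaler.transform(train), labels),
        tmp_dir / "model.pkl",
    )
    out = predict_to_csv(
        tmp_dir / "model.pkl", tmp_dir / "scaler.pkl", new_rows, tmp_dir / "predictions.csv"
    )
    saved = pd.read_csv(tmp_dir / "predictions.csv")

print(out)
```

New data must go through the same imputation and encoding as the training data before it reaches the scaler, otherwise the feature columns will not line up.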
- Accuracy: 97%.
- Class `1` (Promoted):
  - Recall: 94%.
  - Precision: 99%.
  - F1-Score: 96%.
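The F1-score is the harmonic mean of precision and recall, so the reported values can be checked against each other. The short sketch below does that arithmetic; the confusion counts passed to the hypothetical helper `precision_recall_f1` are made up for illustration and are not the project's actual counts.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive class from raw counts."""
    precision = tp / (tp + fp)  # share of predicted promotions that were correct
    recall = tp / (tp + fn)     # share of actual promotions that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Consistency check on the reported precision (99%) and recall (94%).
f1_from_reported = 2 * 0.99 * 0.94 / (0.99 + 0.94)
print(round(f1_from_reported, 2))  # 0.96

# Hypothetical counts that would give similar rates.
print(precision_recall_f1(tp=94, fp=1, fn=6))
```

Because recall is lower than precision, the model's errors on the promoted class are mostly missed promotions rather than false alarms.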
- Set up the environment:
  python -m venv venv
  source venv/bin/activate # On Windows: .\venv\Scripts\activate
  pip install -r requirements.txt
- Preprocess the data:
  python scripts/data_preprocessing.py
- Perform EDA:
  python scripts/Exploratory_Data_Analysis.py
- Train the model:
  python models/model.py
- Make predictions:
  python scripts/predict.py
Install all required libraries with:
pip install -r requirements.txt
- Predictions are saved in the `data/predictions_hrdatatest.csv` file.
- Trained models and scalers are saved in the `models/` directory.
Contributions are welcome! Feel free to submit a pull request or raise an issue. Let's make this project even better!
For questions or feedback, please reach out to me via my LinkedIn profile.