This Python notebook provides a comprehensive solution for predicting soccer match outcomes using various machine learning models. It includes data preparation, feature selection, model training, evaluation, and explanation.
- Uses a variety of data sources, including CSV files and Google Sheets.
- Performs extensive data cleaning and pre-processing.
- Implements various feature selection techniques to identify the most relevant features for prediction.
- Trains and evaluates four different classification models: Logistic Regression, Support Vector Machine (SVM), Random Forest, and XGBoost.
- Provides detailed performance metrics, including accuracy, confusion matrix, and classification report.
- Utilizes LIME (Local Interpretable Model-agnostic Explanations) to explain the predictions of the Random Forest and XGBoost models.
- Demonstrates how to update the trained models incrementally using Dask-ML.
- High Correlation: Features with a high correlation coefficient (either positive or negative) are identified and considered for selection.
- Low Multicollinearity: Features that exhibit low multicollinearity are preferred to avoid redundancy and potential issues with model overfitting.
- High Chi-Square and F-Regression: Features with high chi-square and F-regression values indicate a strong association with the target variable and are considered for selection.
- Variance Inflation Factor (VIF): Features with a low VIF value (less than 10) are preferred to avoid multicollinearity issues.
- Clone this repository to your local machine.
- Install the required Python libraries from the
requirements.txt
file. - Open the
Football_Match_Outcome_Predictor.ipynb
notebook in Google Colab or Jupyter Notebook. - Follow the instructions in the notebook to prepare the data, train the models, and evaluate the results.
The notebook provides detailed instructions on how to use the various functions and methods for data preparation, model training, and evaluation.
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License.
For any questions or inquiries, please contact Arindam Chatterjee at [email protected].