Skip to content

damzC/ml-demos

 
 

Repository files navigation

Title

Python Notebook for Predicting Football Match Outcomes

Introduction

This Python notebook provides a comprehensive solution for predicting soccer match outcomes using various machine learning models. It includes data preparation, feature selection, model training, evaluation, and explanation.

Features

  • Uses a variety of data sources, including CSV files and Google Sheets.
  • Performs extensive data cleaning and pre-processing.
  • Implements various feature selection techniques to identify the most relevant features for prediction.
  • Trains and evaluates four different classification models: Logistic Regression, Support Vector Machine (SVM), Random Forest, and XGBoost.
  • Provides detailed performance metrics, including accuracy, confusion matrix, and classification report.
  • Utilizes LIME (Local Interpretable Model-agnostic Explanations) to explain the predictions of the Random Forest and XGBoost models.
  • Demonstrates how to update the trained models incrementally using Dask-ML.

Feature Selection Techniques:

  1. High Correlation: Features with a high correlation coefficient (either positive or negative) are identified and considered for selection.
  2. Low Multicollinearity: Features that exhibit low multicollinearity are preferred to avoid redundancy and potential issues with model overfitting.
  3. High Chi-Square and F-Regression: Features with high chi-square and F-regression values indicate a strong association with the target variable and are considered for selection.
  4. Variance Inflation Factor (VIF): Features with a low VIF value (less than 10) are preferred to avoid multicollinearity issues.

Getting Started

  1. Clone this repository to your local machine.
  2. Install the required Python libraries from the requirements.txt file.
  3. Open the Football_Match_Outcome_Predictor.ipynb notebook in Google Colab or Jupyter Notebook.
  4. Follow the instructions in the notebook to prepare the data, train the models, and evaluate the results.

Usage

The notebook provides detailed instructions on how to use the various functions and methods for data preparation, model training, and evaluation.

Contributing

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.

License

This project is licensed under the MIT License.

Contact

For any questions or inquiries, please contact Arindam Chatterjee at [email protected].

About

Python code examples for the feedly Machine Learning blog (https://blog.feedly.com/category/all/Machine-Learning/)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.9%
  • Python 0.1%