This project uses historical NBA statistics to predict game outcomes with machine learning. The workflow combines data scraping, feature engineering, and predictive modeling, drawing on historical results, team metrics, and advanced statistics to predict game winners and total scores.
- Data Collection: Gather and preprocess historical NBA game data.
- Feature Engineering: Create game-specific features, advanced metrics, and rolling averages.
- Machine Learning: Train models like Random Forest and Ridge Regression for game outcome prediction.
- Evaluation: Assess model performance using classification and regression metrics.
Sources:
- Basketball Reference: Historical game data, box scores, and advanced metrics.
- NBA Stats API: Player and team statistics.
- ESPN: Injury reports and roster updates.
Data Extracted:
- Game Metadata: Date, season, location (home/away).
- Team Stats: Offensive/defensive ratings, average points, turnovers, and rebounds.
- Player Stats: Points, assists, rebounds, and player efficiency (PER).
- Advanced Metrics: Pace, eFG%, TS%, rolling averages, and game context.
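As a rough illustration of the advanced metrics listed above, the sketch below computes eFG% and TS% from box-score totals using their standard definitions. The function and argument names are hypothetical and not taken from this repository, and the possessions estimate is a common simplification rather than the exact formula used by any particular data source.

```python
def effective_fg_pct(fgm: float, fg3m: float, fga: float) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA; returns 0.0 when there are no attempts."""
    return (fgm + 0.5 * fg3m) / fga if fga else 0.0


def true_shooting_pct(pts: float, fga: float, fta: float) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); returns 0.0 when there are no attempts."""
    denom = 2 * (fga + 0.44 * fta)
    return pts / denom if denom else 0.0


def estimate_possessions(fga: float, fta: float, orb: float, tov: float) -> float:
    """Simplified possessions estimate (assumption): FGA + 0.44 * FTA - ORB + TOV."""
    return fga + 0.44 * fta - orb + tov


# Example with made-up box-score totals for one team in one game.
efg = effective_fg_pct(fgm=40, fg3m=10, fga=90)
ts = true_shooting_pct(pts=110, fga=90, fta=25)
poss = estimate_possessions(fga=90, fta=25, orb=10, tov=14)
```

Pace is usually derived from a possessions estimate like this one, scaled to a full game.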
Steps:
- Preprocessing: Clean data, handle missing values, and normalize features.
- Derived Features:
  - Home/Away status
  - Days of rest for each team
  - Rolling averages for metrics like scoring trends and defensive efficiency
  - Rivalry indicators and travel fatigue
- Target Variables:
  - Total Score: Regression target for combined game scores.
  - Winner: Classification target (binary: 1 = Win, 0 = Loss).
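The derived features and targets above could be built with pandas roughly as follows. This is a minimal sketch, not the repository's actual code: it assumes one row per team per game, and the column names (`team`, `game_date`, `points`, `opp_points`) are hypothetical. The rolling averages are shifted by one game so that each row only sees games played before it, which avoids leaking the current result into its own features.

```python
import pandas as pd


def add_team_features(games: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add rest days, leakage-free rolling averages, and both targets.

    Assumed input columns: team, game_date, points, opp_points.
    """
    df = games.copy()
    df["game_date"] = pd.to_datetime(df["game_date"])
    df = df.sort_values(["team", "game_date"])
    by_team = df.groupby("team")

    # Full days off between games: a back-to-back gives 0.
    # The first game of each team has no previous game, so it stays NaN.
    df["days_rest"] = by_team["game_date"].diff().dt.days - 1

    # Rolling mean over the previous `window` games.
    # shift(1) excludes the current game from its own average.
    for col in ("points", "opp_points"):
        df[f"{col}_roll{window}"] = by_team[col].transform(
            lambda s: s.shift(1).rolling(window, min_periods=1).mean()
        )

    # Targets: binary win flag and combined score.
    df["win"] = (df["points"] > df["opp_points"]).astype(int)
    df["total_score"] = df["points"] + df["opp_points"]
    return df
```

Rows whose rolling features are NaN (each team's first game) need to be dropped or imputed before training.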
- Classification: Predict game winners using Random Forest, Logistic Regression, or Gradient Boosting.
- Regression: Predict total game scores using Ridge Regression, XGBoost, or Neural Networks.
- Split data into training (70%), validation (15%), and testing (15%) sets.
- Train models with key features like team stats, home-court advantage, and rolling averages.
- Evaluate performance using metrics like Accuracy, MAE, RMSE, Precision, Recall, and F1-Score.
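The split, training, and evaluation steps above might look like the sketch below. It uses synthetic data in place of the real feature table, and it assumes a chronological 70/15/15 split (rows sorted by game date) rather than a random shuffle, since the source does not say which is used. The helper name `chronological_split` and all hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.metrics import (accuracy_score, f1_score,
                             mean_absolute_error, mean_squared_error)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def chronological_split(n_rows: int, train: float = 0.70, val: float = 0.15):
    """Return train/val/test slices, assuming rows are already sorted by date."""
    train_end = round(n_rows * train)
    val_end = round(n_rows * (train + val))
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_rows)


# Synthetic stand-in for the engineered features and the two targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y_win = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
y_total = 220 + 10 * X[:, 2] + rng.normal(scale=5, size=1000)

tr, va, te = chronological_split(len(X))

# Classification: game winner. Regression: total score (features scaled for Ridge).
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[tr], y_win[tr])
reg = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X[tr], y_total[tr])

# The validation set is for model selection; the test set is scored once at the end.
val_accuracy = accuracy_score(y_win[va], clf.predict(X[va]))
test_win_pred = clf.predict(X[te])
test_total_pred = reg.predict(X[te])
metrics = {
    "test_accuracy": accuracy_score(y_win[te], test_win_pred),
    "test_f1": f1_score(y_win[te], test_win_pred),
    "test_mae": mean_absolute_error(y_total[te], test_total_pred),
    "test_rmse": mean_squared_error(y_total[te], test_total_pred) ** 0.5,
}
print(val_accuracy, metrics)
```

A chronological split keeps future games out of the training set, which matters when features include rolling averages over past games.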
- Languages: Python
- Libraries: pandas, BeautifulSoup, sklearn, nba-api
- Utilities: Selenium for dynamic scraping, Google ChromeDriver
- Environment: Python 3.10+
- Install Python 3.10+.
- Install Google Chrome and ChromeDriver (on macOS with Homebrew):
brew install --cask chromedriver
- Clone the repository:
git clone https://github.com/ddayto21/NBA-Time-Series-Forecast.git
cd NBA-Time-Series-Forecast
- Set up a virtual environment:
python3.10 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Configure .env file:
Add the ChromeDriver path:
CHROMEDRIVER_PATH=/absolute/path/to/chromedriver
- Run the pipeline:
python main.py
- Train the model:
python src/training/train.py
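For the .env step above, one way a script could pick up `CHROMEDRIVER_PATH` is sketched below using only the standard library. How this project actually loads its .env file is not shown here, so the helper names `load_env_file` and `get_chromedriver_path` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_chromedriver_path(env_file: str = ".env") -> str:
    """Return CHROMEDRIVER_PATH, preferring a real environment variable."""
    path = os.environ.get("CHROMEDRIVER_PATH") or load_env_file(env_file).get(
        "CHROMEDRIVER_PATH"
    )
    if not path:
        raise RuntimeError("CHROMEDRIVER_PATH is not set; add it to the .env file")
    return path
```

With Selenium 4, the resulting path would typically be passed as `webdriver.Chrome(service=Service(executable_path=path))`, with `Service` imported from `selenium.webdriver.chrome.service`.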
- Add support for asynchronous scraping for faster data retrieval.
- Combine the existing models into ensembles (for example, stacking or voting across Random Forest and XGBoost) for better predictions.
- Create visual dashboards for exploratory data analysis.
- Introduce real-time game prediction based on live data.
Contributions are welcome! Open an issue or submit a pull request to contribute to this project.
This project is licensed under the MIT License. See the LICENSE file for details.
- Data sourced from Basketball Reference, NBA Stats, and ESPN.
- Inspired by the rich history of the NBA and the power of data-driven analysis.