German Credit Risk Analysis

This repository contains an analysis of the German Credit Risk dataset. The objective of this project is to evaluate credit risk and predict whether a customer falls into the "good" or "bad" risk category using machine learning models.

📂 Project Structure

The repository is organized as follows:

.
├── data
│   ├── raw                # Raw data files (original dataset)
│   ├── processed          # Processed data files (encoded)
├── notebooks
│   ├── german_credit_risk_analysis.ipynb  # Jupyter notebook with the analysis
├── reports
│   ├── figures            # Visualizations generated during analysis
├── requirements.txt       # Python dependencies for the project
├── README.md              # Project overview (this file)

📊 Dataset Overview

The German Credit Risk dataset includes information on 1,000 customers with the goal of predicting their credit risk. Key features of the dataset include:

  • Target Variable: Risk - indicates whether the customer is a good or bad credit risk.
  • Input Variables: The selected attributes are:
    1. Age (numeric)
    2. Sex (text: male, female)
    3. Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
    4. Housing (text: own, rent, or free)
    5. Saving accounts (text: little, moderate, quite rich, rich)
    6. Checking account (numeric, in DM, i.e. Deutsche Mark)
    7. Credit amount (numeric, in DM)
    8. Duration (numeric, in months)
    9. Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
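
To make the schema concrete, here is a minimal sketch of how rows with these columns could be cleaned and encoded. It uses only the standard library so it runs anywhere. The sample rows are made up, and the helper names (`fill_missing_with_mode`, `one_hot`, `encode_target`), the choice of mode imputation, and the convention of coding "bad" as 1 are all assumptions for illustration, not necessarily what the notebook does.

```python
from collections import Counter

# Hypothetical rows using column names from the list above; the values are made up.
rows = [
    {"Age": 35, "Sex": "male", "Housing": "own", "Saving accounts": "little", "Risk": "good"},
    {"Age": 22, "Sex": "female", "Housing": "rent", "Saving accounts": None, "Risk": "bad"},
    {"Age": 49, "Sex": "male", "Housing": "free", "Saving accounts": "rich", "Risk": "good"},
    {"Age": 28, "Sex": "female", "Housing": "own", "Saving accounts": "little", "Risk": "good"},
]


def fill_missing_with_mode(rows, column):
    """Return new rows where None in `column` is replaced by the most frequent value."""
    observed = [r[column] for r in rows if r[column] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, column: mode if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Return new rows where `column` is replaced by one 0/1 indicator per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(new_row)
    return encoded


def encode_target(rows, column="Risk", positive="bad"):
    """Map the text target to 1 (positive class) or 0. Coding 'bad' as 1 is an assumption."""
    return [{**r, column: int(r[column] == positive)} for r in rows]


clean = fill_missing_with_mode(rows, "Saving accounts")
for col in ("Sex", "Housing", "Saving accounts"):
    clean = one_hot(clean, col)
clean = encode_target(clean)

print(clean[1])
```

Each helper returns new rows instead of modifying its input, so the raw data stays available for comparison, mirroring the split between data/raw and data/processed.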

🔍 Analysis Steps

  1. Data Preprocessing:

    • Handling missing values in features like Saving accounts and Checking account.
  2. Exploratory Data Analysis (EDA):

    • Univariate analysis: plots, treemaps, and grouping of Purpose values into categories
    • Bivariate analysis
    • Overview: pair plot
  3. Encoding categorical variables for compatibility with machine learning algorithms:

    • Encoding categorical data
    • Correlation heatmap
  4. Model Training and Evaluation:

    • Splitting the dataset
    • Standardization
    • Model building:
      • Naive Bayes
      • k-Nearest Neighbors (KNN)
      • XGBoost (XGB)
    • Metrics for evaluation:
      • Accuracy
      • F1-score
      • ROC-AUC
  5. Model synthesis and conclusion:

    • ROC Curve
    • The XGBoost model achieved the best performance across all metrics, making it the recommended choice for deployment.
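
The split, standardization, and evaluation steps above can be sketched without any third-party dependency. This is a minimal illustration under stated assumptions, not the notebook's code: the data is synthetic, every function name is hypothetical, and a from-scratch k-Nearest Neighbors scorer stands in for the three models because it is short to write (Naive Bayes and XGBoost are not reproduced here).

```python
import math
import random


def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle row indices, then split features and labels into train and test parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_scaler(X):
    """Per-column mean and standard deviation; constant columns get std 1.0."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(X, means, stds):
    """Standardize each column to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def knn_scores(X_train, y_train, X_test, k=3):
    """Score each test row by the share of its k nearest training rows labelled 1."""
    scores = []
    for q in X_test:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: math.dist(q, X_train[i]))[:k]
        scores.append(sum(y_train[i] for i in nearest) / k)
    return scores


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data: two features loosely resembling duration and credit amount.
rng = random.Random(42)
X = ([[rng.gauss(24, 6), rng.gauss(3000, 800)] for _ in range(40)]
     + [[rng.gauss(36, 6), rng.gauss(6000, 800)] for _ in range(40)])
y = [0] * 40 + [1] * 40

X_train, X_test, y_train, y_test = train_test_split(X, y)
means, stds = fit_scaler(X_train)            # fit on the training split only
X_train_s = transform(X_train, means, stds)
X_test_s = transform(X_test, means, stds)    # reuse training statistics on the test split

scores = knn_scores(X_train_s, y_train, X_test_s, k=3)
preds = [int(s >= 0.5) for s in scores]

print("accuracy:", accuracy(y_test, preds))
print("f1:", f1_score(y_test, preds))
print("roc_auc:", roc_auc(y_test, scores))
```

Fitting the scaler on the training split and reusing its statistics on the test split keeps test information out of training. Standardization matters most for distance-based models such as KNN, where an unscaled feature like credit amount would otherwise dominate the distance.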

🧰 Installation

  1. Clone this repository:
    git clone https://github.com/clemcoste/german_credit_risk.git
    cd german_credit_risk
    
  2. Create a virtual environment, activate it, and install the required dependencies:
    python -m venv venv
    source venv/bin/activate   # on Windows: venv\Scripts\activate
    pip install -r requirements.txt
    
  3. Open the notebook in Visual Studio Code and select the venv interpreter as the kernel.

📈 Figures

All visualizations and figures generated during the analysis are stored in the reports/figures directory. These include:

  • Feature distributions
  • Correlation heatmaps
  • Model performance comparisons

🛠️ Requirements

For the full list of dependencies, see the requirements.txt file.

Contributions are welcome! Feel free to submit issues or pull requests if you have suggestions for improvement.

📜 License

This project is licensed under the MIT License. See the LICENSE file for more details.
