German Credit Risk Analysis

This repository contains an analysis of the German Credit Risk dataset. The objective of this project is to evaluate credit risk and predict whether a customer falls into the "good" or "bad" risk category using machine learning models.

📂 Project Structure

The repository is organized as follows:

.
├── data
│   ├── raw                # Raw data files (original dataset)
│   ├── processed          # Processed data files (encoded)
├── notebooks
│   ├── german_credit_risk_analysis.ipynb  # Jupyter notebook with the analysis
├── reports
│   ├── figures            # Visualizations generated during analysis
├── requirements.txt       # Python dependencies for the project
├── README.md              # Project overview (this file)

📊 Dataset Overview

The German Credit Risk dataset includes information on 1,000 customers with the goal of predicting their credit risk. Key features of the dataset include:

Target Variable: Risk - indicates whether the customer is a good or bad credit risk.
Input Variables: The selected attributes are:
1. Age (numeric)
2. Sex (text: male, female)
3. Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
4. Housing (text: own, rent, or free)
5. Saving accounts (text: little, moderate, quite rich, rich)
6. Checking account (numeric, in DM - Deutsche Mark)
7. Credit amount (numeric, in DM)
8. Duration (numeric, in months)
9. Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
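
Because Job is stored as a numeric code, plots and tables are easier to read once the codes are mapped back to labels. The snippet below is a minimal sketch of that mapping; the labels are copied from the attribute list above, while the names `JOB_LABELS` and `job_label` are illustrative and not taken from the notebook.

```python
# Labels copied from the Job attribute description in this README.
JOB_LABELS = {
    0: "unskilled and non-resident",
    1: "unskilled and resident",
    2: "skilled",
    3: "highly skilled",
}


def job_label(code: int) -> str:
    """Return the readable label for a Job code; reject unknown codes."""
    try:
        return JOB_LABELS[code]
    except KeyError:
        raise ValueError(f"unexpected Job code: {code!r}") from None


print(job_label(2))
```

Raising on an unknown code, rather than returning a default, makes data-quality problems visible early.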

🔍 Analysis Steps

Data Preprocessing:
- Handling missing values in features like Saving accounts and Checking account.
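
One possible way to handle the missing values is sketched below. It is an assumption, not necessarily what the notebook does: text columns get an explicit "unknown" category and numeric columns get the column median. The helper name `fill_missing` and the toy rows are hypothetical.

```python
import pandas as pd


def fill_missing(df, columns, label="unknown"):
    """Return a copy of df with gaps filled in the given columns.

    Text columns receive `label`; numeric columns receive the median.
    """
    out = df.copy()
    for col in columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(label)
    return out


# Toy rows standing in for the real dataset (values are made up).
toy = pd.DataFrame({"Saving accounts": ["little", None, "rich", "moderate"]})

print(toy.isna().sum())  # missing values per column, before filling
filled = fill_missing(toy, ["Saving accounts"])
print(filled)
```

The same call can be applied to Checking account. Keeping the gap as its own category preserves all 1,000 rows and lets a model learn whether a missing value is itself informative.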
Exploratory Data Analysis (EDA):
- Univariate analysis: plots, treemaps, and creation of categories for Purpose
- Bivariate analysis
- Overview: pairplot
Encoding categorical variables for compatibility with machine learning algorithms:
- Encoding categorical data
- Correlation heatmap
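
The encoding step can be sketched as follows. This assumes one-hot encoding with pandas and a 0/1 target where 1 means "bad"; the notebook may use a different encoder or target direction. The toy rows are made up.

```python
import pandas as pd

# Toy rows using column names and values from the dataset description.
toy = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Housing": ["own", "rent", "free"],
    "Risk": ["good", "bad", "good"],
})

# Binary target: 1 for "bad", 0 for "good" (the direction is an assumption).
toy["Risk"] = toy["Risk"].map({"good": 0, "bad": 1})

# One-hot encode the nominal features; dtype=int yields 0/1 columns.
encoded = pd.get_dummies(toy, columns=["Sex", "Housing"], dtype=int)

# Pairwise Pearson correlations: the matrix a correlation heatmap displays.
corr = encoded.corr()

print(encoded.columns.tolist())
print(corr.round(2))
```

Once every column is numeric, `corr` can be passed directly to a plotting function such as `seaborn.heatmap`.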
Model Training and Evaluation:
- Splitting the dataset
- Standardization
- Model building:
  - Naive Bayes
  - k-Nearest Neighbors (KNN)
  - XGBoost (XGB)
- Metrics for evaluation:
  - Accuracy
  - F1-score
  - ROC-AUC
Model synthesis and conclusion:
- ROC Curve
- The XGBoost model achieved the best performance across all metrics, making it the recommended choice for deployment.
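
The training and evaluation steps above can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (`make_classification`), the split ratio, `GaussianNB` variant, and `n_neighbors=5` are arbitrary choices, and XGBoost is left out to keep dependencies minimal. `xgboost.XGBClassifier` exposes the same `fit` / `predict_proba` interface and could be added to the `models` dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded dataset: 1,000 rows, binary target.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training split only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)               # hard labels for accuracy / F1
    proba = model.predict_proba(X_test_s)[:, 1]  # positive-class scores for ROC-AUC
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

ROC-AUC is computed from predicted probabilities rather than hard labels, so it measures how well each model ranks customers independently of the decision threshold.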

🧰 Installation

Clone this repository:

git clone https://github.com/clemcoste/german_credit_risk.git
cd german_credit_risk

Create a virtual environment, activate it, and install the required dependencies:

python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install -r requirements.txt

Open the notebook in Visual Studio Code and select the venv interpreter as the kernel.

📈 Figures

All visualizations and figures generated during the analysis are stored in the reports/figures directory. These include:

Feature distributions
Correlation heatmaps
Model performance comparisons

🛠️ Requirements

For the full list of dependencies, see the requirements.txt file.

Contributions are welcome! Feel free to submit issues or pull requests if you have suggestions for improvement.

📜 License

This project is licensed under the MIT License. See the LICENSE file for more details.
