This project implements various machine learning algorithms to detect fraudulent transactions in online payments. The project explores different classification approaches to identify patterns and anomalies that might indicate fraudulent activity.
The project uses the Online Fraud Detection dataset (onlinefraud.csv), which contains transaction records with the following features:
- Transaction type (transfer, cash_out, etc.)
- Amount
- Source and destination account details
- Transaction flags and indicators
- Transaction outcome (fraudulent or legitimate)
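As a rough illustration of how features like these might be prepared for a classifier, the sketch below one-hot encodes the transaction type and separates the outcome label. The column names (`type`, `amount`, `isFraud`), the sample rows, and the helper `split_features_and_label` are assumptions made for this example; check them against the actual header of onlinefraud.csv before reusing the code.

```python
import pandas as pd

# Hypothetical column names and made-up rows; the real file may differ.
sample = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER"],
    "amount": [181.0, 9000.5, 42.0, 250000.0],
    "isFraud": [1, 0, 0, 1],
})


def split_features_and_label(df, label_col="isFraud", categorical_cols=("type",)):
    """One-hot encode the categorical columns and separate the label column."""
    features = pd.get_dummies(
        df.drop(columns=[label_col]), columns=list(categorical_cols)
    )
    return features, df[label_col]


X, y = split_features_and_label(sample)
print(X.columns.tolist())
```

For the full dataset, the same helper could be applied to the frame returned by `pd.read_csv("onlinefraud.csv")`, assuming the columns match.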
The project implements several machine learning algorithms, each in its own notebook, to compare their performance in fraud detection:
- paymentfraud.ipynb: main analysis notebook
  - Data preprocessing and feature engineering
  - Model training and evaluation
  - Performance metrics comparison
- knn.ipynb: K-Nearest Neighbors
  - Implementation of KNN classifier
  - Parameter tuning and optimization
  - Performance evaluation
- svm.ipynb: Support Vector Machine
  - SVM implementation for classification
  - Kernel selection and parameter optimization
  - Model evaluation
- decisiontree.ipynb: Decision Tree
  - Decision tree classifier implementation
  - Tree visualization and interpretation
  - Feature importance analysis
- lr_rf.ipynb: Linear Regression and Random Forest
  - Comparison of linear regression and random forest approaches
  - Ensemble method implementation
  - Performance analysis
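A comparison along these lines could be sketched as follows. This is not the notebooks' code: it uses synthetic, imbalanced data from scikit-learn in place of the real transactions, the hyperparameters are arbitrary placeholders, and the linear regression baseline is left out because only classifiers are compared here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the transaction data, with roughly 10% positives.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# KNN and SVM are distance-based, so they are wrapped with feature scaling.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its F1 score on the held-out split.
f1_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    f1_by_model[name] = f1_score(y_test, model.predict(X_test), zero_division=0)

for name, score in f1_by_model.items():
    print(f"{name}: F1 = {score:.3f}")
```

F1 is used here rather than accuracy because fraud labels are typically rare, so a model that predicts "legitimate" for everything can still score high accuracy.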
.
├── onlinefraud.csv # Dataset file
├── paymentfraud.ipynb # Main analysis notebook
├── knn.ipynb # K-Nearest Neighbors implementation
├── svm.ipynb # Support Vector Machine implementation
├── decisiontree.ipynb # Decision Tree implementation
├── lr_rf.ipynb # Linear Regression and Random Forest implementation
└── README.md # Project documentation
The project requires the following Python libraries:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- jupyter
- Clone the repository
- Install the required dependencies
- Open the Jupyter notebooks to run the analyses
- Start with paymentfraud.ipynb for the main analysis
- Individual algorithm implementations can be found in their respective notebooks
Each notebook contains detailed performance metrics including:
- Accuracy
- Precision
- Recall
- F1-Score
- ROC curves
- Confusion matrices
The models are evaluated using cross-validation to ensure robust performance assessment.
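One way to obtain these metrics under cross-validation is sketched below with scikit-learn. The fold count, the random forest model, and the synthetic data are assumptions for illustration only; the notebooks may use different settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate

# Synthetic, imbalanced data standing in for the fraud dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Stratified folds keep the fraud/legitimate ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Score every fold on several metrics at once, then average per metric.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
mean_scores = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

# Out-of-fold predictions give one confusion matrix over the whole dataset.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)

print(mean_scores)
print(cm)
```

Rows of the confusion matrix are true classes and columns are predicted classes, so `cm[1, 0]` counts fraudulent samples that were predicted as legitimate.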