This repository analyzes telecommunications customer churn with a logistic regression model trained on a Kaggle dataset. The project focuses on:
- Data Exploration and Cleaning: Cleaning, visualizing, and understanding the churn dataset.
- Feature Engineering: Creating and transforming features for better model performance.
- Multicollinearity Analysis: Identifying and addressing highly correlated features to improve model robustness.
- Influential Points Detection: Examining observations that may have a disproportionate effect on the fitted model and its predictions.
- Logistic Regression Model Development: Training and evaluating a Logistic Regression model for predicting customer churn.
- Data Source: Kaggle's telecommunication churn dataset.
- Model Accuracy: 81% on the test set.
- Methodology:
  - Exploratory Data Analysis (EDA)
  - Feature Engineering
  - Multicollinearity Analysis (e.g., generalized variance inflation factor, GVIF)
  - Influential Points Detection
  - Logistic Regression Model Training and Evaluation
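
The final methodology step (training a logistic regression model and scoring it on a held-out test set) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code: the synthetic data generator `make_synthetic_churn`, the two features (`tenure`, `charges`), the 80/20 split, and the learning-rate settings are all assumptions made for the example, and the accuracy it prints is unrelated to the 81% reported above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 0/1 churn predictions by thresholding the predicted probability."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_synthetic_churn(n, seed=0):
    """Hypothetical stand-in data; this is NOT the Kaggle churn dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        tenure = rng.uniform(0, 1)   # assumed feature, scaled to [0, 1]
        charges = rng.uniform(0, 1)  # assumed feature, scaled to [0, 1]
        logit = 3.0 * charges - 4.0 * tenure + 0.5
        y.append(int(rng.random() < sigmoid(logit)))
        X.append([tenure, charges])
    return X, y


# Hold out the last 20% of rows as a test set (rows are already random).
X, y = make_synthetic_churn(400)
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

w, b = fit_logistic(X_train, y_train)
test_accuracy = accuracy(y_test, predict(X_test, w, b))
print(f"weights={w}, bias={b:.3f}, test accuracy={test_accuracy:.2f}")
```

In practice a library implementation would likely replace the hand-written gradient descent; the sketch only makes the train, predict, and evaluate sequence explicit.
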
Feel free to fork and contribute to this project. Any insights or improvements are welcome!
Disclaimer: This is a basic implementation for educational purposes only. The code and approach may require further refinement for real-world deployment.