This is a Mini-Project for the course SC1015 (Introduction to Data Science and Artificial Intelligence). The dataset used in this project is taken from the Customer Personality Analysis dataset on Kaggle.
The source code is organised in the following order:
- Data Exploration
- Data Cleaning
- Data Visualization
- Data Splitting
- Linear Regression
- Poisson Regression

Contributors:
- Lim Boon Hian - Data Exploration, Data Cleaning, Data Visualization
- Lim Dong Wan - Linear Regression, Data Splitting
- Marvell Ung Wew - Poisson Regression, Grid Search Cross Validation

Problem definition: predicting customer sales from customer information.
Response variables: TotalPurchase, MntGroceryProducts, MntWines, MntGoldProds
Predictor variables:
- Categorical:
  - Education
  - Marital_Status
  - HaveChild
  - YearRange
- Numerical:
  - Income
  - TotalChild
  - NumWebVisitsMonth

Models used:
- Linear Regression
- Poisson Regression
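
The two models can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (all values are made up), only a subset of the predictors listed above is used, and the choice of scikit-learn's `LinearRegression` and `PoissonRegressor` with a `StandardScaler` is an assumption. The categorical predictors are converted to dummy variables with `pandas.get_dummies` before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, PoissonRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: column names follow the README, values are made up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Education": rng.choice(["Graduation", "Master", "PhD"], size=n),
    "Marital_Status": rng.choice(["Single", "Married"], size=n),
    "Income": rng.uniform(10_000, 100_000, size=n),
    "TotalChild": rng.integers(0, 4, size=n),
    "NumWebVisitsMonth": rng.integers(0, 15, size=n),
})
# Hypothetical response: a non-negative count loosely tied to income.
df["TotalPurchase"] = rng.poisson(lam=(df["Income"] / 5_000).to_numpy())

# One-hot encode the categorical predictors; numerical columns pass through.
X = pd.get_dummies(
    df.drop(columns="TotalPurchase"),
    columns=["Education", "Marital_Status"],
    drop_first=True,  # drop one level per variable to avoid redundant columns
    dtype=float,
)
y = df["TotalPurchase"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling keeps the Poisson solver stable when Income is in the tens of thousands.
linear = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
poisson = make_pipeline(StandardScaler(), PoissonRegressor(max_iter=1000)).fit(
    X_train, y_train
)

linear_pred = linear.predict(X_test)
poisson_pred = poisson.predict(X_test)
print("Linear R^2: ", r2_score(y_test, linear_pred))
print("Poisson R^2:", r2_score(y_test, poisson_pred))
```

Poisson Regression uses a log link, so its predictions are always positive, whereas Linear Regression can predict negative values for a non-negative response.
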
Main insight: The company utilizes customer-focused sales tactics with customer’s income as a guideline.
Recommendations:
- Target customers who:
  - have fewer children
  - have a higher income
  - have a higher level of education
  - were born between 1940 and 1950
- Improve online sales by upgrading the company website to make it more attractive and appealing to customers.

Conclusions:
- Customer income is the most important predictor of customer expenditure
- Linear Regression yields better results than Poisson Regression
- Grid Search Cross Validation only marginally improved the Linear Regression model

What we learned:
- K-fold splitting
- Poisson Regression
- Plotly subplots
- Grid Search Cross Validation
- Altair interactive plots
- Creating dummy variables for categorical predictors in Linear Regression
- Collaborating on GitHub :)
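
K-fold splitting and Grid Search Cross Validation can be combined as in the sketch below. The hyperparameter grid used in the project is not stated here, so this example assumes one plausible setup: searching over the number of features kept by recursive feature elimination (`RFE`) ahead of a linear regression. The data is synthetic, and the pipeline step names `rfe` and `model` are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# Synthetic data: 6 predictors, only the first two actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# 5-fold splitting with shuffling; every row is used for validation exactly once.
folds = KFold(n_splits=5, shuffle=True, random_state=0)

# Assumed setup: RFE selects features, then a linear model is fit on them.
pipeline = Pipeline([
    ("rfe", RFE(estimator=LinearRegression())),
    ("model", LinearRegression()),
])

# Try keeping 1 to 6 features and score each setting by cross-validated R^2.
param_grid = {"rfe__n_features_to_select": list(range(1, 7))}
search = GridSearchCV(pipeline, param_grid, scoring="r2", cv=folds)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV R^2:", round(search.best_score_, 3))
```

If the best cross-validated score is close to the score of the untuned model, the search has brought only a marginal improvement.
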

References:
- https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis (Original Dataset)
- https://ntulearn.ntu.edu.sg/webapps/blackboard/content/listContent.jsp?course_id=_2606895_1&content_id=_2762960_1
- https://stackoverflow.com/questions/21912634/how-can-i-sort-a-boxplot-in-pandas-by-the-median-values
- https://altair-viz.github.io/gallery/index.html
- https://www.kaggle.com/code/jnikhilsai/cross-validation-with-linear-regression/notebook
- https://towardsdatascience.com/gridsearchcv-for-beginners-db48a90114ee
- https://machinelearningmastery.com/rfe-feature-selection-in-python
- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
- https://medium.com/lcc-unison/how-to-poisson-regression-model-python-implementation-1c672582eb96
- https://timeseriesreasoning.com/contents/poisson-regression-model
- https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html
- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
- https://medium.com/analytics-vidhya/implementing-linear-regression-using-sklearn-76264a3c073c