Catalog Sales Prediction

Executive Summary

To increase the return on investment (ROI) of mailed catalogs, we analyzed sales data from existing customers who received catalogs to identify high-value targets for future marketing efforts. Our analysis identified the following key predictors of whether a customer is likely to respond to a catalog with a purchase: consistency of sales in consecutive years, recency of the last purchase, how active the customer is, sales within the past year, and average spend per order. Combining our final logistic regression and multiple linear regression models resulted in a payoff of $51,465.45. In theory, a perfect model would result in a payoff of $120,252.40 from the test set. While not perfect, our model has clear predictive power: its predictions capture 42.80% of the maximum payoff.
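
The 42.80% figure follows directly from the two payoffs quoted above. A quick check, written in Python purely for illustration (the project's own code is in R; the variable names are ours):

```python
model_payoff = 51465.45      # payoff achieved by the combined models
perfect_payoff = 120252.40   # payoff a perfect model would achieve on the test set

# Share of the maximum payoff captured by the model's top-customer selection.
share = model_payoff / perfect_payoff
print(f"{share:.2%}")
```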

Business Situation

The file catalog sales data.csv comes from a retail company that sells upscale clothing on its website and via catalogs, which help drive customers to the website. All customers were sent a catalog mailing on Sep 1, 2012. On Dec 1, 2012, it was recorded whether or not they responded by making a purchase. There is one row for each customer. The response variable is targdol, the purchase amount in response to receiving the catalog (targdol = 0 indicates that the customer did not respond). The remaining variables are potential predictors that describe the customer as of the time of the mailing. LTD means "life-to-date," i.e. since the customer purchased for the first time.

Data Dictionary

There are a total of 101,532 customers, randomly split into 50,418 in the training set and the remaining 51,114 in the test set (train = 1 for the training set, train = 0 for the test set). The definitions of the variables are as follows.

targdol: dollar purchase resulting from catalog mailing
datead6: date added to file
datelp6: date of last purchase
lpuryear: latest purchase year
slstyr: sales ($) this year
slslyr: sales ($) last year
sls2ago: sales ($) 2 years ago
sls3ago: sales ($) 3 years ago
slshist: LTD dollars
ordtyr: orders this year
ordlyr: orders last year
ord2ago: orders 2 years ago
ord3ago: orders 3 years ago
ordhist: LTD orders
falord: LTD fall orders
sprord: LTD spring orders
train: training/test set indicator (1 = training, 0 = test)
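
As a minimal sketch of how the train indicator and targdol are meant to be read, the snippet below splits a few rows into training and test sets and flags responders. It is written in Python for illustration only (the project's code is in R), and the three data rows are made up for the example; they are not taken from catalog sales data.csv.

```python
import csv
import io

# Three made-up rows for illustration only; the real data lives in
# "catalog sales data.csv" and has all the columns listed above.
sample = io.StringIO(
    "targdol,slstyr,ordtyr,train\n"
    "0,0,0,1\n"
    "45.5,120,2,0\n"
    "0,30,1,1\n"
)

rows = list(csv.DictReader(sample))

# train = 1 marks the training set, train = 0 the test set.
train_rows = [r for r in rows if r["train"] == "1"]
test_rows = [r for r in rows if r["train"] == "0"]

# targdol > 0 means the customer responded to the mailing with a purchase.
responders = [r for r in rows if float(r["targdol"]) > 0]

print(len(train_rows), len(test_rows), len(responders))

# Sanity check on the split sizes stated in this section.
assert 50_418 + 51_114 == 101_532
```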

Goal

Build a predictive model for targdol based on the training set and then test it on the test set.

Description of the `.Rmd` files

The .Rmd files should be executed in the following order:

data_cleaning.Rmd - Generates two data files called clean_train.csv and clean_test.csv containing the preprocessed training and testing data sets respectively.
logistic_modeling.Rmd - Fits multiple logistic regression models and chooses the best model. Generates the probability of responding for each customer in the file probabilities.csv.
linear_modeling.Rmd - Fits multiple linear regression models and chooses the best model. Combines the logistic and linear models to compute the payoff and MSPE.
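
This README does not spell out how the two models are combined. One common approach, assumed here, is to multiply each customer's predicted response probability by the predicted purchase amount given a response, and to measure MSPE as the mean squared difference between actual and predicted targdol. The sketch below is in Python for illustration only (the project's code is in R); the function names and the numbers are hypothetical.

```python
def expected_targdol(p_respond, amount_if_respond):
    """Assumed combination rule: E(targdol) = P(respond) * E(targdol | respond)."""
    return [p * a for p, a in zip(p_respond, amount_if_respond)]


def mspe(actual, predicted):
    """Mean squared prediction error over a set of customers."""
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / len(actual)


# Made-up values for three customers, for illustration only.
p_respond = [0.10, 0.50, 0.02]            # logistic model output
amount_if_respond = [80.0, 40.0, 200.0]   # linear model output
actual = [0.0, 45.0, 0.0]                 # observed targdol

predicted = expected_targdol(p_respond, amount_if_respond)
print(predicted)
print(mspe(actual, predicted))
```

Under this rule, a customer with a small chance of a large purchase can rank below one with a good chance of a moderate purchase.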

Criteria for Evaluating the Fitted Models:

General: The final fitted regression model should meet the usual criteria, such as significant coefficients, satisfactory residual plots, good fit as measured by $R^2$ or adjusted $R^2$, parsimony, and interpretability of the model.
Financial Criterion: Select the top 1000 customers (prospects) from the test set who have the highest E(targdol). Then find their total actual purchases. This is the payoff and should be as high as possible.
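
The financial criterion can be sketched as follows: rank customers by predicted E(targdol), keep the top k, and sum their actual purchases. The code is in Python for illustration only (the project's code is in R); the function name `payoff` and the four data points are hypothetical, and k is reduced from 1000 to 2 to keep the example small.

```python
def payoff(expected, actual, k=1000):
    """Total actual purchases of the k customers with the highest expected purchase."""
    order = sorted(range(len(expected)), key=lambda i: expected[i], reverse=True)
    return sum(actual[i] for i in order[:k])


# Made-up values for four test-set customers, for illustration only.
expected = [8.0, 20.0, 4.0, 15.0]   # model's E(targdol)
actual = [60.0, 45.0, 0.0, 0.0]     # observed targdol

model_payoff = payoff(expected, actual, k=2)
# Ranking by the actual values gives the ceiling a perfect model could reach.
perfect_payoff = payoff(actual, actual, k=2)

print(model_payoff, perfect_payoff)
```

Comparing the two totals gives the share of the maximum payoff that the model captures.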

