- Developing a dashboard for YELP merchants using insights generated from big data analytics, machine learning and deep learning techniques
- The unique value of the product is to perform time series forecasting and anomaly detection on daily checkins of YELP businesses, and to provide additional insights for anomalies using natural language processing techniques on user reviews
- The YELP dataset was used for the goals of this project. Link to dataset: https://www.kaggle.com/yelp-dataset/yelp-dataset
- To explore the data science techniques involved in the making of this project, take a look at the following notebooks:
- Time series forecasting using ARIMA, Seasonal ARIMA, SARIMA with exogenous variables (SARIMAX) and Prophet (view forecasts) - a minimal SARIMAX sketch follows this list
- Time series forecasting with Tensorflow using Linear, Dense and Convolutional neural networks (view forecasts)
- Time series forecasting with Tensorflow using Recurrent neural networks (view forecasts)
- Time series anomaly detection using Prophet
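As a minimal taste of the ARIMA-family models in the first notebook, the sketch below fits a SARIMAX model with statsmodels on a daily check-in series. The file name, column names and model orders here are illustrative assumptions, not the notebook's exact configuration.

```python
# Hedged SARIMAX sketch: file name, columns and orders are assumptions for illustration
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("checkins_daily.csv", parse_dates=["date"], index_col="date")

# Weekly seasonality (period 7) suits daily data; orders are illustrative, not tuned
model = SARIMAX(df["checkins"], order=(1, 1, 1), seasonal_order=(1, 1, 1, 7))
result = model.fit(disp=False)

forecast = result.get_forecast(steps=30)   # 30-day-ahead forecast
print(forecast.predicted_mean.head())
print(forecast.conf_int().head())          # confidence intervals
```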
- note: datasets and models relevant for the dashboard are saved at a granular level to increase the dashboard refresh rate
- Explored multiple forecasting techniques and tested them to find the best-performing one
- Prophet works remarkably well with the time series data due to its lightweight nature and its automatic inference of seasonalities and trends in the data - more information in the 'forecasting' folder
- Prophet confidence intervals were used to identify anomalies in the time series data (a minimal forecasting sketch follows below)
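A minimal Prophet forecasting sketch, assuming a dataframe of daily check-in counts; Prophet expects columns named ds and y, and the file name and 30-day horizon are placeholders:

```python
# Hedged Prophet sketch: file name and horizon are assumptions for illustration
import pandas as pd
from prophet import Prophet   # 'from fbprophet import Prophet' on older installs

df = pd.read_csv("checkins_daily.csv", parse_dates=["date"])
df = df.rename(columns={"date": "ds", "checkins": "y"})

model = Prophet()                                  # seasonality and trend inferred automatically
model.fit(df)

future = model.make_future_dataframe(periods=30)   # extend 30 days beyond training data
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```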
- Set up kaggle on your local system and download the dataset using the command below (a scripted alternative follows):
kaggle datasets download -d yelp-dataset/yelp-dataset -p /data
- Extract the files into 'data' folder
- For documentation of the data, visit: https://www.yelp.com/dataset/documentation/main
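Alternatively, the download can be scripted with the official kaggle Python package; a minimal sketch, assuming API credentials are already configured in ~/.kaggle/kaggle.json:

```python
# Scripted equivalent of the CLI download (assumes ~/.kaggle/kaggle.json credentials)
import kaggle

kaggle.api.authenticate()
kaggle.api.dataset_download_files("yelp-dataset/yelp-dataset", path="data", unzip=True)
```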
- Set up Spark on your local system
- Run all cells in the filter_dataset.ipynb notebook
- Filtered datasets will be stored in filtered_data folder
- note: filtered datasets are already available for convenience
- Run python3 forecast_components.py
- Outputs:
dataset filter: filter_dataset.ipynb
- Filtered the datasets using PySpark to contain only businesses from the top 10 Canadian cities with the most business checkins
- The dataset does not have a country code to identify Canadian cities, so postal codes were explored; codes of lengths 3, 6 and 7 were identified as Canadian and filtered accordingly (see the sketch below)
- The number of records in the filtered dataset can be found in the notebook
- Datasets are stored in a distributed format as parquet outputs
- note: needs Spark to be set up on the local system
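A minimal PySpark sketch of the postal-code filter; the input path and the exact filter logic are assumptions for illustration and may differ from the notebook:

```python
# Hedged PySpark sketch of the postal-code length filter (paths are assumptions)
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length

spark = SparkSession.builder.appName("filter_dataset").getOrCreate()

business = spark.read.json("data/yelp_academic_dataset_business.json")

# Canadian postal codes appear with lengths 3 ("A1A"), 6 ("A1A1A1") or 7 ("A1A 1A1")
canadian = business.filter(length(col("postal_code")).isin(3, 6, 7))

# Store in a distributed format as parquet output
canadian.write.mode("overwrite").parquet("filtered_data/business_canada.parquet")
```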
time series forecasting: forecasting
- Used daily aggregated time series data of number of checkins in all businesses within Toronto
- Explored different time series forecasting techniques using train, validation and test sets appropriately
- Evaluated model performances using MAPE (Mean Absolute Percentage Error) - see the sketch after the inferences
- Inferences:
- Prophet works best with the data as it is lightweight, needs very little tuning and captures the key characteristics of the time series, such as trend and seasonality, with high accuracy
- Linear, Dense and Convolutional neural network models work well, and their performance can be improved by tuning hyperparameters; however, they are heavy compared to Prophet, as they need significant training, and will not scale easily to multiple time series
- RNN and LSTM models show poor performance because the number of training samples is small; their performance improves on an hourly aggregated version of the data, but that is outside the scope of this problem
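For reference, MAPE as used above can be computed as in this small sketch; the values are placeholders:

```python
# MAPE (Mean Absolute Percentage Error), used to compare the forecasting models
import numpy as np

def mape(y_true, y_pred):
    """Mean of |actual - predicted| / |actual|, expressed as a percentage."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(mape([100, 120, 90], [110, 115, 95]))  # ≈ 6.57 (illustrative values only)
```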
time series anomaly detection: anomaly_detection.ipynb
- Confidence intervals predicted by the Prophet model were used to detect anomalies, and the detected anomalies were tagged with an importance score based on their deviation from the confidence intervals (a minimal sketch follows below)
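A minimal sketch of the interval-based anomaly tagging, assuming forecast is a Prophet output frame with the actual values joined in as a column y; the importance formula (relative deviation beyond the interval) is one plausible choice, not necessarily the notebook's exact definition:

```python
# Hedged anomaly-tagging sketch: 'forecast' holds ds, y (actuals), yhat, yhat_lower,
# yhat_upper; the importance formula is an assumption for illustration
import pandas as pd

def tag_anomalies(forecast: pd.DataFrame) -> pd.DataFrame:
    out = forecast.copy()
    out["anomaly"] = 0
    out.loc[out["y"] > out["yhat_upper"], "anomaly"] = 1    # unusually high check-ins
    out.loc[out["y"] < out["yhat_lower"], "anomaly"] = -1   # unusually low check-ins

    # Importance: how far the actual value falls outside the interval,
    # relative to the actual value (assumes y > 0 for check-in counts)
    out["importance"] = 0.0
    high, low = out["anomaly"] == 1, out["anomaly"] == -1
    out.loc[high, "importance"] = (out["y"] - out["yhat_upper"]) / out["y"]
    out.loc[low, "importance"] = (out["yhat_lower"] - out["y"]) / out["y"]
    return out
```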