
The project helps Yelp business owners identify key insights about their users' activity and recommends aspects that can be improved to enhance the customer experience.


sachinnpraburaj/YELP-Merchant-Insights-Dashboard


YELP Merchant dashboard

Project achievements

  • Explored and tested multiple forecasting techniques to find the best-performing one. Prophet works very well on this time series data thanks to its lightweight nature and its automatic inference of trends and seasonalities; more information is in the 'forecasting' folder
  • Prophet confidence intervals were used to identify anomalies in the time series data

Instructions

downloading

  • Set up the Kaggle API on your local system and download the dataset using:
kaggle datasets download -d yelp-dataset/yelp-dataset -p /data

dataset filter (optional)

  • Set up Spark on your local system
  • Run all cells in the filter_dataset.ipynb notebook
  • The filtered datasets will be stored in the filtered_data folder
  • Note: the filtered datasets are already provided for convenience

get forecast models and anomalies

python3 forecast_components.py
  • Outputs:
    • Prophet models, trained individually on each of the time series, are saved here
    • The time series dataframes and the detected anomalies are stored here

Project steps and rationale

dataset filter: filter_dataset.ipynb

  • Filtered the datasets with PySpark to keep only businesses in the top 10 Canadian cities by number of business check-ins
  • The dataset has no 'Country Code' field for identifying Canadian cities, so postal codes were explored instead: codes of length 3, 6 or 7 were identified as Canadian and used for filtering (a full Canadian postal code has the form A1A 1A1, i.e. 6 characters, or 7 with the space; 3 characters is the forward sortation area alone)
  • The number of records in the filtered dataset can be found in the notebook
  • The filtered datasets are stored in Spark's distributed format as Parquet outputs
  • Note: requires Spark to be set up on the local system
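The two filtering rules above (Canadian postal codes by length, then the top 10 cities by check-ins) can be sketched in plain Python. This is a minimal sketch, not the notebook's PySpark code: the record fields `city` and `postal_code` follow the Yelp business data, while the pre-computed `checkins` count and all helper names are hypothetical.

```python
from collections import Counter

# Length heuristic described above: 3 (forward sortation area only),
# 6 ("A1A1A1") or 7 ("A1A 1A1") characters are treated as Canadian.
CANADIAN_POSTAL_CODE_LENGTHS = {3, 6, 7}

def is_canadian_postal_code(code):
    """Return True if the postal code length matches the Canadian heuristic."""
    if not code:
        return False
    return len(code.strip()) in CANADIAN_POSTAL_CODE_LENGTHS

def top_cities_by_checkins(businesses, k=10):
    """Rank Canadian cities by total check-ins and return the top k names."""
    counts = Counter()
    for b in businesses:
        if is_canadian_postal_code(b.get("postal_code")):
            # "checkins" is a hypothetical pre-computed count per business.
            counts[b["city"]] += b.get("checkins", 0)
    return [city for city, _ in counts.most_common(k)]

def filter_businesses(businesses, k=10):
    """Keep only Canadian businesses located in the top k cities."""
    cities = set(top_cities_by_checkins(businesses, k))
    return [
        b for b in businesses
        if b["city"] in cities and is_canadian_postal_code(b.get("postal_code"))
    ]

# Illustrative records, not taken from the Yelp dataset.
businesses = [
    {"business_id": "a", "city": "Toronto", "postal_code": "M5V 2T6", "checkins": 120},
    {"business_id": "b", "city": "Montreal", "postal_code": "H2X1Y4", "checkins": 80},
    {"business_id": "c", "city": "Las Vegas", "postal_code": "89109", "checkins": 300},
    {"business_id": "d", "city": "Calgary", "postal_code": "T2P", "checkins": 40},
]

print(top_cities_by_checkins(businesses, k=2))  # ['Toronto', 'Montreal']
print([b["business_id"] for b in filter_businesses(businesses, k=2)])  # ['a', 'b']
```

The length test is only a heuristic; a stricter variant could additionally check the letter/digit pattern of Canadian codes. In PySpark the same logic would map onto `filter`, `groupBy` and `orderBy` over the business and check-in DataFrames.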

time series forecasting: forecasting

  • Used a daily aggregated time series of the number of check-ins across all businesses in Toronto
  • Explored different time series forecasting techniques using appropriate train, validation and test splits
  • Evaluated model performance using MAPE (Mean Absolute Percentage Error)

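MAPE formula: MAPE = (100% / n) × Σ_t |(A_t − F_t) / A_t|, where A_t is the actual value, F_t the forecast value at time t, and n the number of forecast points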
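A direct implementation of the MAPE metric makes its main caveat visible: it is undefined whenever an actual value is zero (for example, a day with no check-ins). This is a minimal sketch; the function name `mape` is our own, and raising an error on zero actuals is a design choice, not necessarily what the project does.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    MAPE = 100 / n * sum(|(A_t - F_t) / A_t|) over all n points.
    Raises ValueError when an actual value is 0, where MAPE is undefined.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs((a - f) / a)
    return 100.0 * total / len(actual)

print(round(mape([100, 200, 50], [110, 180, 50]), 2))  # 6.67
```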

  • Inferences:
    • Prophet works best on this data: it is lightweight, needs very little tuning, and accurately captures the essential structure of the time series, such as trend and seasonality
    • Linear, Dense and Convolutional Neural Network models work well, and their performance could be improved further by tuning hyperparameters, but they are heavier than Prophet: they need significant training and will not scale easily to many time series
    • RNN and LSTM models perform poorly because the daily series provides too few training samples; their performance improves on an hourly aggregated version of the data, but that is outside the problem's scope
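The data preparation behind this comparison can be sketched in three steps: aggregate raw check-in timestamps into a gap-free daily series, split it chronologically (no shuffling, so the model never trains on the future), and count how many sliding-window samples a window-based neural model would get. This is a minimal sketch under assumptions: the `"%Y-%m-%d %H:%M:%S"` timestamp format, the 70/20/10 split fractions, the 30-step input window and the two-year series length are illustrative, not the project's settings.

```python
from collections import Counter
from datetime import datetime, timedelta

def daily_checkin_series(timestamps):
    """Aggregate check-in timestamps into a gap-free list of (date, count)."""
    if not timestamps:
        return []
    per_day = Counter(
        datetime.strptime(t.strip(), "%Y-%m-%d %H:%M:%S").date() for t in timestamps
    )
    start, end = min(per_day), max(per_day)
    n_days = (end - start).days + 1
    # Days without check-ins are kept with a count of 0 so the series has no gaps.
    return [(start + timedelta(days=i), per_day[start + timedelta(days=i)])
            for i in range(n_days)]

def chronological_split(series, train_frac=0.7, val_frac=0.2):
    """Split an ordered series into train/validation/test without shuffling."""
    n = len(series)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]

def count_windows(n_points, input_width, horizon=1):
    """Number of (input window, target) pairs a sliding window can produce."""
    return max(0, n_points - input_width - horizon + 1)

series = daily_checkin_series(
    ["2018-01-01 10:00:00", "2018-01-01 18:30:00", "2018-01-03 09:15:00"]
)
print([count for _, count in series])  # [2, 0, 1]

train, val, test = chronological_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 20 10

# Two years of data with a 30-step input window:
print(count_windows(730, 30))       # 700 samples from daily data
print(count_windows(730 * 24, 30))  # 17490 samples from hourly data
```

The window counts illustrate the point about RNN and LSTM models: the same time span yields roughly 24 times more training samples at hourly resolution than at daily resolution.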

time series anomaly detection: anomaly_detection.ipynb

  • The confidence intervals predicted by the Prophet model for each time series were used to detect anomalies: observations falling outside the interval are flagged, and each detected anomaly is tagged with an importance level based on how far it deviates from the interval
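The interval-based detection and importance tagging described above can be sketched without Prophet itself. In the project, the bounds would come from the `yhat_lower` and `yhat_upper` columns of a Prophet forecast; here they are plain lists. The deviation measure (distance beyond the nearest bound divided by the interval width), the 0.5 / 1.0 thresholds and the low/medium/high labels are hypothetical choices for illustration, not the project's exact rule.

```python
from dataclasses import dataclass

@dataclass
class Anomaly:
    index: int
    actual: float
    lower: float
    upper: float
    deviation: float  # distance outside the interval, relative to interval width
    importance: str

def detect_anomalies(actual, lower, upper, thresholds=(0.5, 1.0)):
    """Flag points outside [lower, upper] and tag importance by relative deviation."""
    anomalies = []
    for i, (y, lo, hi) in enumerate(zip(actual, lower, upper)):
        if lo <= y <= hi:
            continue  # inside the confidence interval: not an anomaly
        width = hi - lo
        distance = lo - y if y < lo else y - hi
        deviation = distance / width if width > 0 else float("inf")
        if deviation >= thresholds[1]:
            importance = "high"
        elif deviation >= thresholds[0]:
            importance = "medium"
        else:
            importance = "low"
        anomalies.append(Anomaly(i, y, lo, hi, deviation, importance))
    return anomalies

actual = [50, 72, 35, 130, 55]
lower = [40] * 5
upper = [60] * 5
for a in detect_anomalies(actual, lower, upper):
    print(a.index, a.importance, round(a.deviation, 2))
# 1 medium 0.6
# 2 low 0.25
# 3 high 3.5
```

Normalising by the interval width makes the importance levels comparable across businesses whose check-in volumes differ by orders of magnitude.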

Figure: anomaly detection
