
Ironhack's mini-project from module 2. Cleaning, exploring and visualizing a dataset about women's shoe prices in the US.


HH2805/Womens_Shoes__A-Dataviz-Project

 
 


Project: Visualizing Real World Data - Women's Shoe Prices

Overview

The goal of this project was to practice visualization using real world data, under a tight time constraint.

I chose to study a 2019 dataset of around 19,000 women's shoes and their associated product information. It is the public extract of one of the databases sold by Datafiniti, a database provider, and was downloaded from Kaggle.

Each row of the dataset is a pair of shoes sold on a merchant website, with its brand, description, price, merchant and more.

Although the dataset is large, much of the information is missing or unusable, which made for a long data-cleaning phase at the start of the project. I kept only 725 rows and 6 columns from an initial dataset of 19,045 rows and 47 columns.
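As a rough illustration of this kind of cleaning, here is a minimal pandas sketch. It is not the notebook's actual code: the column names, the toy data and the `clean` helper are all assumptions made for the example.

```python
import pandas as pd

# Tiny stand-in for the raw extract; column names and values are hypothetical.
raw = pd.DataFrame({
    "brand": ["Nine West", None, "Clarks", "Clarks"],
    "color": ["Black", "Red", None, "Brown"],
    "price": [59.99, 24.50, 80.00, None],
    "merchant": ["A", "B", "C", "D"],
    "ean": [None, None, None, None],  # an entirely empty column
})


def clean(df, keep_cols, required_cols):
    """Keep a subset of columns, then drop rows missing any required value."""
    subset = df[keep_cols]
    return subset.dropna(subset=required_cols).reset_index(drop=True)


cleaned = clean(
    raw,
    keep_cols=["brand", "color", "price", "merchant"],
    required_cols=["brand", "color", "price"],
)
print(f"{raw.shape} -> {cleaned.shape}")
```

Printing the shape before and after makes it easy to see how much data the cleaning step discards.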

I focused on the potential links between price and brand, and between price and color.
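One simple way to look at such links before moving to Tableau is a grouped summary. The sketch below assumes a cleaned table with `brand`, `color` and `price` columns. Those names, the toy values and the `price_summary` helper are illustrative only.

```python
import pandas as pd

# Toy cleaned data; brands, colors and prices are made up for the example.
shoes = pd.DataFrame({
    "brand": ["A", "A", "B", "B", "B"],
    "color": ["Black", "Red", "Black", "Black", "Red"],
    "price": [50.0, 70.0, 100.0, 120.0, 110.0],
})


def price_summary(df, by):
    """Count and median price per category, most expensive first."""
    return (
        df.groupby(by)["price"]
        .agg(n="count", median_price="median")
        .sort_values("median_price", ascending=False)
    )


by_brand = price_summary(shoes, "brand")
by_color = price_summary(shoes, "color")
print(by_brand)
print(by_color)
```

The median is used here rather than the mean because a few very expensive items can distort the average. The count column shows how many shoes each figure rests on.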

Technical info

Necessary Deliverables

Insights

Please refer to the Tableau workbook.

Conclusion & Going Further

It would have been useful to have a more complete dataset to study. I suspect that brand and color are not the most important variables explaining the price of shoes. The project timeframe did not allow me to un-nest the information in the 'Shoe Features' column, and it would be interesting to do so. I suspect we could find explanatory variables there, such as style (pump, peep-toe, sandal etc.), heel height and material (leather or synthetic). It would then be interesting to use the statsmodels library to see which of these variables best explain the price of a shoe. Finally, Datafiniti publishes another dataset on men's shoes. Comparing the two datasets would show whether prices differ between women's and men's shoes.
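A possible starting point for un-nesting the features is sketched below. It assumes each cell holds a JSON list of `{"key": ..., "value": [...]}` objects. That format, the `features` column name and the `unnest_features` helper are assumptions for illustration and should be checked against the real data.

```python
import json

import pandas as pd

# Hypothetical raw values: each cell is a JSON list of {"key", "value"} objects.
shoes = pd.DataFrame({
    "price": [59.99, 120.00],
    "features": [
        '[{"key": "Material", "value": ["Leather"]},'
        ' {"key": "Heel Height", "value": ["3 in."]}]',
        '[{"key": "Material", "value": ["Synthetic"]}]',
    ],
})


def unnest_features(cell):
    """Turn one JSON feature list into a flat {key: first value} dict."""
    try:
        items = json.loads(cell)
    except (TypeError, ValueError):
        # Missing or malformed cells yield no features.
        return {}
    return {item["key"]: item["value"][0] for item in items if item.get("value")}


# One column per feature key; shoes lacking a feature get NaN in that column.
features = pd.DataFrame(
    [unnest_features(cell) for cell in shoes["features"]], index=shoes.index
)
wide = shoes.drop(columns="features").join(features)
print(wide)
```

Once the features sit in their own columns, they could be used as explanatory variables for price, for instance in a regression fitted with statsmodels.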

