In this mini-project, I use the CRISP-DM process to answer several business questions about Airbnb locations and reservations across Spain using their publicly available data. Get to know the main insights by reading my post on Medium.
We will take the role of a private investor who has decided to purchase a property in Spain to rent it out through Airbnb. After careful examination, we have selected 9 possible Spanish cities where it would be interesting to make such a purchase. Naturally, we want to maximize our return on investment (ROI), for which we need to understand the competition in each city as well as the main price drivers for each location.
After having a brief look at the available data, we have selected a few questions that will aid us in making our investment decisions:
- What is the average price of each location type per neighbourhood? What are the most expensive neighbourhoods on average?
- What is the average host acceptance rate per location type and neighbourhood? In which neighbourhoods is it the lowest?
- How is the competition in each neighbourhood? What number and proportion of listings belong to hosts owning different numbers of locations? In which neighbourhoods is the concentration lower?
- What is the expected average profit per room type and neighbourhood when looking at the reservations for the next 6 months? Which neighbourhood is expected to be the most profitable in that period?
- Which listing factors affect the expected profit for the next 6 months? Can we use them to forecast the expected profit over that period?
We will be comparing the answers to those questions among the different Spanish regions of Madrid, Barcelona, Girona, Valencia, Mallorca, Menorca, Sevilla, Málaga and Euskadi. Hopefully, this will help us in making a more informed investment decision.
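As a rough illustration of how the first and third questions could be approached with pandas, here is a minimal sketch on a toy table. The column names (`host_id`, `neighbourhood`, `room_type`, `price`) and all values are assumptions made up for this example; they are not taken from the project's actual schema or code.

```python
import pandas as pd

# Toy listings table. Column names and values are hypothetical.
listings = pd.DataFrame(
    {
        "host_id": [1, 1, 2, 3, 3, 3],
        "neighbourhood": ["Centro", "Centro", "Centro", "Retiro", "Retiro", "Retiro"],
        "room_type": [
            "Entire home/apt", "Private room", "Entire home/apt",
            "Entire home/apt", "Private room", "Private room",
        ],
        "price": [120.0, 45.0, 100.0, 90.0, 40.0, 50.0],
    }
)

# Average price per neighbourhood (rows) and room type (columns).
avg_price = (
    listings.groupby(["neighbourhood", "room_type"])["price"]
    .mean()
    .unstack("room_type")
)

# Competition: count how many listings each host owns, then compute the
# share of listings in each neighbourhood that belong to multi-listing hosts.
host_counts = listings["host_id"].value_counts()
listings["host_listings"] = listings["host_id"].map(host_counts)
listings["multi_host"] = listings["host_listings"] > 1
multi_share = listings.groupby("neighbourhood")["multi_host"].mean()

print(avg_price)
print(multi_share)
```

A lower share of listings held by multi-listing hosts would suggest a less concentrated market in that neighbourhood.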
To answer our questions, we will follow the CRISP-DM process. Our list of questions is already the result of the first two steps (Business Understanding and Data Understanding). We will then prepare the data as necessary to obtain the answers, which includes pre-processing steps such as data cleaning and dealing with missing values. For our final question, we will also model the data and try to predict the number of reservations for each location.
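The kind of pre-processing mentioned above might look like the following sketch. It assumes that prices arrive as strings such as `$1,200.00` and acceptance rates as strings such as `95%`; the column names, the helper name `clean_listings`, and the choice to drop rows without a price and fill missing rates with the median are all assumptions for illustration, not necessarily what the notebook does.

```python
import pandas as pd

# Toy raw data. Column names and string formats are assumptions.
raw = pd.DataFrame(
    {
        "price": ["$1,200.00", "$85.00", None],
        "host_acceptance_rate": ["95%", None, "80%"],
    }
)


def clean_listings(df):
    """Hypothetical helper: parse numeric columns and handle missing values."""
    out = df.copy()
    # "$1,200.00" -> 1200.0 (strip the currency symbol and thousands separator).
    out["price"] = pd.to_numeric(
        out["price"].str.replace(r"[$,]", "", regex=True)
    )
    # "95%" -> 0.95
    out["host_acceptance_rate"] = (
        pd.to_numeric(out["host_acceptance_rate"].str.rstrip("%")) / 100
    )
    # A listing without a price cannot be analysed, so drop it.
    out = out.dropna(subset=["price"])
    # Fill missing acceptance rates with the median of the remaining rows.
    median_rate = out["host_acceptance_rate"].median()
    out["host_acceptance_rate"] = out["host_acceptance_rate"].fillna(median_rate)
    return out


clean = clean_listings(raw)
print(clean)
```

Whether to drop or impute a missing value depends on the column and the question being asked, so the strategy above should be read as one option among several.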
All processing is done with the help of Python and its widely used libraries such as pandas, numpy and scikit-learn.
This project uses publicly-available Airbnb data for 9 Spanish regions (the September 2022 version of each region). For each region, we have two different datasets:
- Listings: Contains all kinds of information regarding Airbnb listings, such as location, host it belongs to, type, etc. The complete data dictionary can be found in data/airbnb/listings_schema.csv.
- Calendar: Contains reservations for all listings and the price at which they were reserved.
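To give an idea of how the calendar data could feed the 6-month profit questions, the sketch below sums the price of reserved nights per listing within a 6-month window. The column names (`listing_id`, `date`, `reserved`, `price`), the window start date, and the simplification of treating summed booking prices as "profit" (no costs are subtracted) are assumptions for this example only.

```python
import pandas as pd

# Toy calendar table. Column names and values are hypothetical.
calendar = pd.DataFrame(
    {
        "listing_id": [10, 10, 10, 20, 20],
        "date": pd.to_datetime(
            ["2022-09-15", "2022-10-01", "2023-06-01", "2022-09-20", "2022-11-05"]
        ),
        "reserved": [True, True, True, False, True],
        "price": [100.0, 120.0, 150.0, 60.0, 70.0],
    }
)

# Assumed start of the window; the end is 6 calendar months later.
start = pd.Timestamp("2022-09-12")
end = start + pd.DateOffset(months=6)

# Keep only reserved nights that fall inside [start, end).
in_window = (calendar["date"] >= start) & (calendar["date"] < end)
booked = calendar[in_window & calendar["reserved"]]

# Sum of booked prices per listing over the window.
expected_profit = booked.groupby("listing_id")["price"].sum()
print(expected_profit)
```

The per-listing totals could then be joined back onto the listings table by listing identifier and averaged per room type and neighbourhood.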
To make use of this project, I recommend managing the required dependencies with Anaconda.
Install miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Install mamba:
conda install -n base -c conda-forge mamba
Install environment using provided file:
mamba env create -f environment.yml # alternatively use environment_hist.yml if base system is not debian
mamba activate airbnb_spain
Finally, follow along with the main notebook: notebooks/main.ipynb.
The project files are structured as follows:
- data/airbnb: Where all data is located.
- notebooks/main.ipynb: The Jupyter notebook that runs the complete project.
- src: Contains the source code of helper functions used in the data wrangling and analysis.
Source files are formatted using the following commands:
isort .
autoflake -r --in-place --remove-unused-variables --remove-all-unused-imports --ignore-init-module-imports .
black .
Distributed under the MIT License. See LICENSE for more information.
This project was done as part of the Data Science Nanodegree Program at Udacity.