Repository: challenge-data-analysis
Type of Challenge: Consolidation
Duration: 3 days
Deadline: 22/11/2024 12:30
Team Size: 3
We are continuing our journey to create a machine learning model that predicts property prices in Belgium. In this phase, we focus on conducting a preliminary data analysis to gather meaningful insights.
We will use the previously scraped dataset, clean the data, and perform the analysis.
immo-eliza-analysis/
├── assets/ # Folder containing files for graph visualizations
├── Data/ # Folder containing all datasets
├── map/ # Folder containing files for map visualizations
│ ├── immoweb_data.csv # Raw dataset
│ ├── immoweb_data_cleaned.csv # Cleaned dataset
│ ├── immoweb_data_filtred.csv # Filtered dataset
│ ├── encoded_data.csv # Encoded dataset
├── dashboard.py # Build and run dashboard
├── Data_Analysis.py # Run analysis on the cleaned data
├── Data_Cleaning.py # Run cleaning on raw immoweb_data.csv
├── README.md # Project overview and instructions
├── real_estate_data_analysis.ipynb # Jupyter notebook for data analysis
├── real_estate_data_cleaning.ipynb # Jupyter notebook for data cleaning
To get started, clone the repository:
git clone https://github.com/ManelBouba/immo_eliza_analysis
To run the project, follow the steps below:
- Data Cleaning: Run Data_Cleaning.py to clean the raw data (immoweb_data.csv) and save the cleaned dataset as immoweb_data_cleaned.csv.
- Data Analysis: After cleaning the data, run Data_Analysis.py to analyze the cleaned data and generate insights.
- Dashboard: Use dashboard.py to create a visualization dashboard by running: streamlit run dashboard.py
This will start a local server and allow you to interact with the data via an interactive dashboard.
The detailed explanation of the Data Cleaning process is in the Jupyter notebook real_estate_data_cleaning.ipynb, contributed by LAI Edoardo.
- Loaded and explored the dataset.
- Identified and visualized missing values.
- Identified critical columns to keep and drop, with explanations.
- Handled missing values by imputing data based on grouped analysis.
- Categorized variables to make them easier for analysis.
- Removed outliers and visualized their impact through histograms and boxplots.
- Prepared data for future modeling tasks.
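Two of the steps above, grouped imputation and outlier removal, can be sketched as follows. This is a minimal illustration and not the notebook's actual code: it assumes pandas, uses the group median as the fill value and a 1.5 × IQR rule for outliers, and the helper names, the `Price` column, and the toy data are all hypothetical.

```python
import pandas as pd


def impute_by_group(df, target, group_col):
    """Fill missing values in `target` with the median of each `group_col` group."""
    out = df.copy()
    group_median = out.groupby(group_col)[target].transform("median")
    out[target] = out[target].fillna(group_median)
    return out


def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose `column` value lies within k * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Toy data: column names and values are illustrative, not the real dataset.
toy = pd.DataFrame({
    "Type_of_Property": ["house", "house", "house",
                         "apartment", "apartment", "apartment"],
    "Living_Area": [150.0, None, 170.0, 80.0, 90.0, None],
    "Price": [300_000, 320_000, 5_000_000, 200_000, 210_000, 220_000],
})

imputed = impute_by_group(toy, "Living_Area", "Type_of_Property")
cleaned = remove_iqr_outliers(imputed, "Price")
```

Filling from the group median rather than the global median keeps imputed values plausible for each property type. The thresholds used in the notebook may differ from the ones assumed here.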
The detailed explanation of Data Analysis is in the Jupyter notebook real_estate_data_analysis.ipynb, contributed by BOUBAKEUR Manel.
- Dataset Overview: The dataset has 16 columns and 16,631 rows.
- Correlation Analysis:
  - The strongest correlations with property price are: Living Area (0.43) and Number of Rooms (0.33).
  - The weakest correlations with price are: Lift (0.02), Type of Property (0.02), Swimming Pool (0.03), and Garden (0.04).
- Qualitative and Quantitative Variables: We analyzed both types of variables and discussed the transformation of qualitative variables into numerical values for analysis.
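As a rough sketch of how such a correlation ranking could be produced, the snippet below encodes non-numeric columns as integer category codes and computes the Pearson correlation of every column with the price. It assumes pandas; the function name, the `Price` column, and the toy data are hypothetical, and the notebook may use a different encoding.

```python
import pandas as pd


def price_correlations(df, price_col="Price"):
    """Return Pearson correlations of every column with the price, highest first."""
    encoded = df.copy()
    # Turn qualitative (non-numeric) columns into integer category codes.
    for col in encoded.select_dtypes(exclude="number").columns:
        encoded[col] = encoded[col].astype("category").cat.codes
    corr = encoded.corr()[price_col].drop(price_col)
    return corr.sort_values(ascending=False)


# Toy data: values are illustrative, not taken from the real dataset.
toy = pd.DataFrame({
    "Price": [200_000, 250_000, 320_000, 400_000, 500_000],
    "Living_Area": [70, 90, 120, 150, 200],
    "Type_of_Property": ["apartment", "apartment", "house", "house", "house"],
})

ranking = price_correlations(toy)
print(ranking)
```

Category codes impose an arbitrary order on unordered categories, so correlations involving label-encoded columns should be read with caution.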
In this step, the analysis was communicated through tables and graphs by BOUBAKEUR Manel and FOMICHOV Andrii.
- Strong Positive Correlations:
  - Type_of_Property ↔ Subtype_of_Property (0.71)
  - Living_Area ↔ Number_of_Rooms (0.72)
  - Surface_area_plot_of_land ↔ Type_of_Property (0.73)
- Moderate Positive Correlations:
  - Number_of_Rooms ↔ Type_of_Property (0.57)
  - Living_Area ↔ Type_of_Property (0.60)
- Weak Positive Correlations:
  - Fully_Equipped_Kitchen ↔ State_of_the_Building (0.25)
  - Terrace ↔ Fully_Equipped_Kitchen (0.15)
- Price vs Property Analysis:
  - The price distribution and mean values were plotted, identifying the most common price bins and the distribution across different types of properties.
  - The price distribution shows a large number of observations in the 300,000 to 400,000 euro range.
- Price by Municipality:
  - The analysis of municipalities helped identify the most and least expensive areas in Belgium, Wallonia, and Flanders.
  - Most Expensive Municipality (Belgium): Knokke-Heist: Average Price = 601,451.55 EUR | Price per sqm = 7,464.10 EUR
  - Least Expensive Municipality (Belgium): Vaux-sur-Sûre: Average Price = 125,000.00 EUR | Price per sqm = 657.89 EUR
  - The full breakdown for Wallonia and Flanders is provided in the project.
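A municipality ranking like the one above could be computed along these lines. This is a sketch under assumptions rather than the project's code: it assumes pandas, hypothetical column names (`Municipality`, `Price`, `Living_Area`), and that price per sqm is the mean of each listing's price divided by its living area. The project may define it differently.

```python
import pandas as pd


def municipality_summary(df):
    """Average price and average price per square metre for each municipality."""
    data = df.assign(Price_per_sqm=df["Price"] / df["Living_Area"])
    summary = (
        data.groupby("Municipality")
        .agg(
            Average_Price=("Price", "mean"),
            Price_per_sqm=("Price_per_sqm", "mean"),
        )
        .sort_values("Average_Price", ascending=False)
    )
    return summary


# Toy data with placeholder municipality names and made-up values.
toy = pd.DataFrame({
    "Municipality": ["Town A", "Town A", "Town B", "Town B"],
    "Price": [400_000, 600_000, 100_000, 150_000],
    "Living_Area": [100, 120, 100, 100],
})

summary = municipality_summary(toy)
most_expensive = summary.index[0]
least_expensive = summary.index[-1]
```

Municipalities with very few listings can dominate either end of such a ranking, so a minimum listing count per municipality may be worth adding before drawing conclusions.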
The visualizations related to the project are stored in the assets/ folder (graphs) and the map/ folder (maps).
Contributors:
- BOUBAKEUR Manel – Data Analyst & Developer
- LAI Edoardo – Data Cleaning & Preprocessing
- FOMICHOV Andrii – Data Visualization & Interpretation