This project is a web scraper designed to extract product data from four different skincare e-commerce websites. The collected data includes Product Name, Product Description, Product Info, Product Variation Data (if any), Product Image URL, Product Benefits, and Instructions/Application info.
- Clone this repository to your local machine:
  `git clone https://github.com/yourusername/skincare-scraper.git`
- Navigate to the project directory:
  `cd skincare-scraper`
- Install the required dependencies:
  `pip install -r requirements.txt`
- Run the main script:
  `python scraper.py`
The script will save the collected data at checkpoints to minimize the risk of data loss in case of interruptions.
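The checkpoint mechanism inside `scraper.py` is not documented here. Below is a minimal sketch of one way to do it, assuming each product is scraped by a separate call. It appends every result to a JSON Lines file and, on restart, skips URLs that were already saved. The file name `checkpoint.jsonl`, the functions `load_done_urls` and `scrape_with_checkpoints`, and the `scrape_product` callable are all hypothetical.

```python
import json
import os

CHECKPOINT_PATH = "checkpoint.jsonl"  # hypothetical file name


def load_done_urls(path=CHECKPOINT_PATH):
    """Return the set of product URLs already saved in the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["url"] for line in f if line.strip()}


def scrape_with_checkpoints(urls, scrape_product, path=CHECKPOINT_PATH):
    """Scrape each URL not yet in the checkpoint, appending results as we go.

    `scrape_product` is a hypothetical callable that takes a URL and
    returns a dict of product fields.
    """
    done = load_done_urls(path)
    with open(path, "a", encoding="utf-8") as f:
        for url in urls:
            if url in done:
                continue  # already scraped in an earlier run
            record = scrape_product(url)
            record["url"] = url
            f.write(json.dumps(record) + "\n")
            # Flush after every product so an interrupted run loses at most
            # the product that was in progress.
            f.flush()
            done.add(url)
    return done
```

Appending one line per product keeps each save cheap; the final CSV can then be built from the checkpoint file once the run completes.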
The scraped data is stored in a CSV file named `skincare_data.csv`. Each row corresponds to one product, and the columns are Product Name, Product Description, Product Info, Product Variation Data, Product Image URL, Product Benefits, and Instructions/Application info.
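A minimal sketch of writing this file with Python's standard `csv` module, assuming each product is collected as a dict keyed by the column names listed above. The helper names `write_products_csv` and `_to_cell` are hypothetical, and storing variation data as a JSON string is an assumption: a product can have several variations, but a CSV cell holds a single value.

```python
import csv
import json

# Column order as listed in this README; the exact header strings used by
# scraper.py are an assumption.
COLUMNS = [
    "Product Name",
    "Product Description",
    "Product Info",
    "Product Variation Data",
    "Product Image URL",
    "Product Benefits",
    "Instructions/Application info",
]


def _to_cell(value):
    """Encode lists/dicts (e.g. variation data) as JSON strings for one cell."""
    if isinstance(value, (list, dict)):
        return json.dumps(value, ensure_ascii=False)
    return value


def write_products_csv(products, path="skincare_data.csv"):
    """Write one row per product dict.

    Missing fields become empty cells (restval=""), and keys that are not
    columns, such as a stored "url", are ignored (extrasaction="ignore").
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=COLUMNS, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        for product in products:
            writer.writerow({k: _to_cell(v) for k, v in product.items()})
```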
This project aims to scrape product data from four skincare e-commerce websites. The following tasks outline the step-by-step process for achieving this goal.
- Create a virtual environment and install dependencies.
- Create a dev branch.
- In a notebook, load a product page locally and start parsing its elements.
- Implement the scraping logic to collect Image URL.
- Implement the scraping logic to collect Product Name.
- Implement the scraping logic to collect Description.
- Implement the scraping logic to collect Product Info.
- Implement the scraping logic to collect Variation Data.
- Implement the scraping logic to collect Benefits.
- Implement the scraping logic to collect Instructions.
- Implement the scraping logic to collect all product URLs.
- Store the collected data in CSV format.
- Test your scraping script to ensure it works correctly and captures the necessary information.
- Implement checkpoints to periodically save the collected data.
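The step that collects all product URLs could look roughly like the sketch below, which uses only the standard library's `html.parser` and `urllib.parse`. The `/products/` marker, the `ProductLinkParser` class, and the `collect_product_urls` helper are hypothetical; each of the four sites will likely need its own rule for recognizing product links, and the actual script may use a different parsing library.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class ProductLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags whose href contains a marker.

    The default "/products/" marker is a hypothetical pattern, not a rule
    taken from any of the target sites.
    """

    def __init__(self, base_url, marker="/products/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.urls = []       # product URLs in first-seen order
        self._seen = set()   # used to drop duplicates

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or self.marker not in href:
            return
        # Resolve relative links and drop #fragments so duplicates collapse.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        if url not in self._seen:
            self._seen.add(url)
            self.urls.append(url)


def collect_product_urls(html, base_url, marker="/products/"):
    """Return de-duplicated absolute product URLs found in a listing page."""
    parser = ProductLinkParser(base_url, marker)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Feeding it a saved collection page together with that page's own URL returns absolute product URLs in the order they first appear, which can then be passed to the per-product scraping step.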