GSM Arena Phone Scraper

This repository contains a Python web scraper that extracts detailed mobile phone specifications from GSMArena. It uses Playwright to load pages and BeautifulSoup to parse them, and it scrapes multiple pages concurrently with a thread pool.

Features

Progress Saving: Ensures data is not lost and scraping can resume from the last saved point in case of interruptions.
Concurrent Scraping: Uses ThreadPoolExecutor to scrape multiple pages concurrently.
Comprehensive Data Extraction: Extracts various phone specifications including model name, release date, OS details, CPU/GPU information, and more.
Custom Logging: Provides detailed logs of the scraping process for monitoring and debugging.
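
How progress saving, concurrent scraping, and logging can fit together is sketched below. This is a minimal illustration under stated assumptions, not the code in scraper.py: the names load_progress, save_progress, scrape_all, and the fetch callback are hypothetical, and the checkpoint is assumed to be a pickled dict that maps each URL to its scraped result.

```python
import logging
import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper_sketch")  # hypothetical logger name


def load_progress(path: Path) -> dict:
    """Return previously saved results, or an empty dict on a first run."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {}


def save_progress(path: Path, done: dict) -> None:
    """Write to a temporary file, then swap it in, so an interruption
    in the middle of a write cannot corrupt the existing checkpoint."""
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(done, fh)
    tmp.replace(path)


def scrape_all(urls, fetch, progress_path: Path, max_workers: int = 4) -> dict:
    """Fetch every URL not already in the checkpoint, using a thread pool.

    `fetch` is a hypothetical callable that takes a URL and returns the
    scraped record for that page.
    """
    done = load_progress(progress_path)
    pending = [u for u in urls if u not in done]
    log.info("%d already done, %d pending", len(done), len(pending))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, u): u for u in pending}
        # Results are collected in the calling thread, so the checkpoint
        # file is only ever written from one thread.
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                done[url] = fut.result()
            except Exception:
                log.exception("Failed to scrape %s", url)
                continue
            save_progress(progress_path, done)
    return done
```

Calling scrape_all a second time with the same progress file skips every URL that already has a saved result, which is the resume behavior described above. Pages that raised an error are not recorded, so they are retried on the next run.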

Requirements

Python 3.7+
Playwright
BeautifulSoup4
Requests
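logging (Python standard library, no installation needed)
pickle (Python standard library, no installation needed)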

Installation

Clone the repository:

```
git clone https://github.com/ahthserhsluk/GSMARENA-Mobile-Data-Scapper.git
cd GSMARENA-Mobile-Data-Scapper
```

Install the required packages:
```
pip install -r requirements.txt
```
Install Playwright browsers:
```
playwright install
```

Usage

Update the __main__ block in scraper.py with the desired manufacturer, start URL, and optional end page:

```
if __name__ == "__main__":
    manufacturer = "Nokia"  # Replace with the desired manufacturer
    start_url = "https://www.gsmarena.com/nokia-phones-1.php"
    end_page = 5  # Change this to set an end page or set to None to scrape all pages
    main(manufacturer, start_url, end_page)
```

Run the scraper:
```
python scraper.py
```
The scraped data will be saved to a CSV file in the manufacturer's directory.

Code Structure

scraper.py: The main script containing the scraping logic.
requirements.txt: The dependencies required to run the scraper.
logs/: Directory where logs are saved.
data/: Directory where the scraped CSV files are saved.

Contributing

Fork the repository.
Create your feature branch (git checkout -b feature/your-feature).
Commit your changes (git commit -m 'Add some feature').
Push to the branch (git push origin feature/your-feature).
Open a Pull Request.

License

This project is licensed under the MIT License.
