This repo contains the crawlers I used in my projects and research. It will be updated with new methods and techniques as scripts are added. Most of the work is done using Scrapy. I haven't used BeautifulSoup so far, but I will update the code if I need that library.
This repo currently contains the following spiders:
- cb_spider.py: This spider crawls the smartphone listings on Amazon.com to fetch product descriptions, prices (in $) and ratings (out of 5). The output is stored in a JSON file.
- corona.py: This spider crawls the Poynter-IFCN Covid-19 Misinformation page to fetch fake news data. The output is stored in a CSV file.
- snope.py: This spider crawls the Snopes fact-check page and retrieves news items and related content (question, comments, claim and origin) along with their labels: true, false or unknown.
- samayam.py: This spider crawls Samayam's fact-check news to fetch news data in the Tamil language.
- tamil.py/tamil18.py: These spiders are similar to samayam.py and scrape data from Tamil OneIndia and Tamil News18 respectively.
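
As a rough illustration of the price (in $) and rating (out of 5) fields that cb_spider.py collects, here is a minimal sketch of how raw scraped strings could be cleaned before they are written to JSON. This is not the code from cb_spider.py: the helper names (`parse_price`, `parse_rating`, `build_item`), the field names, and the raw string formats are all assumptions made for the example, and the sample values are made up.

```python
import json
import re
from typing import Optional


def parse_price(raw: str) -> Optional[float]:
    """Turn a price string such as '$1,299.00' into a float in dollars.

    The input format is an assumption; returns None if no number is found.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(0).replace(",", ""))


def parse_rating(raw: str) -> Optional[float]:
    """Turn a rating string such as '4.5 out of 5 stars' into a float.

    The input format is an assumption; returns None if it does not match.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s+out of\s+5", raw)
    if match is None:
        return None
    return float(match.group(1))


def build_item(description: str, raw_price: str, raw_rating: str) -> dict:
    """Assemble one record using hypothetical field names."""
    return {
        # Collapse runs of whitespace left over from the HTML.
        "description": " ".join(description.split()),
        "price": parse_price(raw_price),
        "rating": parse_rating(raw_rating),
    }


# Made-up sample values, not real scraped data.
sample = build_item("  Example Phone  128GB ", "$1,299.00", "4.5 out of 5 stars")
print(json.dumps(sample, indent=2))
```

Helpers like these could be called from a spider's parse method or from an item pipeline, so that the JSON output holds numbers instead of display strings.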