Web_Crawler

This repo contains the crawlers I have used in my projects and research. It will be updated as I add scripts with new methods and techniques. Most of the work is done with Scrapy. I have not used BeautifulSoup so far, but I will update the code if I need that library.

This repo currently contains six spiders:

cb_spider.py: This spider crawls Amazon.com smartphone listings to fetch the product description, price (in $) and rating (out of 5). The output is stored in a JSON file (amazon_cb_output.json).
corona.py: This spider crawls the Poynter-IFCN Covid-19 Misinformation database to fetch misinformation (fake news) data. The output is stored in a CSV file (poynter_mis_info_covid.csv).
snope.py: This spider crawls the Snopes fact-check page and retrieves news and related content (question, comments, claim and origin) along with a label stating whether each item is true, false or unknown.
samayam.py: This spider crawls Samayam's fact-check news to fetch news data in the Tamil language.
tamil.py/tamil18.py: These spiders are similar to samayam.py and scrape data from Tamil OneIndia and Tamil News18 respectively.
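
As an illustration of the kind of output cb_spider.py describes (description, price in $, rating out of 5, stored as JSON), here is a minimal sketch of normalizing scraped text into a record. It is not taken from the spider itself: the helper names (`parse_price`, `parse_rating`, `to_record`), the field names, and the input formats such as `$1,299.00` and `4.5 out of 5 stars` are all assumptions made for the example.

```python
import json
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Return the first dollar amount in `text` as a float, or None.

    Assumes prices look like '$1,299.00' (hypothetical input format).
    """
    match = re.search(r"\$\s*([\d,]+(?:\.\d+)?)", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def parse_rating(text: str) -> Optional[float]:
    """Return the score from text like '4.5 out of 5 stars', or None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None


def to_record(description: str, price_text: str, rating_text: str) -> dict:
    """Build one JSON-serializable record (field names are hypothetical)."""
    return {
        "description": description.strip(),
        "price": parse_price(price_text),
        "rating": parse_rating(rating_text),
    }


if __name__ == "__main__":
    record = to_record("  Example Phone 64GB ", "$1,299.00", "4.5 out of 5 stars")
    print(json.dumps(record, indent=2))
```

Returning `None` for values that cannot be parsed keeps missing prices or ratings visible in the JSON output instead of silently dropping the item.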

AmbiTyga/Web_Crawler
