A series of product data scrapers with wrappers specific to different ecommerce sites

CaptainValor/web-scrapers

Introduction

This repository contains a series of web scrapers and their wrappers, which use Nokogiri to parse product information from several differently structured ecommerce websites. Each scraper writes the product information it collects as CSV data.

Background

Dependencies

Technical Notes

These scrapers have been anonymized for privacy reasons, so they cannot be run against any website without modification. As they stand, they demonstrate various methods for parsing different site structures that you might encounter. To use one, you must adapt the wrapper (the components of the scraper that identify and parse specific HTML elements) to suit the particular site you are analyzing, and update the wrapper whenever the site changes.

For more on the subject, [see this article](http://en.wikipedia.org/wiki/Wrapper_(data_mining)).

The CSV output is VERY rudimentary and, in the case of Scraper D, sometimes duplicates data. I might improve this in future projects, but the scrapers are bare-bones functional as they stand.
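If the duplication gets in your way, one possible workaround is to drop repeated rows before writing the CSV. The sketch below is not how Scraper D works and is not a fix from this repository. It assumes a duplicate is a row whose fields all match an earlier row, and the sample rows and the `unique_rows_csv` helper are made up for illustration.

```ruby
require "csv"

# Made-up scraped rows; the third repeats the first exactly.
rows = [
  ["Widget", "$9.99"],
  ["Gadget", "$19.50"],
  ["Widget", "$9.99"]
]

# Build CSV text from the rows, keeping only the first occurrence of
# each identical row (Array#uniq preserves the original order).
def unique_rows_csv(rows, headers)
  CSV.generate do |csv|
    csv << headers
    rows.uniq.each { |row| csv << row }
  end
end

output = unique_rows_csv(rows, %w[name price])
puts output
```

If the duplicates differ in some field (for example, whitespace or a tracking parameter in a URL), exact-match deduplication will not catch them and the rows would need normalizing first.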

Contact

If you have questions or comments regarding this project, hit me up on here or shoot me an email:

Stephen Torrence [email protected]
