OscarPrediction1/wikiCrawler

wikiCrawler

Installation

npm install

Set up the MySQL database

  • Run sql/wiki_films.sql and sql/wiki_views.sql
  • Insert your films into the wiki_films table (pageid is not required)

Config

  • Copy app/config.default.js to app/config.js
  • Enter your MySQL credentials

Run

Crawl all films

node app.js all

Crawl multiple films (can be combined with --year and --month)

node app.js films --boxOfficeIds [boxOfficeId1],[boxOfficeId2],[...],[boxOfficeIdN]

Crawl only a specific film for all years

node app.js film --boxOfficeId [boxOfficeId]

Crawl only a specific year for a specific film

node app.js film --boxOfficeId [boxOfficeId] --year [year]

Crawl only a specific month for a specific film

node app.js film --boxOfficeId [boxOfficeId] --year [year] --month [month]

Crawl only a specific URL for a specific film

node app.js film --boxOfficeId [boxOfficeId] --url [url]
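
The invocations above share a small set of flags, so a wrapper script can assemble them programmatically. The following is a minimal sketch, not part of the repository: buildCrawlArgs is a hypothetical helper that uses only the subcommands and flags documented above. It assumes IDs are joined with commas and no spaces, as in the films example.

```javascript
// Hypothetical helper (not part of wikiCrawler): builds the argument list for
// `node app.js ...` from the subcommands and flags documented in this README.
function buildCrawlArgs(options = {}) {
  const { boxOfficeIds = [], year, month, url } = options;
  const args = ['app.js'];

  if (boxOfficeIds.length === 0) {
    // No IDs given: crawl everything; other options are ignored.
    args.push('all');
    return args;
  }

  if (boxOfficeIds.length === 1) {
    args.push('film', '--boxOfficeId', String(boxOfficeIds[0]));
  } else {
    args.push('films', '--boxOfficeIds', boxOfficeIds.join(','));
  }

  // The README only shows --month together with --year.
  if (month !== undefined && year === undefined) {
    throw new Error('--month is only documented together with --year');
  }
  if (year !== undefined) args.push('--year', String(year));
  if (month !== undefined) args.push('--month', String(month));

  // --url is only documented for the single-film command.
  if (url !== undefined) {
    if (boxOfficeIds.length !== 1) {
      throw new Error('--url requires exactly one boxOfficeId');
    }
    args.push('--url', url);
  }
  return args;
}
```

The returned array can be handed to require('child_process').spawn('node', args, { stdio: 'inherit' }), which passes each value as its own argument without shell quoting.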

Export

Check that your MySQL user has the global right to access files (the FILE privilege).

Export CSV (can be used for BigQuery)

node export.js bigquery

Export CSV with a custom WHERE clause

node export.js bigquery --where "[Where-Clause]"
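
A WHERE clause usually contains spaces and quotes, so it has to reach export.js as a single argument. The following is a minimal sketch, assuming the export is launched from another Node script. buildExportArgs is a hypothetical helper and not part of the repository.

```javascript
// Hypothetical helper (not part of wikiCrawler): builds the argument list for
// `node export.js bigquery`, optionally with a custom WHERE clause.
function buildExportArgs(whereClause) {
  const args = ['export.js', 'bigquery'];
  if (whereClause !== undefined) {
    const trimmed = String(whereClause).trim();
    if (trimmed === '') {
      throw new Error('WHERE clause must not be empty');
    }
    // The clause stays one array element, so no shell quoting is needed.
    args.push('--where', trimmed);
  }
  return args;
}
```

Passing the array to require('child_process').execFileSync('node', args, { stdio: 'inherit' }) runs the export without going through a shell, so quotes inside the clause need no extra escaping.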

Cloud9

https://ide.c9.io/rechenberger/oscar-wiki-crawler

Procedure to crawl new nominations

  • Add the boxOfficeId, pageid and title of each new film to the wiki_films table
  • Run the node app.js films... script with the new boxOfficeIds for the months between nomination and awards
  • Export via the node export.js bigquery --where... script, with a WHERE clause that covers the period between nomination and awards
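
The crawl step of this procedure can be sketched as follows. The crawler takes one --year/--month pair per run, so a period that crosses a month boundary needs one run per month. monthsBetween and crawlCommands are hypothetical helpers and not part of the repository. The sketch assumes --month accepts a number from 1 to 12, which this README does not state.

```javascript
// Hypothetical helpers (not part of wikiCrawler).

// Lists every { year, month } pair touched by the period from `start` to `end`
// (both Date objects, inclusive). Months are numbered 1-12, which is an
// assumption about what --month expects.
function monthsBetween(start, end) {
  if (end < start) {
    throw new Error('end must not be before start');
  }
  const result = [];
  let year = start.getUTCFullYear();
  let month = start.getUTCMonth() + 1;
  const lastYear = end.getUTCFullYear();
  const lastMonth = end.getUTCMonth() + 1;
  while (year < lastYear || (year === lastYear && month <= lastMonth)) {
    result.push({ year, month });
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return result;
}

// Builds one `node app.js films ...` command line per month in the period.
function crawlCommands(boxOfficeIds, start, end) {
  const ids = boxOfficeIds.join(',');
  return monthsBetween(start, end).map(
    ({ year, month }) =>
      `node app.js films --boxOfficeIds ${ids} --year ${year} --month ${month}`
  );
}
```

For a period that starts in one month and ends in the next, crawlCommands returns two command lines, one per month, which can be run in order before the export step.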
