- the Programming Historian has a beginner tutorial and an advanced tutorial
- wget is an extraordinarily useful piece of software
- follow installation instructions from the beginner tutorial for your OS
wget http://activehistory.ca/papers
- study your target carefully for folder structures: understand its conventions!
- compare with here
- How would you go about grabbing data from that source?
- if you do the simple use case (wget plus a URL, no options), you get a single page saved, with output like:
Saving to: `index.html.1'
[] 37,668 --.-K/s in 0.1s
2012-05-15 15:50:26 (374 KB/s) - `index.html.1' saved [37668]
wget [options] [url]
-r
- recursive retrieval: wget follows the links it finds; default depth is 5 links (think of that as 'five degrees of separation')
--no-parent
- won't leave the http://activehistory.ca/papers/ directory (it never climbs up to the parent directory)
- a double dash marks a full-word option; you could abbreviate this one with
-np
-l 2
- i.e., 'level' 2: follow links up to two webpages beyond the starting page.
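The options so far can be combined into one command. The sketch below writes the same request twice, once with full-word options and once with the short forms, to show that they are interchangeable. The variable names are illustrative only, and the script just prints the commands (a dry run) so they can be checked before anything is downloaded.

```shell
#!/bin/sh
# Assumption: the target is the papers directory used in these notes.
URL="http://activehistory.ca/papers/"

# Full-word options: recurse, stay below /papers/, follow links two levels deep.
LONG_CMD="wget --recursive --no-parent --level=2 $URL"

# The same request written with the short forms.
SHORT_CMD="wget -r -np -l 2 $URL"

# Dry run: print both commands for review instead of running them.
echo "$LONG_CMD"
echo "$SHORT_CMD"
```

Copy either printed line into the terminal to run it; both should fetch the same set of pages.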
-w 2
- wait a moment between requests rather than pinging the server nonstop; 2 seconds is fair
--limit-rate=20k
- and don't use up their bandwidth: this caps the download speed at 20 kilobytes per second
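Putting it all together, a polite recursive download might look like the sketch below. The values (depth 2, a 2-second wait, a 20k rate cap) are the ones suggested in these notes, not universal settings, and the variable names are illustrative. As written the script only prints the assembled command; the last comment shows how to run it for real.

```shell
#!/bin/sh
# Assumptions: target and values are taken from the notes above; adjust to taste.
URL="http://activehistory.ca/papers/"
DEPTH=2            # how many links away from the start page to follow (-l)
WAIT_SECONDS=2     # pause between requests (-w)
RATE_LIMIT="20k"   # cap on download speed (--limit-rate)

# Build the command as an argument list so each option stays a separate word.
set -- wget -r --no-parent -l "$DEPTH" -w "$WAIT_SECONDS" \
    --limit-rate="$RATE_LIMIT" "$URL"

# Dry run: show the command that would be executed.
POLITE_CMD="$*"
echo "$POLITE_CMD"

# To actually start the download, uncomment the next line:
# "$@"
```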