Skip to content

Latest commit

 

History

History
72 lines (33 loc) · 1.43 KB

md2-04.md

File metadata and controls

72 lines (33 loc) · 1.43 KB

Wget it.

  • the Programming Historian has a beginner tutorial and an advanced tutorial
  • an extraordinarily useful piece of software
  • follow installation instructions from the beginner tutorial for your OS

A simple use case:

wget http://activehistory.ca/papers

  • study your target carefully for folder structures: understand its conventions!

Is wget always the best option?

  • compare with here
  • How would you go about grabbing data from that source?

what you see:

  • if you do the simple use case:

Saving to: `index.html.1'

[] 37,668 --.-K/s in 0.1s

2012-05-15 15:50:26 (374 KB/s) - `index.html.1' saved [37668]

options

wget [options] [url]

-r

  • default depth is 5 links (think of that as 'five degrees of separation')

more options

--no-parent

  • won't leave the http://activehistory.ca/papers/
  • double dash = a full-text command; you could abbreviate with -np

understanding parents

image

leaving the nest

-l 2

  • ie, 'links' 2, or two webpages beyond.

but don't be mean

-w

--limit-rate=20k

  • or wait a moment before pinging with commands; 2 seconds is fair
  • and don't use up their bandwidth