Commit

readme tidied up
pyoxa committed Oct 11, 2023
1 parent ee61320 commit c1734fd
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions README.md
@@ -1,7 +1,7 @@
# Concurrent web scraper with primitive cache

## How to run
-*A prerequisite is to have Go 1.21+ installed*
+*A prerequisite is to have Go 1.21+ installed*
In root repo dir, run:
```bash
go run .
@@ -11,14 +11,13 @@ Or, using docker, run:
docker build -t scraper .
docker run -d -p 1337:1337 scraper
```
-Above commands will build and run the container, binding your host port 1337 to the same port on the container.
+Above commands will build and run the container, binding your host's port 1337 to the same port on the container.

-Sites sourced from `urls` will get scraped, with the retrieved words displayed on
+Sites sourced from `urls` in `main.go` will get scraped, with the retrieved words displayed on
`localhost:1337/metrics`

## Issues
1. As of writing this README, the scraper runs an unholy amount of Goroutines.
However, most of the goroutines are stagnant
2. The scraper does not follow redirects, only finds href elements,
3. It does not use ETag and If-None-Match headers, nor does it use If-Modified-Since
(the cache isn't persistent and there is no way to invalidate it)
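
To illustrate the third issue in the list above, here is a minimal sketch of what revalidating a cached page with `ETag` and `If-None-Match` could look like. It is not taken from this repository: `etagCache`, `fetch` and `runDemo` are hypothetical names, the cache is a plain in-memory map, and the demo talks to a local `httptest` server rather than to the sites in `urls`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// entry holds one cached response: the validator the server sent and the body.
type entry struct {
	etag string
	body []byte
}

// etagCache is a hypothetical in-memory cache keyed by URL (not the repo's cache).
type etagCache struct {
	mu      sync.Mutex
	entries map[string]entry
}

func newETagCache() *etagCache {
	return &etagCache{entries: make(map[string]entry)}
}

// fetch performs a conditional GET. The boolean result reports whether the
// body came from the cache because the server answered 304 Not Modified.
func (c *etagCache) fetch(client *http.Client, url string) ([]byte, bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, false, err
	}
	c.mu.Lock()
	cached, ok := c.entries[url]
	c.mu.Unlock()
	if ok && cached.etag != "" {
		// Ask the server to skip the body if our copy is still current.
		req.Header.Set("If-None-Match", cached.etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, false, err
	}
	defer resp.Body.Close()

	if ok && resp.StatusCode == http.StatusNotModified {
		return cached.body, true, nil
	}
	if resp.StatusCode != http.StatusOK {
		return nil, false, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, false, err
	}
	// Store the fresh body together with its ETag for the next request.
	c.mu.Lock()
	c.entries[url] = entry{etag: resp.Header.Get("ETag"), body: body}
	c.mu.Unlock()
	return body, false, nil
}

// runDemo starts a local test server that honours If-None-Match and fetches
// the same URL twice through the cache.
func runDemo() (firstCached, secondCached bool, err error) {
	const tag = `"v1"`
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", tag)
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, "hello scraper")
	}))
	defer srv.Close()

	cache := newETagCache()
	if _, firstCached, err = cache.fetch(srv.Client(), srv.URL); err != nil {
		return
	}
	_, secondCached, err = cache.fetch(srv.Client(), srv.URL)
	return
}

func main() {
	first, second, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first fetch from cache:", first)
	fmt.Println("second fetch from cache:", second)
}
```

In this sketch the first request downloads the body and records its `ETag`; the second sends `If-None-Match` and reuses the stored body when the server replies with 304. An `If-Modified-Since` variant would follow the same shape with the `Last-Modified` header as the validator.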
