From c1734fd5b26b55e18b75501158649d2ea6b36f9f Mon Sep 17 00:00:00 2001
From: pyoxa
Date: Wed, 11 Oct 2023 22:18:02 +0200
Subject: [PATCH] readme tidied up

---
 README.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index efc7923..c6d710e 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Concurrent web scraper with primitive cache
 
 ## How to run
-*A prerequisite is to have Go 1.21+ installed*
+*A prerequisite is to have Go 1.21+ installed*
 In root repo dir, run:
 ```bash
 go run .
@@ -11,14 +11,13 @@ Or, using docker, run:
 docker build -t scraper .
 docker run -d -p 1337:1337 scraper
 ```
-Above commands will build and run the container, binding your host port 1337 to the same port on the container.
+Above commands will build and run the container, binding your host's port 1337 to the same port on the container.
 
-Sites sourced from `urls` will get scraped, with the retrieved words displayed on
+Sites sourced from `urls` in `main.go` will get scraped, with the retrieved words displayed on
 `localhost:1337/metrics`
 
 ## Issues
 
 1. As of writing this README, the scraper runs an unholy amount of Goroutines.
-However, most of the goroutines are stagnant
 2. The scraper does not follow redirects, only finds href elements,
 3. It does not use ETag and If-None-Match headers, nor does it use If-Modified-Since (the cache isn't persistent and there is no way to invalidate it)
\ No newline at end of file