Argument '-i -1' does not work. #185

Open
MobinRanjbar opened this issue Jun 19, 2020 · 4 comments

@MobinRanjbar

MobinRanjbar commented Jun 19, 2020

Hi there,

I want to crawl the whole content of a website. When I run the command below, the crawling process does not start. What is wrong?

bin/sparkler.sh crawl -id 1 -i -1

Output:
2020-06-19 12:38:06 INFO Crawler$:153 - Committing crawldb..
2020-06-19 12:38:06 INFO Crawler$:221 - Shutting down Spark CTX..

@thammegowda
Member

Sparkler does nothing when there are no URLs to crawl, and your output suggests there are no new URLs to be crawled.
Try injecting some new URLs and then run the crawl again.

@MobinRanjbar
Author

Hi,

I had already injected a new URL before crawling, as shown below. The same thing happens.

bin/sparkler.sh inject -id 1 -su 'https://www.nasa.gov/'

@thammegowda
Member

I am guessing there is an error in your setup.
Have you tried running it from the Docker image at https://hub.docker.com/r/uscdatascience/sparkler/tags? Could you please give that a try?

CC @buggtb: do you have any guesses as to why, when, or how this case might happen?

@MobinRanjbar
Author

MobinRanjbar commented Jun 23, 2020

Hi,

The same thing happened in Docker:

sparkler@292e25536b51:/data/sparkler$ bin/sparkler.sh inject -id 1 -su 'https://www.nasa.gov/'
2020-06-23 07:46:16 INFO Injector$:97 - Injecting 1 seeds
jobId = 1
sparkler@292e25536b51:/data/sparkler$ bin/sparkler.sh crawl -id 1 -tn 100 -i -1
2020-06-23 07:46:35 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-06-23 07:46:40 INFO Crawler$:153 - Committing crawldb..
2020-06-23 07:46:40 INFO Crawler$:221 - Shutting down Spark CTX..
sparkler@292e25536b51:/data/sparkler$

Have you ever tried that argument?

@chrismattmann chrismattmann added this to the Sparkler 1.0 Release milestone Oct 4, 2020
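
One possible explanation for the logs above, offered only as an assumption because nothing in this thread confirms it: if the crawler runs its fetch cycle in a loop that counts from 1 up to the value passed to -i, then a value of -1 gives that loop nothing to do. The job would go straight to committing the crawldb and shutting down, which matches the reported output. The sketch below is a hypothetical stand-in (count_iterations is an invented name, not Sparkler code) that only shows how such a counted loop behaves with a negative bound.

```shell
#!/bin/sh
# Hypothetical stand-in for a crawler's iteration loop; NOT Sparkler's actual code.
# count_iterations N prints how many times a loop counting from 1 up to N would run.
count_iterations() {
    iterations=$1
    i=1
    ran=0
    # The body runs only while the counter has not passed the requested bound.
    while [ "$i" -le "$iterations" ]; do
        ran=$((ran + 1))
        i=$((i + 1))
    done
    echo "$ran"
}

count_iterations 3    # prints 3
count_iterations -1   # prints 0: the loop body never runs
```

If this is what is happening, rerunning the crawl with a positive iteration count would be the obvious thing to test. Whether -1 is meant to stand for "unlimited" is not stated anywhere in this thread.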