Crawler configured by Datasources UI only crawls Startpage, although option "Crawl full domain..." #454

Open
rafkamonday opened this issue Nov 11, 2022 · 2 comments

Comments

@rafkamonday

Hello,

I am trying to crawl a website (full domain), but nothing beyond the start page is ever crawled. In the Datasources UI I tried http and https, with and without www, and with and without a trailing slash. None of these combinations works. I would expect the crawler to follow the links found on the start page, and I have no idea why it does not.

(The whole installation was done on bullseye using the "one command" method documented at https://opensemanticsearch.org/doc/admin/install/search_server/ )

@Tiberius1313

On some pages it worked fine for me, but then I ran into the same problem you described with https://hudoc.echr.coe.int . To see whether there are similarities in the structure, it might be helpful to name your pages.
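
One way to compare page structure, sketched here as an assumption and not as part of Open Semantic Search: list the same-host links that appear as plain `<a href>` tags in a start page's HTML. If the list is empty for an affected page, a link-following crawler would have nothing to follow from the static markup. The names `LinkCollector` and `same_domain_links` are hypothetical, and the sketch treats hosts with and without `www` as different domains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, base_url):
    """Return sorted absolute http(s) links on the same host as base_url.

    Hypothetical helper for illustration; it only sees links present in
    the static HTML, not links generated later by JavaScript.
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc.lower()
    links = set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == base_host:
            links.add(absolute.split("#", 1)[0])  # drop fragment
    return sorted(links)
```

To use it, save the start page's HTML (for example with a browser's "view source") and pass the text together with the start URL; comparing the result for a working page and a failing page may show whether the difference lies in the markup.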

@fractalvision

Seconding this; the problem still persists.
