Crawler configured by Datasources UI only crawls Startpage, although option "Crawl full domain..." #454

rafkamonday · 2022-11-11T11:42:34Z

Hello,

I am trying to crawl a website (full domain), but nothing beyond the start page is ever crawled. In the Datasources UI I tried http and https, with and without www, and with and without a trailing slash. It never works. I would expect the crawler to follow the links found on the start page. I have no idea why it does not work as expected.

(The whole installation was done on Debian bullseye with the "one command" method documented at https://opensemanticsearch.org/doc/admin/install/search_server/ )

Tiberius1313 · 2022-11-27T02:30:27Z

On some pages it worked fine for me, but then I ran into the same problem you described with https://hudoc.echr.coe.int . To see whether there are similarities in the structure, it might be helpful to name your pages.
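
One rough way to compare the structure of affected pages is to count how many same-domain links the start page exposes in its static HTML. The sketch below is only a diagnostic aid, not the logic Open Semantic Search's crawler actually uses; `LinkCollector`, `same_domain_links`, and the `example.org` URLs are hypothetical names chosen for illustration. If a site builds its navigation with JavaScript, the static HTML may contain few or no `<a href>` links, and a crawler that only reads the HTML would have nothing to follow.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags from static HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, base_url):
    """Return sorted absolute http(s) links that share the start page's host."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc.lower()
    links = set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == base_host:
            links.add(absolute.split("#", 1)[0])  # drop fragment identifiers
    return sorted(links)


# Inline sample instead of a network request, so the sketch runs offline.
sample = """
<a href="/about">About</a>
<a href="https://example.org/docs#top">Docs</a>
<a href="https://other.example.com/">External</a>
<a href="mailto:me@example.org">Mail</a>
"""
print(same_domain_links(sample, "https://example.org/"))
```

Note that this sketch compares hosts exactly, so `www.example.org` and `example.org` count as different domains. An empty or very short result for the saved HTML of a real start page would suggest that the links are not present in the static markup.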

fractalvision · 2023-04-24T10:11:21Z

Seconding this; the problem still persists.
