Remove "files" from default skips? #27

rsignell-usgs · 2020-08-21T18:12:31Z

@kwilcox We just spent an hour trying to figure out why some datasets from our catalog were not getting picked up by the crawler and eventually we found the problem: the path looked like /models/model_a/run27/output_files/catalog.ncml and it was getting rejected by the default skips because it contains "files".

The default skips are:

[
  '.*files.*',
  '.*Individual Files.*',
  '.*File_Access.*',
  '.*Forecast Model Run.*',
  '.*Constant Forecast Offset.*',
  '.*Constant Forecast Date.*'
]

Could we remove the .*files.* line, or if we need it for some common use case, make it more specific, like .*files$?

https://github.com/ioos/thredds_crawler/blob/master/thredds_crawler/crawl.py#L55
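
To illustrate the behavior described above, here is a minimal sketch using only Python's `re` module. It assumes the crawler tests each skip pattern with `re.match` against the string in question; that is an assumption about the internals, not something confirmed in this thread. The helper `is_skipped` is hypothetical and exists only for this example.

```python
import re

# Default skips as listed in this issue.
DEFAULT_SKIPS = [
    ".*files.*",
    ".*Individual Files.*",
    ".*File_Access.*",
    ".*Forecast Model Run.*",
    ".*Constant Forecast Offset.*",
    ".*Constant Forecast Date.*",
]

# The path from the report above.
path = "/models/model_a/run27/output_files/catalog.ncml"


def is_skipped(text, patterns):
    """Hypothetical helper: True if any pattern matches from the start of text."""
    return any(re.match(p, text) is not None for p in patterns)


# With the broad pattern, "output_files" in the middle of the path triggers a skip.
broad_result = is_skipped(path, DEFAULT_SKIPS)

# Swap in the narrower pattern proposed above, which only matches a trailing "files".
narrowed = [".*files$" if p == ".*files.*" else p for p in DEFAULT_SKIPS]
narrow_result = is_skipped(path, narrowed)

# A string that actually ends in "files" is still skipped by the narrower pattern.
trailing_result = is_skipped("/some/catalog/files", narrowed)

print(broad_result, narrow_result, trailing_result)
```

Under that assumption, the broad pattern rejects the reported path while `.*files$` lets it through and still catches names that end in "files".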


kwilcox · 2020-08-21T18:21:49Z

You can supply a custom list of skips, see https://github.com/ioos/thredds_crawler#skip
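
As a rough sketch of that workaround, a custom list can be built by copying the defaults quoted earlier in this issue and dropping the broad pattern. The commented-out `Crawl(...)` call assumes the `skip` keyword described in the README linked above, and the catalog URL is a placeholder, not a real endpoint.

```python
# Default skips as listed in this issue (copied here so the example is standalone).
DEFAULT_SKIPS = [
    ".*files.*",
    ".*Individual Files.*",
    ".*File_Access.*",
    ".*Forecast Model Run.*",
    ".*Constant Forecast Offset.*",
    ".*Constant Forecast Date.*",
]

# Keep every default except the broad ".*files.*" pattern.
custom_skips = [p for p in DEFAULT_SKIPS if p != ".*files.*"]

print(custom_skips)

# Hypothetical usage, assuming the `skip` argument documented in the README:
# from thredds_crawler.crawl import Crawl
# c = Crawl("https://example.com/thredds/catalog.xml", skip=custom_skips)
```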

rsignell-usgs · 2020-08-21T18:29:55Z

Yes, that's how we solved it. I'm just wondering if maybe it would be nicer to make that a more specific skip if we can.
