We should implement a thredds crawler ingestor command in geospaas.nansat_ingestor #40

Closed
mortenwh opened this issue Nov 7, 2018 · 6 comments
@mortenwh
Contributor

mortenwh commented Nov 7, 2018

No description provided.

mortenwh added a commit that referenced this issue Nov 7, 2018
mortenwh added a commit that referenced this issue Nov 7, 2018
…o the db - give URLs that are as specific as possible (e.g., down to the date of interest), otherwise it will take a very long time
@mortenwh
Contributor Author

mortenwh commented Nov 7, 2018

Tested on Sentinel-2 data from the Norwegian ground segment, e.g.:

./manage.py ingest_thredds_crawl http://nbstds.met.no/thredds/catalog/NBS/S2B/2018/07/catalog.html --date 2018/07/17

or

./manage.py ingest_thredds_crawl http://nbstds.met.no/thredds/catalog/NBS/S2B/2018/07/17/catalog.html --filename S2B_MSIL1C_20180717T095029_N0206_R079_T34VFM_20180717T115424.nc

The latter is much faster...
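The "Ignoring dataset based on 'selects'" lines in the log further down suggest that the crawler lists every dataset under the given catalog URL and then filters them by pattern, which would explain why a narrower URL is faster. Below is a minimal sketch of that kind of filtering, assuming the `--date` and `--filename` options are turned into regular expressions. The helper names `build_select` and `selected` are hypothetical and are not taken from the command's source; the IDs are modeled on the ones in the log.

```python
import re


def build_select(date=None, filename=None):
    """Return regex patterns to match dataset IDs against (hypothetical helper)."""
    patterns = []
    if date:
        # e.g. "2018/07/17" keeps only IDs that contain that path segment
        patterns.append(".*" + re.escape(date) + ".*")
    if filename:
        # keep only IDs that end with the exact file name
        patterns.append(".*" + re.escape(filename) + "$")
    # with no option given, accept everything
    return patterns or [".*"]


def selected(dataset_id, patterns):
    """True if the dataset ID matches at least one pattern."""
    return any(re.match(p, dataset_id) for p in patterns)


# Illustrative IDs in the style of the crawler log
ids = [
    "nbs/S2B/2018/07/17/S2B_MSIL1C_20180717T095029_N0206_R079_T34VFM_20180717T115424.nc",
    "nbs/S2B/2018/07/17/S2B_MSIL1C_20180717T144749_N0206_R082_T35XML_20180717T195515.nc",
    "nbs/S2B/2018/07/01/S2B_MSIL1C_20180701T093039_N0206_R136_T35VMH_20180701T113337.nc",
]

patterns = build_select(date="2018/07/17")
kept = [i for i in ids if selected(i, patterns)]
```

Filtering like this only decides what is kept. The number of catalog pages that have to be fetched still depends on how deep the starting URL points.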

@mortenwh mortenwh self-assigned this Nov 10, 2018
mortenwh added a commit that referenced this issue Jan 1, 2019
… many model fields of the vocabularies app had too few characters, i.e., max_length was too low. Corrected the max_length numbers and added two automatically created migrations (one in viewer and one in catalog).
mortenwh added a commit that referenced this issue Feb 7, 2019
@akorosov
Member

akorosov commented Feb 13, 2019

./manage.py ingest_thredds_crawl http://nbstds.met.no/thredds/catalog/NBS/S2B/2018/07/17/catalog.html --filename S2B_MSIL1C_20180717T095029_N0206_R079_T34VFM_20180717T115424.nc

This one didn't work for me. It finds nothing:

(base) root@0294d0ad74aa:/src# project/manage.py ingest_thredds_crawl http://nbstds.met.no/thredds/catalog/NBS/S2B/2018/07/17/catalog.html --filename S2B_MSIL1C_20180717T095029_N0206_R079_T34VFM_20180717T115424.nc
2019-02-13 13:55:24,061 - [INFO] Crawling: http://nbstds.met.no/thredds/catalog/NBS/S2B/2018/07/17/catalog.html
2019-02-13 13:55:48,434 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/17/S2B_MSIL1C_20180717T144749_N0206_R082_T35XML_20180717T195515.nc

..... many filenames come here .....

2019-02-13 13:55:48,483 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/17/S2B_MSIL1C_20180717T095029_N0206_R079_T34VCH_20180717T133928.nc
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140212812998400:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1548 in H5F_open(): unable to open file: name = 'http://nbstds.met.no/thredds/dodsC/NBS/S2B/2018/07/17/S2B_MSIL1C_20180717T095029_N0206_R079_T34VFM_20180717T115424.nc', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
Successfully added metadata of 0 datasets

@akorosov
Member

The same happens with the other option. After some files that look OK:

2019-02-13 14:10:43,343 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/01/S2B_MSIL1C_20180701T093039_N0206_R136_T35VMH_20180701T113337.nc
2019-02-13 14:10:43,344 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/01/S2B_MSIL1C_20180701T093039_N0206_R136_T35VMF_20180701T113337.nc
2019-02-13 14:10:43,344 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/01/S2B_MSIL1C_20180701T093039_N0206_R136_T35VME_20180701T113337.nc
2019-02-13 14:10:43,344 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/01/S2B_MSIL1C_20180701T093039_N0206_R136_T35VLE_20180701T113337.nc
2019-02-13 14:10:43,344 - [INFO] Ignoring dataset based on 'selects'.  ID: nbs/S2B/2018/07/01/S2B_MSIL1C_20180701T093039_N0206_R136_T35VLD_20180701T113337.nc

many errors follow:

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140197406467840:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1548 in H5F_open(): unable to open file: name = 'http://nbstds.met.no/thredds/dodsC/NBS/S2B/2018/07/17/S2B_MSIL1C_20180717T144749_N0206_R082_T35XML_20180717T195515.nc', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140197406467840:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1548 in H5F_open(): unable to open file: name = 'http://nbstds.met.no/thredds/dodsC/NBS/S2B/2018/07/17/S2B_MSIL1C_20180717T144749_N0206_R082_T35XMK_20180717T195515.nc', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140197406467840:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1548 in H5F_open(): unable to open file: name = 'http://nbstds.met.no/thredds/dodsC/NBS/S2B/2018/07/17/S2B_MSIL1C_20180717T144749_N0206_R082_T33XXL_20180717T195515.nc', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140197406467840:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1548 in H5F_open(): unable to open file: name = 'http://nbstds.met.no/thredds/dodsC/NBS/S2B/2018/07/17/S2B_MSIL1C_20180717T144749_N0206_R082_T33XXK_20180717T195515.nc', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object

Maybe a timeout should be added before ds, cr = NansatDataset.objects.get_or_create(url), so as not to crash the OPeNDAP server?
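If "timeout" here means a pause between consecutive requests, the idea could look roughly like the sketch below. The function `ingest_with_delay` and its parameters are hypothetical; `ingest_one` stands in for whatever call does the actual work, for example a wrapper around `NansatDataset.objects.get_or_create(url)`.

```python
import time


def ingest_with_delay(urls, ingest_one, delay_s=1.0, sleep=time.sleep):
    """Call ingest_one(url) for each URL, pausing delay_s seconds between calls.

    Hypothetical helper: `sleep` is injectable so the pause can be
    replaced in tests.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # give the OPeNDAP server a break between two requests
            sleep(delay_s)
        results.append(ingest_one(url))
    return results


# Assumed usage inside the command (not verified against the source):
# ingest_with_delay(urls, lambda u: NansatDataset.objects.get_or_create(u), delay_s=2.0)
```

A fixed pause is the simplest option. Whether it is enough to keep the server responsive would have to be tested against the real endpoint.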

mortenwh added a commit that referenced this issue Feb 14, 2019
mortenwh added a commit that referenced this issue Feb 15, 2019
…ingest_thredds_crawl.py and some changes in the function
mortenwh added a commit that referenced this issue Feb 24, 2019
…ned html 400 error although netCDF4.Dataset works fine. If request.status!=200, I therefore try to open with netCDF4.Dataset. If that works, no error is raised..
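The fallback described in this commit message can be sketched as follows. This is an illustration of the idea only, not the code from the commit: `validate_uri`, `http_status` and `open_dataset` are hypothetical names, and the two callables are injected so that the sketch does not depend on a live server.

```python
def validate_uri(uri, http_status, open_dataset):
    """Accept a URI if either the HTTP check or the fallback opener succeeds.

    http_status(uri) is assumed to return an integer status code;
    open_dataset(uri) is assumed to raise on failure, as netCDF4.Dataset does.
    """
    status = http_status(uri)
    if status == 200:
        return "http"
    try:
        # e.g. open_dataset = netCDF4.Dataset
        open_dataset(uri)
    except (OSError, RuntimeError) as err:
        # both checks failed, so report the URI as unreachable
        raise ConnectionError(f"{uri} is not reachable (HTTP {status})") from err
    return "opener"
```

The return value only records which check passed. Catching both `OSError` and `RuntimeError` is an assumption meant to cover different netCDF4 versions.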
mortenwh added a commit that referenced this issue May 10, 2019
mortenwh added a commit that referenced this issue May 10, 2019
mortenwh added a commit that referenced this issue May 10, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
…file contains the installed packages in a working version of the vm. Hopefully this solves the broken installation at travis...
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
…e. This file contains the installed packages in a working version of the vm. Hopefully this solves the broken installation at travis..."

This reverts commit 81c60fb.
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
…(and similar for DatasetURI.name) when a new dataset is ingested.
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
…crawl command, and changed a docstring in nansat_ingestor/managers.py
mortenwh added a commit that referenced this issue Jul 25, 2019
mortenwh added a commit that referenced this issue Jul 25, 2019
@ninsbl

ninsbl commented Oct 1, 2021

Hi, I have been using thredds_crawler against NBS too, but for me it regularly causes 502 or 504 errors. Did you notice that too? I tried to address that in: ioos/thredds_crawler#29

But I am not sure if the library is actively maintained... I saw that @akorosov contributed to it earlier. Do you know more, or do you know of an alternative to thredds_crawler if it is no longer actively maintained (there is also another uncommented PR from May)? Maybe that is a dead horse?
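One generic way to cope with intermittent 502 or 504 responses is to retry with an increasing wait. The sketch below uses only the standard library and is not thredds_crawler's API, nor the change proposed in the linked PR. The name `fetch_with_retry` and its parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request

# gateway errors that are often temporary
RETRY_STATUSES = {502, 504}


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3,
                     backoff_s=1.0, sleep=time.sleep):
    """Fetch url, retrying on 502/504 with exponential backoff (hypothetical helper)."""
    for attempt in range(retries + 1):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # give up on other status codes or when the retries are used up
            if err.code not in RETRY_STATUSES or attempt == retries:
                raise
            # wait 1x, 2x, 4x, ... backoff_s before the next attempt
            sleep(backoff_s * 2 ** attempt)
```

The `opener` and `sleep` arguments exist so the behaviour can be checked without a network connection.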

@mortenwh
Contributor Author

mortenwh commented Oct 26, 2021 via email
