thredds_crawler revival #35

Open
6 tasks
ocefpaf opened this issue Jan 31, 2025 · 9 comments

Comments

Member

ocefpaf commented Jan 31, 2025

To do list (kind of in order of priority):

  • fix the failing tests.
  • write proper docs, based on the examples in the README and API docs.
  • enable pre-commits and fix remaining lints.
  • triage old issues and PRs.
  • mint a new release.
  • add a note about how this compares to, complements, or could use siphon.
Member Author

ocefpaf commented Jan 31, 2025

I decided to take a crack at the last one. See https://nbviewer.org/gist/ocefpaf/374ce5c8131343e789e5261d1fe7d83c

I'm not an expert on either package (siphon or thredds_crawler), but it seems that one can do everything described in thredds_crawler's README (the only docs we have) with siphon. Siphon is also faster and looks to be in better shape.


muhd360 commented Feb 3, 2025

@ocefpaf if you can assign a few tasks, I would like to take a crack at them. Are there any specific PRs that may not be a ton of work?

Member Author

ocefpaf commented Feb 3, 2025

@muhd360 if you are interested in GSoC'25, I recommend you wait until the orgs are selected. At this point we don't know if we'll be selected and we don't have a list of project ideas.

If you are interested in thredds_crawler, feel free to tackle any issue you know you can solve. We probably won't have many cycles to review big PRs, so keep them small and limited in scope.


muhd360 commented Feb 3, 2025

@ocefpaf no, I was just asking generally. If there are any PRs you could point me to, I would be glad to help.

Member Author

ocefpaf commented Feb 3, 2025

Try to familiarize yourself with the codebase and see what interests you.


kthyng commented Feb 3, 2025

@ocefpaf I wanted to use siphon in place of this, but as far as I can tell, siphon won't walk through nested THREDDS catalogs. Do you know otherwise?
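
For illustration only, here is a minimal sketch of how nested catalogs might be walked by hand. It is not siphon's or thredds_crawler's own crawler, and nothing in this thread confirms that it covers the use case above. The helper name `walk` and the `max_depth` parameter are hypothetical. The sketch assumes only that a catalog object exposes a `datasets` mapping and a `catalog_refs` mapping whose values have a `follow()` method, which is the shape of siphon's `TDSCatalog`.

```python
def walk(catalog, max_depth=3):
    """Yield (name, dataset) pairs from a catalog and its nested catalogs.

    `catalog` is assumed to look like siphon's TDSCatalog: a `datasets`
    mapping plus a `catalog_refs` mapping whose values provide `follow()`.
    `max_depth` (hypothetical) limits how many levels of references are
    followed, so a large server is not crawled without bound.
    """
    # Datasets listed directly in this catalog.
    for name, dataset in catalog.datasets.items():
        yield name, dataset

    # Stop descending once the depth budget is used up.
    if max_depth <= 0:
        return

    # Each reference resolves to another catalog; recurse into it.
    for ref in catalog.catalog_refs.values():
        yield from walk(ref.follow(), max_depth - 1)


# Possible usage with siphon (needs network access and a real catalog URL):
#   from siphon.catalog import TDSCatalog
#   for name, dataset in walk(TDSCatalog(catalog_url), max_depth=2):
#       print(name)
```

Each `follow()` call triggers a separate HTTP request when used with siphon, so a depth limit or a filter on the reference names is likely worth having on big servers.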

Member Author

ocefpaf commented Feb 3, 2025

I do not, but we can ask upstream. If it is something we can work around and/or upstream to siphon, what other features do you believe thredds_crawler has that are missing in siphon?


kthyng commented Feb 3, 2025

I opened an issue in siphon to ask about this but haven't been back to this effort since then to follow up: Unidata/siphon#263

Regex matching to include or exclude links is, I think, in thredds_crawler but not in siphon, and it is handy.
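
As a rough sketch of what that kind of include/exclude filtering could look like on top of any list of dataset or link names, assuming nothing about either library's internals: the helper `filter_names` and its `select`/`skip` parameters are hypothetical names chosen for this example, not an existing API.

```python
import re


def filter_names(names, select=None, skip=None):
    """Return names that match any `select` pattern and no `skip` pattern.

    Patterns are matched from the start of each name with `re.match`.
    An empty or missing `select` keeps everything that is not skipped.
    """
    select_res = [re.compile(p) for p in (select or [])]
    skip_res = [re.compile(p) for p in (skip or [])]

    kept = []
    for name in names:
        # Exclusion wins: drop the name if any skip pattern matches.
        if any(r.match(name) for r in skip_res):
            continue
        # With select patterns given, require at least one match.
        if select_res and not any(r.match(name) for r in select_res):
            continue
        kept.append(name)
    return kept


names = ["a-Agg.nc", "b.nc", "test-Agg.nc"]
aggregations = filter_names(names, select=[r".*-Agg"], skip=[r"test.*"])
```

Applying the filter to catalog reference names before following them, rather than only to the final datasets, would also cut down on the number of requests made.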

Member Author

ocefpaf commented Feb 4, 2025

I'm advocating that we re-archive thredds_crawler, but siphon's performance made me think we should, at the least, be using it under the hood as the crawler engine.


3 participants