[FEAT] Create obstore store in fsspec on demand (#198)
* feat: split bucket from path + construct store

Construct the store with from_url using the protocol and bucket name

* feat: remove store + add protocol + apply to all methods

* feat: inherit from AsyncFsspecStore to specify protocol

Specify the protocols s3, gs, and abfs

* fix: correctly split protocol if exists in path

* feat: use urlparse to extract protocol

* update typing

* fix: unbounded error

* fix: remove redundant import

* feat: add register() to register AsyncFsspecStore for provided protocol

* feat: add validation for protocol in register()

* test: for register()

Check that AsyncFsspecStore is registered, and test passing invalid
types into register()

* feat: add async parameter for register()

* test: test async store created by register()

* feat: add http(s) into protocol_with_bucket list

For http(s), the bucket is the netloc of the URL (e.g. for
https://www.google.com/path, www.google.com is the bucket)
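
A minimal sketch of the split described above, using only the standard
library. The helper name split_path and the example bucket are
hypothetical and are not taken from this commit; the actual
implementation may handle more cases (for example, protocols that do not
use a bucket).

```python
from urllib.parse import urlparse


def split_path(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a URL into (protocol, bucket, key)."""
    parsed = urlparse(url)
    # The netloc plays the role of the bucket, for s3-style URLs as well
    # as for http(s), where it is the host name.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


print(split_path("https://www.google.com/path"))
print(split_path("s3://my-bucket/dir/file.txt"))
```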

* feat: ls return path with bucket name

Fixes an error when _walk is called recursively with the previous
result of ls
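
A toy illustration of why this matters, assuming a filesystem whose ls()
expects the bucket as the first path component. The BUCKETS mapping and
the ls/walk functions are hypothetical stand-ins, not the fsspec or
obstore API: because ls() returns bucket-qualified paths, each result
can be fed straight back into ls() during a recursive walk.

```python
# Hypothetical in-memory "buckets": bucket name -> {directory key -> child names}
BUCKETS = {
    "my-bucket": {
        "": ["data/"],
        "data": ["a.txt", "sub/"],
        "data/sub": ["b.txt"],
    }
}


def ls(path):
    """List children of `path`, where `path` starts with the bucket name."""
    bucket, _, key = path.strip("/").partition("/")
    children = BUCKETS[bucket][key]
    prefix = f"{bucket}/{key}" if key else bucket
    # Return bucket-qualified paths so each result can be passed back to ls().
    return [f"{prefix}/{child}" for child in children]


def walk(path):
    """Recursively collect file paths by feeding ls() results back into ls()."""
    files = []
    for entry in ls(path):
        if entry.endswith("/"):
            files.extend(walk(entry))
        else:
            files.append(entry)
    return files


print(walk("my-bucket"))
```

If ls() returned paths without the bucket name, the recursive call would
treat the first directory component as the bucket and fail to find it.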

* feat: enable re-register same protocol

* test: update pytest fixture to use register()

* test: update test with new path format

path with bucket name

* fix: mkdocs build error

* fix: error when merging

* build: add some ruff ignore

* fix: ruff error

* build: add cachetools dependencies

* better scoping of lints

* lint

* fix: update lru_cache + clean class attribute

* fix some bugs when using get/put/cp/info/ls

* fix: declare lru_cache in __init__

* fix: make AsyncFsspecStore cachable

* test: for cache constructed store and filesystem obj

* build: remove dependencies

* fix: prevent sending folder paths to cat_file

* fix: enable cp folders

* lint

* fix: use clobber=False to prevent re-registration, which causes a memory leak

If the protocol is registered multiple times and each registration has
its own instance, the cache does not work and we end up with multiple
instances that share the same config
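
A simplified sketch of the caching problem, assuming a per-instance
lru_cache as mentioned in the earlier "declare lru_cache in __init__"
item. FakeStore and FakeFileSystem are hypothetical stand-ins and do not
reflect the actual class layout: a cache that lives on one instance
cannot be reused by another, so duplicate instances lead to duplicate
stores for the same config.

```python
from functools import lru_cache


class FakeStore:
    """Stand-in for a real object store; purely illustrative."""

    def __init__(self, protocol, bucket):
        self.protocol = protocol
        self.bucket = bucket


class FakeFileSystem:
    """Hypothetical filesystem with a per-instance store cache."""

    def __init__(self, protocol):
        self.protocol = protocol
        # The cache is created per instance, so it is not shared across instances.
        self._construct_store = lru_cache(maxsize=10)(self._construct_store_uncached)

    def _construct_store_uncached(self, bucket):
        return FakeStore(self.protocol, bucket)


fs_a = FakeFileSystem("s3")
fs_b = FakeFileSystem("s3")

# Within one instance the store is reused; across instances it is rebuilt.
same_instance = fs_a._construct_store("bucket") is fs_a._construct_store("bucket")
across_instances = fs_a._construct_store("bucket") is fs_b._construct_store("bucket")
print(same_instance, across_instances)
```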

* test: clean up after each test to prevent memory leak

* Simplify protocol registration

* fix+test: register check types

* small edits

* fix+test: update conftest

* style: format

* feat: enable setting protocol in __init__

* docs: update example in docstring

* fix: better way to split path

* fix: split path properly for protocols with no bucket

* test: for split path

* refactor: take out _fill_bucket_name function

* refactor: take out runtime type check for register

* test: remove test register invalid type

As the runtime type check was removed from register() for
simplification, this test is no longer needed

* Switch to checking if protocol does not require bucket

* Warn on unknown protocol

---------

Co-authored-by: Kyle Barron <[email protected]>
machichima and kylebarron authored Feb 28, 2025
1 parent e91336a commit 34f6d30
Showing 3 changed files with 571 additions and 89 deletions.