[FEAT] Create obstore store in fsspec on demand (#198)
* feat: split bucket from path + construct store
  Construct the store with from_url using the protocol and bucket name.
* feat: remove store + add protocol + apply to all methods
* feat: inherit from AsyncFsspecStore to specify protocol
  Specify protocol s3, gs, and abfs.
* fix: correctly split protocol if it exists in path
* feat: use urlparse to extract protocol
* update typing
* fix: unbounded error
* fix: remove redundant import
* feat: add register() to register AsyncFsspecStore for provided protocol
* feat: add validation for protocol in register()
* test: for register()
  Check that AsyncFsspecStore is registered, and test invalid types passed into register.
* feat: add async parameter for register()
* test: test async store created by register()
* feat: add http(s) into protocol_with_bucket list
  The bucket for https is the netloc of the URL (e.g. for https://www.google.com/path, www.google.com is the bucket).
* feat: ls returns path with bucket name
  Solves an error when _walk is called recursively with the previous result of ls.
* feat: enable re-registering the same protocol
* test: update pytest fixture to use register()
* test: update test with new path format
  Path with bucket name.
* fix: mkdocs build error
* fix: error when merging
* build: add some ruff ignore
* fix: ruff error
* build: add cachetools dependencies
* better scoping of lints
* lint
* fix: update lru_cache + clean class attribute
* fix some bugs when using get/put/cp/info/ls
* fix: declare lru_cache in __init__
* fix: make AsyncFsspecStore cacheable
* test: for cache constructed store and filesystem obj
* build: remove dependencies
* fix: prevent sending folder path to cat_file
* fix: enable cp folders
* lint
* fix: clobber=False to prevent re-registering and causing a memory leak
  If register is called multiple times and each registration has its own instance, the cache does not work and we end up with multiple instances with the same config.
* test: clean up after each test to prevent memory leak
* Simplify protocol registration
* fix+test: register check types
* small edits
* fix+test: update conftest
* style: format
* feat: enable setting protocol in __init__
* docs: update example in docstring
* fix: better way to split path
* fix: split path for protocol with no bucket properly
* test: for split path
* refactor: take out _fill_bucket_name function
* refactor: take out runtime type check for register
* test: remove test register invalid type
  As the runtime type check is removed from register for simplification, this test is no longer needed.
* Switch to checking if protocol does not require bucket
* Warn on unknown protocol

---------

Co-authored-by: Kyle Barron <[email protected]>
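
The path splitting and on-demand store caching described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `split_path`, `make_store`, `PROTOCOLS_WITH_BUCKET`, and the default protocol are hypothetical names and choices, and `make_store` returns a plain dict where the real change builds an obstore store from a URL.

```python
from functools import lru_cache
from urllib.parse import urlparse

# Assumption: protocols for which the first component of a path names a bucket.
PROTOCOLS_WITH_BUCKET = {"s3", "gs", "abfs", "http", "https"}


def split_path(path: str, default_protocol: str = "s3") -> tuple[str, str, str]:
    """Return (protocol, bucket, key) for a URL or a bare 'bucket/key' path."""
    parsed = urlparse(path)
    protocol = parsed.scheme or default_protocol

    if protocol not in PROTOCOLS_WITH_BUCKET:
        # No bucket concept: everything after the scheme is the key.
        return protocol, "", path.split("://", 1)[-1]

    if parsed.scheme:
        # URL form: the netloc is the bucket (for https, the host name).
        return protocol, parsed.netloc, parsed.path.lstrip("/")

    # Bare form: the first path component is the bucket.
    bucket, _, key = path.partition("/")
    return protocol, bucket, key


@lru_cache(maxsize=10)
def make_store(protocol: str, bucket: str) -> dict:
    """Hypothetical stand-in for constructing a store from a URL.

    Caching on (protocol, bucket) means repeated calls reuse one object
    instead of building a new store for every operation.
    """
    return {"url": f"{protocol}://{bucket}"}


if __name__ == "__main__":
    protocol, bucket, key = split_path("s3://my-bucket/data/file.parquet")
    store = make_store(protocol, bucket)
    print(protocol, bucket, key, store["url"])
```

Keying the cache on the protocol and bucket pair is one simple way to get the "one store per bucket" behaviour the commit messages describe; the actual cache key and size used in the change may differ.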