Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend DandiInstance to support APIs outside of dandiarchive.org subdomains #1517

Open
aaronkanzer opened this issue Oct 30, 2024 · 11 comments

Comments

@aaronkanzer
Copy link
Member

aaronkanzer commented Oct 30, 2024

@yarikoptic @jwodder -- wanted to start a GitHub Issue to get consolidation of linc-cli https://github.com/lincbrain/linc-cli back to dandi-cli -- working with @satra @kabilar here to merge LINC work back into DANDI as Ember approaches

Prior to coding a potential solution, wanted to get your thoughts. The only 2 differences that exist between dandi-cli and linc-cli are:

client.dandi_authenticate() is called upon every command in linc-cli due to the locked-down nature of LINC assets for now
• LINC has an API Key called LINCBRAIN_API_KEY

There are no other differences, other than the API that is hit of course (api.lincbrain.org/.....)

My proposal here would be to:

  1. Expand the DandiInstance object offerings to include LINC in consts.py -- code reference
  2. Add a is_private_archive boolean to the DandiInstance -- code reference. This bool value would be used to decide upon what commands in dandi-cli would require client.dandi_authenticate()
  3. Include an --api-key flag that could be passed in the CLI commands to handle the API key for any extending instance (would go against the keyring pattern a bit, so curious to get your thoughts especially here)

Let me know -- I'll default to some of your wisdom here as you've authored most of the CLI tool

Cc @waxlamp -- tagging for visibility for Kitware folks

@aaronkanzer aaronkanzer changed the title Abstract DandiInstance to support APIs not in dandiarchive.org subdomains Extend DandiInstance to support APIs outside of dandiarchive.org subdomains Oct 30, 2024
@yarikoptic
Copy link
Member

Additional aspect on user level visible interface/configuration:

  • do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI and determine which instance to operate based on dandiset metadata at hands or --instance option provided . If it stays so DANDI_API_KEY could still be used for LINC key specification IMHO.

or

  • still have a dedicated linc CLI entry point installed along with dandi which would automagically favor linc and LINC_API_KEY to be the env var?

@satra
Copy link
Member

satra commented Oct 30, 2024

do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI

generally speaking yes. although one could release a linc cli that effectively is a very lightweight wrapper around dandi cli, changing some configs. i think dandi has become quite synonymous with the neurophysiology repository, and it may take some work to turn it into a more general framework. however, i would be keen to do so, if possible.

technically as you raise, naming things like api keys and instances would require some thought and cli awareness.

@aaronkanzer
Copy link
Member Author

do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI

generally speaking yes. although one could release a linc cli that effectively is a very lightweight wrapper around dandi cli, changing some configs. i think dandi has become quite synonymous with the neurophysiology repository, and it may take some work to turn it into a more general framework. however, i would be keen to do so, if possible.

technically as you raise, naming things like api keys and instances would require some thought and cli awareness.

@satra Could you provide some base case abstractions you'd like to accomplish here?

@aaronkanzer
Copy link
Member Author

aaronkanzer commented Oct 30, 2024

Additional aspect on user level visible interface/configuration:

  • do we want to see linc* instances as just sample instances of dandi and otherwise still use dandi CLI and determine which instance to operate based on dandiset metadata at hands or --instance option provided . If it stays so DANDI_API_KEY could still be used for LINC key specification IMHO.

or

  • still have a dedicated linc CLI entry point installed along with dandi which would automagically favor linc and LINC_API_KEY to be the env var?

@yarikoptic I think the idea of inferring the API via reading from the dandiset metadata serves as the lowest cognitive overhead for the end user -- as far as routing the dandi CLI command

@satra @yarikoptic my hesitancy of although one could release a linc cli that effectively is a very lightweight wrapper is that this will just lead to a handful of forks still. DANDI CLI (and LINC) are strongly coupled to the data models and analagous prescribed validations, thus, from an end user perspective, at a certain point, I'd be inclined to rather just use a cloud-vendored issued CLI tool for file management instead of dandi-cli, etc.

You could say -- "ok, users can also bring their own dandischema-ish library" too -- thus all data uploaded to their buckets would have uniformity -- however, abstraction at this time and level seems arduous without user research

@satra
Copy link
Member

satra commented Nov 3, 2024

Could you provide some base case abstractions you'd like to accomplish here?

i'm not sure what this means, but the process of uploading is the same for all instances (the data models are the same - to speak to the next comment). so the abstractions are really about where to send and do you have authorization to send there.

i don't think native cloud vendor CLIs (e.g. aws CLI) work for our use cases for various reasons (e.g. we need data model checks, we need validation, and we need to support for objects like zarr). also we don't expect our work to be completely general purpose. the focus would still be neuroscience domain for now, with the scope of the data model being modifiable. also our backend datamodel is relatively simple (blob/container blob + jsonld metadata), and intentionally different from the metadata models for dandiset/asset.

@aaronkanzer
Copy link
Member Author

aaronkanzer commented Nov 4, 2024

Could you provide some base case abstractions you'd like to accomplish here?

i'm not sure what this means, but the process of uploading is the same for all instances (the data models are the same - to speak to the next comment). so the abstractions are really about where to send and do you have authorization to send there.

i don't think native cloud vendor CLIs (e.g. aws CLI) work for our use cases for various reasons (e.g. we need data model checks, we need validation, and we need to support for objects like zarr). also we don't expect our work to be completely general purpose. the focus would still be neuroscience domain for now, with the scope of the data model being modifiable. also our backend datamodel is relatively simple (blob/container blob + jsonld metadata), and intentionally different from the metadata models for dandiset/asset.

@satra sounds good to me -- I do want to note the frequent usage of --skip-validation flag (Cc @kabilar) for the LINC project, as dandischema is very coupled to all-things DANDI -- are you envisioning all clones conforming to dandischema rules? Do you envision clones submitting PRs to dandischema (for more lenient schema validation) associated with the API that is routed to via dandi-cli?

with the scope of the data model being modifiable -- are you envisioning clones defining their own Django models? Are you envisioning clones leveraging JSON fields to conform to some of DANDI's current data model constructs?

@kabilar
Copy link
Member

kabilar commented Nov 4, 2024

Thanks team. Echoing the points above. For the use case of the LINC project, I think that the following behavior would be straightforward from the user perspective.

  1. A single client (i.e. dandi) that can interact with either the DANDI or LINC instance.
    dandi download -i linc <URL>
    
  2. Users would define the DANDI_API_KEY with the API key from https://dandiarchive.org or https://lincbrain.org. If we get feedback from LINC users that having a single environment variable for both the DANDI and LINC API keys is cumbersome then we can re-evaluate supporting a LINC_API_KEY.

Since dandischema changes are not currently needed for the LINC project, perhaps we can defer that discussion to the upcoming Generalization of DANDI meetings?

@aaronkanzer Your proposal on how to implement this feature sounds right (quoted below). Would you be able to submit a pull request for this work?

My proposal here would be to:

  1. Expand the DandiInstance object offerings to include LINC in consts.py -- code reference
  2. Add a is_private_archive boolean to the DandiInstance -- code reference. This bool value would be used to decide upon what commands in dandi-cli would require client.dandi_authenticate()

@satra
Copy link
Member

satra commented Nov 5, 2024

dandischema is very coupled to all-things DANDI

it is and it isn't. since dandischema supports the lightsheet + mri datasets on DANDI proper, it should also be fine for validating linc datasets. essentially any bids dataset should be supported by the DANDI schema and should upload without skip validation. if that's not the case, something is off and should highlight a reconciliation.

@kabilar
Copy link
Member

kabilar commented Nov 5, 2024

We will also have to update the DANDI and LINC docs. A few examples are below.

  1. Update the DANDI Client docs to explain that the DANDI Client can interact with the DANDI Archive and other instances.
    1. The `dandi <https://github.com/dandi/dandi-cli>`_ library provides both a
      command line interface (CLI) and a Python API for interacting with `DANDI
      Archive <https://dandiarchive.org>`_.
    2. A command-line client for interacting with the `DANDI Archive
      <http://dandiarchive.org>`_.
    3. This module provides functionality for interacting with a Dandi Archive server
    4. """A client to support interactions with DANDI archive (http://dandiarchive.org).
    5. description = Command line client for interaction with DANDI archive elements
    6. dandi-cli/README.md

      Lines 17 to 19 in 6aa414c

      * Interact with the DANDI archive's web API from Python
      * Delete data in the DANDI archive
      * Perform other auxiliary operations with data or the DANDI archive
  2. Update the DANDI Client docs to replace --dandi-instance with --instance.
    1. List known Dandi Archive instances that can be passed to the
      ``-i``/``--dandi-instance`` option of other subcommands for the CLI to
      interact with. Output is in YAML.
  3. Update the LINC web app example download commands to replace lincbrain with dandi.
    1. https://github.com/lincbrain/linc-archive/blob/66f867b3b18553b9b8311e0b5bc0e7e1f57ea778/web/src/views/DandisetLandingView/DownloadDialog.vue#L128-L135
    2. https://github.com/lincbrain/linc-archive/blob/66f867b3b18553b9b8311e0b5bc0e7e1f57ea778/web/src/components/FileBrowser/FileUploadInstructions.vue#L63
  4. Update the LINC docs to replace lincbrain-cli/lincbrain with dandi.
    1. https://docs.lincbrain.org/upload/
    2. https://docs.lincbrain.org/#about-this-doc

@kabilar
Copy link
Member

kabilar commented Nov 5, 2024

@kabilar
Copy link
Member

kabilar commented Nov 5, 2024

client.dandi_authenticate() is called upon every command in linc-cli due to the locked-down nature of LINC assets for now

Hi @aaronkanzer, I may have missed it but I didn't notice these calls in the diff pull request (lincbrain#63). We only added client.dandi_authenticate() for the lincbrain move command to fix a bug, but those changes are now also upstream in dandi/dandi-cli.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

4 participants