-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collections endpoint #386
Open
jvita
wants to merge
15
commits into
Materials-Consortia:develop
Choose a base branch
from
jvita:collections_endpoint
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Collections endpoint #386
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
120cf12
Update optimade.rst
jvita 4adae80
Test to see if edits work as expected
jvita d9312fe
Update optimade.rst
jvita abccba0
Update optimade.rst
jvita 8308a50
Minor edits and cleanup
jvita 9172374
Merge branch 'collections_endpoint' of github.com:jvita/OPTIMADE into…
jvita 38c46de
Merge branch 'develop' into collections_endpoint
rartino c7a00dd
Modified according to PR review comments, workshop discussions
rartino e817790
Changes due to review: drop "'contents' relationship"
rartino 614bc76
Merge branch 'develop' into collections_endpoint
rartino 5e3dc8a
Suggestions from @ml-evs in PR review
rartino ad271ed
Remove double-space
ml-evs a60c4a7
Merge branch 'develop' into collections_endpoint
rartino 8c2c7c8
Merge branch 'develop' into collections_endpoint
rartino f449350
Merge branch 'develop' into collections_endpoint
rartino File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns about pagination of
included
values, in the case of e.g., 100,000 structures in the same collection. Do we need to worry about that?included
is only an optional field at the moment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ml-evs
included
is mentioned here only as a suggestion of the potential use of a collection as an export format - in which case the whole idea would be to put everything you want to export (e.g., all 100'000 structures) in the same stream. Nothing is said here that indicates mandatory support forincluded
?I thought we intended for clients to generally just get the list of ids from the relationship and then request entry data by further queries using the endpoint + id format. (Or, for efficiency when there are many, perhaps via
filter=id=X OR id=Y OR id=Z OR ...
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this is the intention, but I am a bit nervous that we have a field that can now grow unboundedly in a response if requested (or even if not, you cannot disable
relationships
from the response, I don't think?). I guess you could argue the same for a 1 million site structure object but here it feels well within the design that even the list of IDs could be very large.I think the larger comment now addressed in #420 would be the best mechanism around this (if we are going to support it anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about
included
orrelationships
now?; both of them can grow to arbitrary size - although - we are talking extremely large collections beforerelationships
becomes unduly large.Possibly repeating myself a bit, but to be clear: I don't see a problem with
included
. Implementations probably should avoid it except as recommended in OPTIMADE for references, unless a client somehow explicitly requests it (which we don't have a standard mechanism for yet). If an implementation decides to includeincluded
anyway, while simultaneously having "unboundedly large" relationships, it would be silly to not implement a limit on the number of entries included this way.The situation is more tricky with huge
relationships
. I think JSON:API silently is built on the assumption that the list of IDs for all relationships of a resource must be small enough to handle in a single request. Sure, one can use thearticles/1/relationships/comments
syntax to get something paginated, but how does one know in the first place which JSON:API relationship keys to fetch without first fetching the unboundedly largearticles/1
?Hence, I think we have to look at this list of IDs as a single "datum" where our default OPTIMADE JSON:API output format isn't equipped to handle arbitrary large data. This is echo:ed by our recommendation for other output formats to simply encode relationships alongside other data.
If we are concerned about this limitation, I don't see any other way to address it than to implement an alternative output format that can handle pagination on individual properties, including the relationships.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood
I think my concern is that some of the intended use cases for
collections
might cross this boundary already (would 10,000/100,000 IDs to structures that define a training set break this?) I'm also not sure thatrelationships
can be excluded from the request usingresponse_fields
, so you can't even hit/collections
to browse/filter them without getting these potentially large responses. I understand that this is already the case with e.g., COD's mythical 100k atom structure, but at least you could choose which fields you wanted to be returned!I'm leaning towards this being the correct approach. The relationships can be included as a
self
link toarticles/1/relationships/comments
rather than as adata
which I think solves your problem. Perhaps we could say something like. "It is RECOMMENDED that implementations useself
links instead of explicitrelationships
for collections with a number of entries in significantly in excess of the implementation's page limit."Let's see how the discussion in #419 goes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to be right - excluding the
data
key if it contains too many entries, and instead rely on alink
that returns a paginated result set of the entries is a straightforward solution. So, if we try to follow the JSON API examples, I think this link should point to, e.g.,:/collections/42/structures
and in reference to the ongoing discussion in #420 this would thus be the first "required" use of third-level URLs.