Collections endpoint #386
base: develop
I have some concerns about pagination of `included` values, in the case of e.g., 100,000 structures in the same collection. Do we need to worry about that? `included` is only an optional field at the moment.
@ml-evs `included` is mentioned here only as a suggestion of the potential use of a collection as an export format - in which case the whole idea would be to put everything you want to export (e.g., all 100'000 structures) in the same stream. Nothing is said here that indicates mandatory support for `included`?

I thought we intended for clients to generally just get the list of ids from the relationship and then request entry data by further queries using the endpoint + id format. (Or, for efficiency when there are many, perhaps via `filter=id=X OR id=Y OR id=Z OR ...`.)
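As a rough illustration of that flow (a hypothetical sketch; the collection id and structure ids are invented, not taken from this PR), a client would read the id list from the collection's `relationships` and then issue a follow-up query such as `/structures?filter=id="s-1" OR id="s-2" OR id="s-3"` (shown unencoded for readability):

```json
{
  "data": {
    "type": "collections",
    "id": "c-1",
    "relationships": {
      "structures": {
        "data": [
          { "type": "structures", "id": "s-1" },
          { "type": "structures", "id": "s-2" },
          { "type": "structures", "id": "s-3" }
        ]
      }
    }
  }
}
```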
I understand that this is the intention, but I am a bit nervous that we have a field that can now grow unboundedly in a response if requested (or even if not - you cannot disable `relationships` from the response, I don't think?). I guess you could argue the same for a 1 million site structure object, but here it feels well within the design that even the list of IDs could be very large.

I think the larger comment now addressed in #420 would be the best mechanism around this (if we are going to support it anyway).
Are you talking about `included` or `relationships` now? Both of them can grow to arbitrary size - although we are talking extremely large collections before `relationships` becomes unduly large.

Possibly repeating myself a bit, but to be clear: I don't see a problem with `included`. Implementations probably should avoid it except as recommended in OPTIMADE for references, unless a client somehow explicitly requests it (which we don't have a standard mechanism for yet). If an implementation decides to include `included` anyway, while simultaneously having "unboundedly large" relationships, it would be silly not to implement a limit on the number of entries included this way.

The situation is more tricky with huge `relationships`. I think JSON:API is silently built on the assumption that the list of IDs for all relationships of a resource must be small enough to handle in a single request. Sure, one can use the `articles/1/relationships/comments` syntax to get something paginated, but how does one know in the first place which JSON:API relationship keys to fetch without first fetching the unboundedly large `articles/1`?

Hence, I think we have to look at this list of IDs as a single "datum" that our default OPTIMADE JSON:API output format isn't equipped to handle when it grows arbitrarily large. This is echoed by our recommendation for other output formats to simply encode relationships alongside other data.

If we are concerned about this limitation, I don't see any other way to address it than to implement an alternative output format that can handle pagination on individual properties, including the relationships.
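For reference, a fetch of the relationship-level URL mentioned above could in principle return a paginated list of resource identifiers along these lines (a sketch only; the `next` pagination link is a generic JSON:API-style example, not something OPTIMADE currently specifies):

```json
{
  "links": {
    "self": "/articles/1/relationships/comments",
    "related": "/articles/1/comments",
    "next": "/articles/1/relationships/comments?page[offset]=2"
  },
  "data": [
    { "type": "comments", "id": "5" },
    { "type": "comments", "id": "12" }
  ]
}
```

This still leaves the chicken-and-egg problem described above: the client only learns which relationship keys exist by first fetching `articles/1` itself.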
Understood.

I think my concern is that some of the intended use cases for `collections` might cross this boundary already (would 10,000/100,000 IDs to structures that define a training set break this?). I'm also not sure that `relationships` can be excluded from the request using `response_fields`, so you can't even hit `/collections` to browse/filter them without getting these potentially large responses. I understand that this is already the case with e.g., COD's mythical 100k-atom structure, but at least there you could choose which fields you wanted to be returned!

I'm leaning towards this being the correct approach. The relationships can be included as a `self` link to `articles/1/relationships/comments` rather than as a `data` list, which I think solves your problem. Perhaps we could say something like: "It is RECOMMENDED that implementations use `self` links instead of explicit `relationships` data for collections with a number of entries significantly in excess of the implementation's page limit."

Let's see how the discussion in #419 goes...
You seem to be right - excluding the `data` key if it contains too many entries, and instead relying on a `link` that returns a paginated result set of the entries, is a straightforward solution. So, if we try to follow the JSON:API examples, I think this link should point to, e.g., `/collections/42/structures`, and in reference to the ongoing discussion in #420 this would thus be the first "required" use of third-level URLs.
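A response from such a third-level URL might then look like an ordinary paginated entry listing (a sketch under the assumption that the usual OPTIMADE pagination and response-meta conventions apply; the URL and counts are invented):

```json
{
  "data": [
    { "type": "structures", "id": "s-1" },
    { "type": "structures", "id": "s-2" }
  ],
  "links": {
    "next": "/collections/42/structures?page_offset=2"
  },
  "meta": {
    "data_returned": 100000,
    "more_data_available": true
  }
}
```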