Add ranged properties #452

JPBergsma · 2023-01-09T17:32:23Z

In This PR, I make a suggestion for how large “ranged” properties could be shared with OPTIMADE.
See the discussion in Issue#428
It also describes how this could be done for multidimensional properties.

I have now placed the main body of the text in section 3.9, I am however not sure if this is the best location. Feel free to suggest another location in the text.

~~I have now used a prefix to indicate that a property is a ranged property. This however creates a precedent, as it is the first time we use a prefix in this manner.~~ If you prefer, I can also add a field that lists all the properties that are ranged properties, or mandate that this is specified at the info endpoint.

Do we want to add a field in the property definition that a field is a ranged field, (also in case we choose to use a ~~prefix or a~~ separate property in the question before this one)?

Now I have put a SHOULD level queryability not defined the queryability of all the dictionary fields except the values and indexes field.
I do not think it is very important that these fields are queryable, so we could perhaps also make querying OPTIONAL.( I have not set all query ability to OPTIONAL)

Now the values can only depend linearly on the index. Do we want to define a more general field which could contain information on how to calculate the value from the index? This would be considerably more complex and probably warrants a PR of its own.

~~Do we want to have a negative requirement on the range_id, i.e. If the range_id is not defined, a property must not have the same range as another property?~~

For now, I removed properties average, min, max, and set as this becomes more complicated with multiple dimensions and I did not want to make this PR too big / complicated.
This can be addressed in a separate PR once this one gets merged.

Do we want to make the values for slicing optional?
The default values would then be [1,last,1]
So [] would return all the values, [100] would return all the values from the hundredth one onward and [30,50] would return every, value belonging to index 30 till 50.

Do we want to support multiple ranges for a single property? e.g. _ranged_property_name[1,100,2][101,200,1]

…e complicated when multiple dimensions are involved.

rartino

Great to see a PR of this, thanks for the work with writing this up.

I find the formatting a bit strange. Most of the text is in section 3 for general API conventions, where all other sections just write out explanations of API features in regular text. This PR adds something formatted as a property definition, even though it is not. And the list of fields is also not formatted as similar lists elsewhere in the specification. I try to explain in my review comments, but perhaps also look at how the other sections explaining similar features elsewhere in the specification is formatted?

Another question is whether we should hold this back until v1.2 is out. I lean towards "yes". Any thoughts @ml-evs?

optimade.rst

rartino · 2023-01-11T14:35:23Z

optimade.rst

+  Likewise, the server can use paging to return the property in multiple parts.
+  This can be useful for properties that are so large that it can be inconvenient to return them in a single response.
+  In that case a link is provided, as described in `JSON Response Schema: Common Fields`_ under the `links.next` field, from which the remainder of the requested data can be retrieved.


I do not understand how paging works on ranged properties. The JSON:API field links.next typically pages in lists of entries when the client fetch more than one entry, e.g., /structures. Is the idea that you can only page a ranged property this way when requesting a single entry, e.g., /structures/142? I think this may conflict with JSON:API's definition of links.next. At least it needs to be described better how this works and what limitations apply.

My suggestion is to, at this point, not allow paging of range properties, only retrieval of ranges via OPTIMADE-specific query parameters.

A single entry with the data of a trajectory can easily be gigabytes in size.
Returning all that data in one response would give a long response time.
So we need a way for the server to limit the size of a response. (Perhaps we could create some kind of streaming response, but I am not sure how the different JSON tools will handle this. We may need to create our own JSON parser.)

The server can put an extra field in the next link that specifies to which extent the different properties have already been sent. They can slice the property in the same way the client does.
There is, however, no need to define how servers should do this as this only applies to their own server, so they can do what they want as long as the client gets the correct data.

I could add a field to the response to make it clearer what the server has returned. e.g. which range of values has been returned, so it is easier for the client to reconstruct the entry.

We could add a field with a name different from next to the links section, but I am not sure if that is any clearer.

Maybe paging is not the right word to use here. I will adjust the description.

Sorry if I was unclear: I did not mean to question that range properties need to be fetched in parts.

I suppose you can refer to any mechanism for fetching partial data as "paging". However, in JSON:API paging (or actually, the specification uses "pagination") via the top-level "links object" is described as for servers that choose to "limit the number of resources returned" when returning a collection of resources as a result. I do not see a JSON:API-conforming way to "extend" that top-level links object the way I think you suggest (?)

Hence, I thought we were going to only allow fetching ranged data in parts via query parameters specifying ranges. However, I think you are right that some form of JSON:API links-object type paging would be better as the MANDATORY baseline mechanism. That way all the new query parameters for ranged data can be made OPTIONAL, which is perfect for us who want to support MD data but have no use case for arbitrary indexing into frames (and thus do not want to put any effort into supporting that).

So, we just need to specify a place for that range-specific a JSON:API links object. But, where? Again, I don't think it can go in the top-level links object. So, what about inside the ranged property metadata dictionary?

I think it would be a bit strange to place the links in the ranged property metadata dictionary. So far I considered the trajectory as one resource, so I would expect that the next link applies to the whole resource and not just one property. So I would suggest placing the link a level higher at the level of the entire trajectory resource.

I am not sure why you suggested placing the link with the individual property.
I guess it is possible to turn the ranged properties into resources on their own. In that case the trajectory entry would include links to these resources like we do for the references.
As long as the server does not support the Include parameter, this could work as well. In that case, the client would need to retrieve the data for each individual property with a separate request. And if a property is too large, we could place the links to retrieve the rest of the property within the property attributes.

I do not think the new query parameter can be OPTIONAL. You still need a way to indicate which ranged_properties need to be returned.

optimade.rst

@rartino

Added suggestion from @rartino Co-authored-by: Rickard Armiento <[email protected]>

…properties

optimade.rst

rartino · 2023-01-13T09:28:15Z

optimade.rst

+- **Type**: dictionary with keys:
+
+  - :property:`serialization_format`: string (REQUIRED)
+  - :property:`values`: List of any data type (OPTIONAL)


So, do I understand correctly that the present design is for the member values in the "metadata dictionary" to also contain the actual data values? (So, in a sense, "metadata" is mislabeling it?)

So, correlated with the thoughts above about pagination: how about making responses always contain two fields for ranged properties, one metadata field with only metadata, and one for the data. That way, even the initial response that contain a ranged property can pass the first page of data the way it would if this was just a regular (non-ranged) property. This means that if the ranged property happens to be small, the client gets all the data it needs in the first response without having to do anything else. Only in the case all data does not fit in one response does the client need to engage with the specifics of accessing ranged properties. This will actually greatly help collections (#386): I felt we were going to end up with an ugly design to force clients to fiddle with range access for all collections, when in many cases of smaller collections this would be unnecessary.

So, I propose:

Split out values into its own field; i.e., edit the initial text to say that for ranged properties two fields are returned:
<property_name>_ranged_meta: a dictionary with all these metadata fields.
<property_name>_ranged_data: a list of values of any type.
The server MAY support querying on either or both.

For the field <fieldname>_ranged_meta all the fields you defined here are kept with the same meaning, but two new ones are needed (and perhaps these should be the only ones MANDATORY):

'links': a JSON:API object on the format described in the JSON:API specification section "Pagination", which is used for pagination of the ranged property.

'more_data_available': boolean. True if the data does not fit in a single page and this is not the final page.

The field <fieldname>_ranged_data would then be defined as:

If no range-selecting query parameters are given, it contains the first page of data (which is all of the data if and only if more_data_available is false in <fieldname>_ranged_meta. If <fieldname>_ranged_meta is true, then further pages of data can be fetched by the client via the URL in next in <fieldname>_ranged_meta.

If any of the range-selecting query parameters defined below are given, then data matching that range selection is returned. The server MAY support paging also of the requested range, or MAY return an error if the requested range is too big to fit in a single response.

So, do I understand correctly that the present design is for the member values in the "metadata dictionary" to also contain the actual data values? (So, in a sense, "metadata" is mislabeling it?)

Yes, the dictionary would also contain data when the property is listed within the property_range query parameter so the term "metadata dictionary" is misleading.

how about making responses always contain two fields for ranged properties, one metadata field with only metadata, and one for the data.

I do like that all the data is grouped together in a single field. We could also make a separate sub dictionary to group the metadata separate from the data. But overall, it is not that important how the fields are organized.

That way, even the initial response that contain a ranged property can pass the first page of data the way it would if this was just a regular (non-ranged) property.

I am not sure, I like your proposal to return the first “page” of data of a ranged property by default.
The first request will typically be for a client to check which entries are present. It can then select the most interesting entries. And retrieve the additional data for these entries.
With your proposal, the number of entries that a server can return in a single response would be much smaller, as most of the response would consist of the first “pages” of properties.

If a user wants the behaviour you described, they could simply use the property_range query parameter to specify that they want the first x values of a property returned.

If a client would retrieve only a single property, it would make more sense to already provide some of the data, but having a different behaviour for the single entry endpoint could also be confusing/complicate things.

I felt we were going to end up with an ugly design to force clients to fiddle with range access for all collections, when in many cases of smaller collections this would be unnecessary.

Another solution could perhaps be to define the collection field flexibly so that you could have a regular and a ranged version of this property. I already wrote that if the part after the ranged prefix is the name of a regular property, it should meet all the requirements of that property.

'links': a JSON:API object on the format described in the JSON:API specification section "Pagination", which is used for pagination of the ranged property.

In the JSON API v1.0 links is a reserved key word, so we would have to give this field a different name. (or drop support for json API 1.0) see section 5.2.3 of the JSON API specification.

It is quite a big change you are suggesting. Instead of retrieving the data per entry, you want to retrieve data per property. A bit as if each property is a resource in itself.

If any of the range-selecting query parameters defined below are given, then data matching that range selection is returned. The server MAY support paging also of the requested range, or MAY return an error if the requested range is too big to fit in a single response.

I do not think returning errors to the user is very nice when the user can’t know in advance that something is wrong.
So we should somehow specify how much data the server can return, and then the client should construct a request that fits between these bounds.
But this is difficult as the size of the properties differs from entry to entry and is unknown to the client.
So I do think that the server should implement pagination as it has all the data needed to do so while the client has not.

We should probably also think about creating streaming responses, so sending back large data items is less odd to the client. Now we first prepare the response and then send it. But for large responses this would be odd as the user does not get any data back for a while. If we do this, we can send larger responses, and we would not need ranged properties as often. But this is probably more of an issue for the optimade python tools rather than a change in the optimade specification.

Sara and Gian-Marco have asked me to prioritize writing a tutorial for the old trajectory proposal, so it may take a while before I get back to this topic.

Yes, the dictionary would also contain data when the property is listed within the property_range query parameter so the term "metadata dictionary" is misleading.

Where does the text of this PR presently say this, i.e., that the behavior of value depends on the query parameter property_range? What I see in the text is: "The property values MUST be present when serialization_format is not set to "linear"."

I do like that all the data is grouped together in a single field. We could also make a separate sub dictionary to group the metadata separate from the data. But overall, it is not that important how the fields are organized.

I think there is a point in separating data and metadata, especially when we consider the possibility of searching for data. But I agree that whether the data goes inside or outside the dictionary is not the most crucial design decision. Perhaps we can discuss it on the web meeting.

I am not sure, I like your proposal to return the first “page” of data of a ranged property by default. The first request will typically be for a client to check which entries are present. It can then select the most interesting entries. And retrieve the additional data for these entries.
With your proposal, the number of entries that a server can return in a single response would be much smaller, as most of the response would consist of the first “pages” of properties.

The first page doesn't need to be very large, it is up to the implementation - it could be, say, 20 values. If the client wants the data it asks for the field (and should then be happy to also get the data), if the client does not want the data it should not ask for the field.

If a user wants the behaviour you described, they could simply use the property_range query parameter to specify that they want the first x values of a property returned.

I'm trying to move this in the direction where implementations that do not support random access of the data does not have to implement property_range at all, which is fairly complex to parse the way it is currently defined.

I hadn't even considered that the way you've written this is meant to support a range selection over every entry that matches a filter. I do not think implementations should be forced to support that even if they do support range selection over a single entry.

In the JSON API v1.0 links is a reserved key word, so we would have to give this field a different name. (or drop support for json API 1.0) see section 5.2.3 of the JSON API specification.

Can you cite the part of the JSON:API specification that you read to say one cannot have links as a key in a dictionary inside attributes?

It is quite a big change you are suggesting. Instead of retrieving the data per entry, you want to retrieve data per property. A bit as if each property is a resource in itself.

No, that is not what I meant, sorry if I was unclear. This was meant to be a small adjustment in behavior - the URL in links is meant to give you a single entry response with the next page of data for every property using the same range id as the one your are paging in. You can implement it by a URL that adds "property_ranges" selecting the next range (but you don't have to). I see now that I had missed how you want to allow multidimensional indexing with multiple range_ids; one would in that case have to work out how paging works in the multidimensional case. (But, do we need to support multidimensional ranges?)

So we should somehow specify how much data the server can return, and then the client should construct a request that fits between these bounds.

Well, as you also conclude, that cannot generally be done given the degrees of freedom supported by serialization_format. So, sure, if you want to specify that IF property_range is supported and the client asks for more results than the server can/wants to return, then the server also MUST allow paging those ranged responses with the paging mechanism I proposed, I'd be ok with that. As long as there is an option not to support property_range at all.

Where does the text of this PR presently say this, i.e., that the behavior of value depends on the query parameter.

On line 526/527 it is mentioned that the values are only returned when the property is listed in the property_ranges query parameter.
I have added an extra line to the description of the values field to make it a bit clearer.

I'm trying to move this in the direction where implementations that do not support random access of the data does not have to implement property_range at all, which is fairly complex to parse the way it is currently defined.

In that case, the client would still need a way to specify which fields it wants to have. So I think you would still need the property_range query parameter.
How would you make it easier to parse ?

I hadn't even considered that the way you've written this is meant to support a range selection over every entry that matches a filter. I do not think implementations should be forced to support that even if they do support range selection over a single entry.

Most of the code for the single entry endpoint and the multiple entry endpoint is the same. So I do not think this makes the implementation more difficult. It may actually be more work to create different behaviour for the single and multiple entry points.

Can you cite the part of the JSON:API specification that you read to say one cannot have links as a key in a dictionary inside attributes?

“Complex data structures involving JSON objects and arrays are allowed as attribute values. However, any object that constitutes or is contained in an attribute MUST NOT contain a relationships or links member, as those members are reserved by this specification for future use.”

No, that is not what I meant, sorry if I was unclear. This was meant to be a small adjustment in behaviour - the URL in links is meant to give you a single entry response with the next page of data for every property using the same range id as the one your are paging in. You can implement it by a URL that adds "property_ranges" selecting the next range (but you don't have to).

So, if I understand it correctly, you want to give each resource/entry its own next link to retrieve the remainder of this entry, and use the standard JSON API next link to point to the next entry that matches the filter.
I do not use the range_id for retrieving the data.
You use the name of the field in the query parameter property_ranges. So you can define a separate range for each property.

I see now that I had missed how you want to allow multidimensional indexing with multiple range_ids; one would in that case have to work out how paging works in the multidimensional case. (But, do we need to support multidimensional ranges?)

The current proposal already supports paging in multiple dimensions. (The range_ids have nothing to do with indexing. It only shows that two properties use the same range, so that values with the same index belong to each other.)
I mostly started out thinking how I could make sure that the proposal was compatible with a future extension to multiple dimensions. And when I had figured that out, I thought I could just as well write it out in the specification.
I could leave out the references to multiple dimensions, although in that case some people may wonder why there are some extra pairs of square brackets needed.
Some people have also asked me whether it would be possible to retrieve the cartesian_site_positions only for specific atoms. The current specification does not yet allow that, but I think the property_ranges query parameter could be extended to allow specific indexes. We could, for example, make a convention that the list that now contains the first last and step parameters could be replaced with a list of the indexes, placed between “(“ and “)”, for which data should be retrieved.

I'm trying to move this in the direction where implementations that do not support random access of the data does not have to implement property_range at all, which is fairly complex to parse the way it is currently defined.

In that case, the client would still need a way to specify which fields it wants to have. So I think you would still need the property_range query parameter.

I don't get what you mean here, sorry. Which fields to return in a response is requested via response_fields?

How would you make it easier to parse?

I'm trying to propose to make it so that the single mandatory feature to get all data in a ranged property is via a next-like paging system, which wouldn't require interpreting property_range at all.

Can you cite the part of the JSON:API specification that you read to say one cannot have links as a key in a dictionary inside attributes?

“Complex data structures involving JSON objects and arrays are allowed as attribute values. However, any object that constitutes or is contained in an attribute MUST NOT contain a relationships or links member, as those members are reserved by this specification for future use.”

Oh, wow, I had missed this completely. It is a really strange rule to have for attribute data at arbitrary depth, and it seems they removed it in v1.1. Well, ok then, you are right that we would have to name that inner field something different than links. Perhaps just place a URL in a member named next.

So, if I understand it correctly, you want to give each resource/entry its own next link to retrieve the remainder of this entry, and use the standard JSON API next link to point to the next entry that matches the filter. I do not use the range_id for retrieving the data. You use the name of the field in the query parameter property_ranges. So you can define a separate range for each property.

No, the way you describe it does not sound right. Lets me try again.

The standard JSON API next link would work as it always has: it pages over a range of returned entries. This is an absolute necessity to adhere to JSON:API.

A ranged property would, among its metadata, communicate a "special" next link. Following that next link would give you that entry (that whole entry with all attributes/properties as requested by response_fields) but where the "data" part of the ranged property contains the next range of values. This is the most straightforward way (and thus easiest to implement) I can imagine to communicate paged ranged data values within a fully valid OPTIMADE/JSON:API response.

Maybe the most clear way to express it is that this "feature" is meant to be completely compatible with your design of property_ranges. Someone who has implemented property_ranges gets precisely the right thing by simply constructing that special next link by adding a property_ranges parameter that specifies the next page of data.

Some people have also asked me whether it would be possible to retrieve the cartesian_site_positions only for specific atoms. The current specification does not yet allow that, but I think the property_ranges query parameter could be extended to allow specific indexes.

Why would this extension belong in property_ranges? If so, one could only use it for ranged properties? It seems to me it belongs in response_fields or as its own query parameter, where one could imagine various ways to extend it to ask for specific subsets of a property.

I don't get what you mean here, sorry. Which fields to return in a response is requested via response_fields?

In the discussion on the trajectory endpoint, it was concluded that it would be confusing to determine whether the values should be returned based on whether the property was present in the response_fields. If the whole property (metadata + data) is present under a single field, there would be no way to specify with the response_fields parameter that you only want the metadata.
So instead we decided to trigger returning the data by the presence of the query parameters first_frame, last_frame and frame_step.
It therefore made sense to me to do the same here with the property_ranges query parameter.

I now understand why you wanted to split the data and metadata into two separate fields. That way, you could add the data field to the response_fields to get the data. And not return it otherwise.
If we however add by default a next link to the metadata, it would not be necessary to have a separate field for the data.

No, the way you describe it does not sound right. Let me try again.
The standard JSON API next link would work as it always has: it pages over a range of returned entries. This is an absolute necessity to adhere to JSON:API.
A ranged property would, among its metadata, communicate a "special" next link. Following that next link would give you that entry (that whole entry with all attributes/properties as requested by response_fields) but where the "data" part of the ranged property contains the next range of values. This is the most straightforward way (and thus easiest to implement) I can imagine to communicate paged ranged data values within a fully valid OPTIMADE/JSON:API response.
Maybe the most clear way to express it is that this "feature" is meant to be completely compatible with your design of property_ranges. Someone who has implemented property_ranges gets precisely the right thing by simply constructing that special next link by adding a property_ranges parameter that specifies the next page of data.

Ok, I think I understand you now. I’ll adjust in the proposal.

Why would this extension belong in property_ranges? If so, one could only use it for ranged properties? It seems to me it belongs in response_fields or as its own query parameter, where one could imagine various ways to extend it to ask for specific subsets of a property.

The cartesian_site_positions field is an example of multidimensional data.
In a trajectory the first dimension would cover the frames, the second dimension the particles and the third the dimensions(x, y, z).
In a PDB file, the atoms belonging to the protein usually come first, while the solvent atoms come later. So one could make a query, selecting only a part of the range in the second dimension, so that only the positions of the atoms of the protein and not those of the solvent would be returned.

Co-authored-by: Antanas Vaitkus <[email protected]>

…JPBergsma/OPTIMADE into JPBergsma/Add_ranged_properties # Conflicts: # optimade.rst

optimade.rst

rartino · 2023-02-17T08:53:55Z

optimade.rst

+  The third value specifies the step size.
+  A list consists of a pair of square brackets ("[", ASCII 91(0x5B)) and ("]", ASCII 93(0x5D)) enclosing a number of values separated by a comma (",", ASCII 91(0x5B))
+  Ranges can be specified for multiple properties by separating them with a comma.
+  Databases MUST return the :property:`values` and :property:`indexes` field belonging to properties listed and SHOULD use the ranges in this query parameter.


Some reformulation for clarity:

Suggested change

Databases MUST return the :property:`values` and :property:`indexes` field belonging to properties listed and SHOULD use the ranges in this query parameter.

When a client includes `property_ranges` in a request, the response MUST include the :property:`values` and :property:`indexes` field belonging to properties listed and SHOULD use the ranges in this query parameter.

but the SHOULD here makes me nervous. I think supporting this parameter should be optional, but if you support it, don't you have to follow the indexing request?

optimade.rst

Co-authored-by: Rickard Armiento <[email protected]>

rartino · 2023-02-21T13:06:13Z

I apologize for feature-creep on already long on-going work; but, after some more thinking I realize we have a number of properties already defined that would benefit from the ability to be "ranged", not the least, cartesian_site_positions. I also suspect we will end up arguing a lot about in the future about which properties to "range" and which not to.

I think with some, actually quite minor, adjustment of this PR we can allow any property to be ranged on an entry-by-entry basis. If people agree this would be useful, I think this is how it could be done:

When property data easily fits inside the response (i.e., what we always do now), the response should look exactly the same as it does now; so any entry that can be represented in OPTIMADE today remains unchanged.
When some property does not fit, we do not want clients written according to the present specification to break, so the property field should in that case be null in the response. Clients that are not aware of "ranged properties" would hence conclude that the data is missing (which, in a way, it is: it is too big to represent using the standard the client uses).
In the latter case, the implementation also includes the <property definition>_ranged field just as already is described in this PR that enables clients aware of ranged properties to page in the response.

The above seems to me as a minor change to the PR with major utility going forward.

rartino

A few comment (submitting because I've started to get errors about that other changes have been made)

optimade.rst

rartino · 2023-03-09T17:05:30Z

optimade.rst

@@ -455,15 +455,14 @@ Ranged properties are used for properties that are too large to be returned by d
 - **Requirements/Conventions**:

  - **Support**: OPTIONAL support in implementations.
-  - A ranged property can be recognized by the presence of the :field:`range` in the metadata of the property, i.e. in the field: <property_name>_meta.
+  - A ranged property can be recognized by the presence of the :field:`range` in the metadata of the property, i.e. in the field: :field:`<property_name>_meta`.


If all fields are technically "ranged", I think the differentiation here has to be whether there is something along a field ranged/more_data_available present and true, meaning not all data is communicated in the current response. It must be allowed to have a ranged dictionary even if that is not true.

Well, I guess you can make all fields ranged by setting the number of indexable dimensions to zero and return the whole property in one go, so the field behaves the same as the current fields. So you could still add a ranged subfield to the metadata even if a property does not support any of the new features.

There is already a more_data_available field that can be used to indicate whether the server has retuned all the data.
By setting the number of indexable_dimensions to 0, you as a server could also indicate to the client that it is not possible to perform slicing on the property. But perhaps it is cleaner to add a boolean to the metadata to define whether the supports that the client can request a subrange to retrieve.

Does this answer your question/remark ?

rartino · 2023-03-09T17:15:30Z

optimade.rst

+  A list with an identifier for each dimension of the property. If two properties have the same range_id for a dimension it means that the values at an index along this dimension belong to each other.
+  For example, if both the :property:`energy` and :property:`cartesian_site_positions` of a trajectory have the same :field:`range_ids` it indicates that the energy at an index x(in the dimension labeled by this range_ids) belongs to the cartesian_site_positions at the same index x.


Right, I understand now what you meant with "correlated" a few edits ago. I think we need to find a term for this, but one that hopefully is more clear. Maybe "synchronized"?

But, is your idea that one can get different ranges from different properties even for dimensions that are "synchronized"? I'd propose that one cannot. If asking ranges by range_id, all "syncronized" quantities would indeed be synchronized. (But otherwise "syncronized" is maybe not the best term)

I'm honestly not sure if my version is better - consider it a suggestion that you can accept or rework into your version for clarity.

Suggested change

A list with an identifier for each dimension of the property. If two properties have the same range_id for a dimension it means that the values at an index along this dimension belong to each other.

For example, if both the :property:`energy` and :property:`cartesian_site_positions` of a trajectory have the same :field:`range_ids` it indicates that the energy at an index x(in the dimension labeled by this range_ids) belongs to the cartesian_site_positions at the same index x.

A list with an identifier for each dimension of the property.

If any dimension in two or more properties share the same :field:`range_id` those dimensions should be thought of as the same dimension.

For example, if both the :property:`energy` and :property:`cartesian_site_positions` of a molecular dynamics trajectory share a :field:`range_id` of :val:`frame`, it should be interpreted as them providing, for each frame, synchronized values of the coordinates and the energy.

To me, synchronous gives the impression that both are happening at the same time.
It however also applies to cases where the dimension is something other than time, so It is probably not the best word.

I am not entirely sure though what you mean with:

But, is your idea that one can get different ranges from different properties even for dimensions that are "synchronized"? I'd propose that one cannot. If asking ranges by range_id, all "syncronized" quantities would indeed be synchronized. (But otherwise "syncronized" is maybe not the best term)

If properties have the same range_id in a dimension, the values for those properties at the same index should belong to each other.

rartino · 2023-03-09T17:42:49Z

optimade.rst

+- :field:`range_ids`: list of strings.
+
+  A list with an identifier for each dimension of the property.
+  If two properties have the same range_id for a dimension, it means that the values at an index along this dimension belong to each other.
+  For example, if both the :property:`energy` and :property:`cartesian_site_positions` of a trajectory have the same :field:`range_ids` it indicates that the energy at an index x(in the dimension labelled by this range_ids) belongs to the cartesian_site_positions at the same index x.
+  SHOULD be a queryable property with support for all mandatory filter features.
+
+If the :field:`<property_name>` contains data, the following properties MUST be present or SHOULD NOT be present, depending on the value of the :property:`serialization_format`. Querying is not relevant for these properties and SHOULD NOT be supported.


Should range_id really be a per-entry metadata? Is that perhaps something we define as part of the property definition?

I have not yet been able to come up with an example where defining range_id in the property definition would cause more problems than at the single entry level. So I think it should be possible to define it in the property definition.

…with nelements.

JPBergsma · 2023-03-28T16:56:43Z

I have updated the specification with the points we discussed during the online meeting.(with @rartino)
Except for the point you mentioned about the next link not being mandatory for properties with more than 1 dimension.
As I do not think it will be more difficult for providers to implement this than it would be for clients to handle the exception of the next field not being there.

I now use parentheses instead of square brackets as parethesis are already mentioned in the Grammar (I do not really know anything about this grammar so I do not know how hard or easy it is to make changes.)

I have added a per entry next field as we no longer specify ranges per property but per dimension.

The text can probably still be fine tuned a bit more. but for now I would be happy if we can agree on the specification sufficiently well that it could be implemented.

rartino

Thanks for all the work put into this.

I am generally very supportive of this version. I especially like how the property range parameter is much more readable now.

But, I would:

Appreciate someone else also takes a careful look to see if the present version makes sense also to them (since this is a result of you and I discussing).
It would be great if someone implement at least a minimal subset of this before it is merged into a version of OPTIMADE we release, since I expect little details on how things are defined may be much more clear then. I'm certainty noticing that effect when working with the property definitions now.

optimade.rst

rartino · 2023-04-27T12:43:59Z

optimade.rst

+  The total number of values in the property.
+  SHOULD be a queryable property with support for all mandatory filter features.
+
+- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.


We've had this trouble elsewhere in the specification and keep running into it. The reference to a JSON API links object in a section that is meant to be data-format agnostic doesn't really work. If someone is trying to represent the output data in HDF5 (apparently someone already implemented that...), what does a "JSON API links object" mean inside HDF5?

So, the easy solution (suggested below) is to skip the extra degrees of freedom provided by JSON API links objects when we have links appearing in data-agnostic output, and just say that it is a string with a URL. However, given how often we are running into this issue, I'm starting to wonder if we rather should extend the base OPTIMADE data types with URL, so that the JSON API output format section generally can say that when the OPTIMADE specification says that something is "a link", you are meant to put a JSON API links object there.

Suggested change

- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.

- :field:`next`: String.

I think we define URLs in many places as a JSON API links object, like the home page field for providers in the meta field, for example. So it seems logical to do it here as well. I will change the description a bit, so it is more like the others.

I however do think that the values we assign are often not JSON API links objects. But instead, values that would qualify as members for JSON API links objects. The value of such a member can also be a string, so it should not be any more problematic to store as the dictionaries we currently have.

Should I make a separate PR to change JSON API links objects to something like a valid member for a JSON API links object elsewhere in the specification ?

Ps. I have written code which outputs the response in the hdf5 format. You can test it with https://optimade-trajectory-demo.matcloud.xyz/v1/structures?response_format=hdf5. Unfortunately, h5py does not support storing a list of dicts, so I have to convert this to a dict of dicts.

optimade.rst

rartino · 2023-04-27T12:49:50Z

optimade.rst

+
+In addition to these fields in the metadata, entries which support accessing data via the :query-param:`property\_ranges` query parameter SHOULD support per entry :field:`next` and :field:`more_data_availble` field to enable returning all the
+
+- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.


Suggested change

- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.

- :field:`next`: String.

optimade.rst

Co-authored-by: Rickard Armiento <[email protected]>

JPBergsma · 2023-05-03T16:46:05Z

I guess I should implement the meta property first, although that should not be that much work.
I'll ask Sara and Gian-Marco whether I can prioritize creating an implementation of the ranged properties.

JPBergsma · 2023-05-24T18:15:54Z

optimade.rst

+  - :field:`stop`: integer.
+    The index of the last value of this property in this dimension.
+
+- :field:`contains_null`: boolean


Should "contains_null" not be a more generic metadata property that could also apply to other non-ranged fields ?
Ps. This would also allow querying on whether a property contains_null, now that I have placed in the description that support for queries on the fields under "range" is optional.

…of ranged properties is Optional.

…planation for property_ranges query parameter.

JPBergsma · 2023-08-25T13:46:02Z

The adding of the partial_data mechanism PR#467 has made this PR largely obsolete, so I will therefore close it for now.

JPBergsma added 6 commits December 31, 2022 20:58

Intermediate state adding ranged properties.

c18cf0c

Merge branch 'develop' into JPBergsma/Add_ranged_properties

7dc50f2

first draft for ranged properties.

6afcf9c

Merge branch 'develop' into JPBergsma/Add_ranged_properties

1a55c2a

Removed average, set, min and max fields for now as these become quit…

b597a67

…e complicated when multiple dimensions are involved.

Small corrections.

37db878

JPBergsma linked an issue Jan 11, 2023 that may be closed by this pull request

Sharing large data objects #428

Closed

JPBergsma added type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus. PR/major-edits PR/requires-discussion labels Jan 11, 2023

Added how to treat missing values for requested range.

a832751

rartino requested changes Jan 11, 2023

View reviewed changes

JPBergsma and others added 3 commits January 12, 2023 12:44

changed description field

16aba07

Apply suggestions from code review

b1d69a8

Added suggestion from @rartino Co-authored-by: Rickard Armiento <[email protected]>

Merge branch 'Materials-Consortia:develop' into JPBergsma/Add_ranged_…

edbfc25

…properties

vaitkus reviewed Jan 12, 2023

View reviewed changes

rartino reviewed Jan 13, 2023

View reviewed changes

JPBergsma and others added 7 commits January 17, 2023 15:11

Apply suggestions from code review Vaitkus

14de45d

Co-authored-by: Antanas Vaitkus <[email protected]>

intermediate state from implementing code review.

906db81

Merge branch 'JPBergsma/Add_ranged_properties' of https://github.com/…

1feb4a9

…JPBergsma/OPTIMADE into JPBergsma/Add_ranged_properties # Conflicts: # optimade.rst

Processed comments rartino.

a96dffe

Small corrections.

15f599c

Added returned range property.

73905dc

Added extra explanation values field.

c6834f3

rartino reviewed Feb 17, 2023

View reviewed changes

optimade.rst Outdated Show resolved Hide resolved

rartino reviewed Feb 17, 2023

View reviewed changes

optimade.rst Outdated Show resolved Hide resolved

rartino reviewed Feb 17, 2023

View reviewed changes

optimade.rst Outdated Show resolved Hide resolved

Apply suggestions from code review

b0cc94c

Co-authored-by: Rickard Armiento <[email protected]>

JPBergsma force-pushed the JPBergsma/Add_ranged_properties branch from 62c6686 to 139c70e Compare March 9, 2023 17:32

rartino reviewed Mar 9, 2023

View reviewed changes

JPBergsma added 5 commits March 13, 2023 12:15

Changed 'n_' to 'n' for ranged metadata properties tio be consistent …

916d6f2

…with nelements.

Changed wording of range_id field after suggestion Rickard.

ee3651e

placed subsequent sentences on seperate lines.

1f794b6

Processed points discussed with Rickard.

169a1f4

Added per entry next field + small corrections.

f513596

JPBergsma mentioned this pull request Mar 29, 2023

Adding per property meta_data field. #462

Closed

JPBergsma mentioned this pull request Apr 11, 2023

Add per entry and per property meta data field. #463

Merged

rartino reviewed Apr 27, 2023

View reviewed changes

JPBergsma and others added 2 commits May 2, 2023 12:22

Apply suggestions from code review

65b8ad1

Co-authored-by: Rickard Armiento <[email protected]>

Adjusted the description next fields for ranged properties.

94db38c

JPBergsma modified the milestone: v1.2 May 3, 2023

JPBergsma marked this pull request as ready for review May 3, 2023 17:34

JPBergsma commented May 24, 2023

View reviewed changes

JPBergsma added 6 commits May 25, 2023 14:53

Corrected range name for _exmpl_ranged_thermostat.

033ea11

Added that querying on properties in the range dictionary is optional.

3762d30

Specifically mention that support for queries directly on the values …

8332567

…of ranged properties is Optional.

Updated example to latest version metadata proposal and added more ex…

a87b301

…planation for property_ranges query parameter.

Improved explanation returned_range field.

7e9b4f4

Small corrections.

4b298c5

rartino mentioned this pull request Jun 11, 2023

The road to trajectories #469

Open

5 tasks

ml-evs mentioned this pull request Jun 22, 2023

OPTIMADE v1.2 release planning #429

Closed

JPBergsma mentioned this pull request Jun 29, 2023

Adding the property_ranges query parameter and associated metadata #481

Open

JPBergsma closed this Aug 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ranged properties #452

Add ranged properties #452

JPBergsma commented Jan 9, 2023 •

edited

Loading

rartino left a comment

rartino Jan 11, 2023

JPBergsma Jan 12, 2023

rartino Jan 13, 2023

JPBergsma Feb 14, 2023

JPBergsma Feb 15, 2023

rartino Jan 13, 2023

JPBergsma Jan 18, 2023

rartino Jan 19, 2023 •

edited

Loading

JPBergsma Feb 16, 2023

rartino Feb 17, 2023 •

edited

Loading

JPBergsma Feb 17, 2023

rartino Feb 17, 2023

rartino commented Feb 21, 2023 •

edited

Loading

rartino left a comment

rartino Mar 9, 2023

JPBergsma Mar 13, 2023

rartino Mar 9, 2023

rartino Mar 9, 2023

JPBergsma Mar 13, 2023 •

edited

Loading

rartino Mar 9, 2023

JPBergsma Mar 13, 2023 •

edited

Loading

JPBergsma commented Mar 28, 2023

rartino left a comment

rartino Apr 27, 2023

JPBergsma May 2, 2023 •

edited

Loading

rartino Apr 27, 2023

JPBergsma commented May 3, 2023

JPBergsma May 24, 2023 •

edited

Loading

JPBergsma commented Aug 25, 2023

	Databases MUST return the :property:`values` and :property:`indexes` field belonging to properties listed and SHOULD use the ranges in this query parameter.
	When a client includes `property_ranges` in a request, the response MUST include the :property:`values` and :property:`indexes` field belonging to properties listed and SHOULD use the ranges in this query parameter.

		A list with an identifier for each dimension of the property. If two properties have the same range_id for a dimension it means that the values at an index along this dimension belong to each other.
		For example, if both the :property:`energy` and :property:`cartesian_site_positions` of a trajectory have the same :field:`range_ids` it indicates that the energy at an index x(in the dimension labeled by this range_ids) belongs to the cartesian_site_positions at the same index x.

	- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.
	- :field:`next`: String.


		In addition to these fields in the metadata, entries which support accessing data via the :query-param:`property\_ranges` query parameter SHOULD support per entry :field:`next` and :field:`more_data_availble` field to enable returning all the

		- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.

Add ranged properties #452

Add ranged properties #452

Conversation

JPBergsma commented Jan 9, 2023 • edited Loading

rartino left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rartino Jan 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rartino Feb 17, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rartino commented Feb 21, 2023 • edited Loading

rartino left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JPBergsma Mar 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JPBergsma Mar 13, 2023 • edited Loading

Choose a reason for hiding this comment

JPBergsma commented Mar 28, 2023

rartino left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JPBergsma May 2, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JPBergsma commented May 3, 2023

JPBergsma May 24, 2023 • edited Loading

Choose a reason for hiding this comment

JPBergsma commented Aug 25, 2023

JPBergsma commented Jan 9, 2023 •

edited

Loading

rartino Jan 19, 2023 •

edited

Loading

rartino Feb 17, 2023 •

edited

Loading

rartino commented Feb 21, 2023 •

edited

Loading

JPBergsma Mar 13, 2023 •

edited

Loading

JPBergsma Mar 13, 2023 •

edited

Loading

JPBergsma May 2, 2023 •

edited

Loading

JPBergsma May 24, 2023 •

edited

Loading