-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for SMARTS filter language (filter_smarts
)
#398
base: develop
Are you sure you want to change the base?
Conversation
SMARTS isn't really my field, so even if this PR is simple and I see no formal issues with it, I think it should be reviewed and approved by people who want this feature in the standard and can see themselves implement it down the line. (I could add that I am a bit concerned over the lack of standardization of the SMARTS language discussed in #368 where it seems one may find different results depending on what SMARTS library the backend is using, so are these SMARTS queries really interoperable? But this isn't an issue I feel strongly about if there are people that want this feature.) |
I am concerned as well about the same, but SMARTS seems to be the state of the art for at least a decade now. I suggested SMARTS both as a constructive solution for chemical structure search (need expressed in #368), as well as an argument to my thesis that there is no simple way to do it. We may as well scrap the SMARTS proposal by leaving each implementation to sort out how they would like to do chemical substructure search, but my gut feeling is that most of them would still rely on SMARTS under-the-hood. |
Right; if these fields are not interoperable - i.e., database A's |
I am fine with not having OPTIMADE standardization for chemical (sub)structure search. I believe it was @JPBergsma who wanted such queries standardized. |
I think you are the only ones active in the OPTIMADE repo that I suspect may have some experience with SMARTS? This is a short PR, perhaps you can look it over and just confirm if what is written makes sense to you? If we cannot find two people who are invested in getting this merged, it seems highly unlikely that this will be implemented any time soon. If so, I think we should close this PR for now to de-clutter the active PR list, and re-open it if someone shows up and asks for this feature (probably via #368). |
Tagging @vaitkus who may as well be interested. We come from the same database, thus it would be best to hear from others as well. I am not particularly happy about closing prospective PRs though. Closed PRs come very close to being invisible - I guess no one visiting a repository looks at closed PRs. And open PRs give an impression of directions where a project is moving towards. Moreover, they invite for discussion. Thus I would vote for keeping all prospective PRs open. Maybe marking them with some tag (a la |
Sorry, I have no experience with SMARTS |
I think I originally started the whole SMILES/SMARTS discussion after talking with someone from the Ocelot database. They have a database with crystal structures of organic molecules. For them being able top query the topology of the molecule, e.g., the SMILES string, is essential to be able to select the entries. I have not worked with SMARTS queries myself. As I may have mentioned before, it is no longer possible to use one query for all databases with this PR, :query-url: would return all entries that contain a carbon atom that is attached to an aromatic atom, which in turn is bound to an aromatic atom bound to an oxygen atom. This would maintain the full querying flexibility we have now, and if we apply a prefix it should be backwards compatible with existing databases, who would ignore it like any other unknown field with a known prefix. |
Thanks for the clarification! So, are you (or someone else?) willing to seek out the Ocelot database and ask them to look at this PR? I mean, I understand it isn't exactly trivial to review a PR of a project you are not actively contributing to, but I think we should not merge a feature without some form of direct input from someone with experience of and/or stake in said feature.
One will never be able to do a SMARTS query on a database built on a backend with no support for it, so to allow this as an optional, but standardized, extension is only positive for interoperability. It is better that databases that support SMARTS do it in one unified way rather than inventing their own non-standard extensions. This is the design idea we've followed for many other optional features.
Maybe I'm missing something, but I don't see how this increases interoperability? What is a database that have no means of evaluating _filter_smarts going to do with this query? |
Example: | ||
|
||
- :query-url:`http://example.org/optimade/v1/structures?filter_smarts=CaaO` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the example indeed filter_smarts=CaaO
, or is it filter=smarts=CaaO
? And what is 'CaaO' maybe it should be 'CaO'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is filter_smarts
, and the example is supposed to be CaaO
, as taken from Daylight's SMARTS examples.
In 2024-11-04 meeting on SMILES and other cheminformatics notations for OPTIMADE, the need for SMARTS support in filters was universally emphasized. There was even a suggestion to introduce a SMARTS operator in the OPTIMADE filter language (or even allow namespace- and provider-specific operators?), this would supersede the need for
|
Or alternatively, something like a MATCH, LIKE or SEARCH operator that can have semantics defined for a custom type |
This is even better IMO, as it would not require extending the filter language syntax. |
In #368 support for SMARTS filter language was suggested. This PR:
In short, SMARTS queries are to be passed using
filter_smarts
URL parameter. If appearing together withfilter
in the same request, different filters are combined with logical AND.