feat: add hci converter #23
Merged
28 commits
4d968ba refactor: rename synth converter command in just file (sabinem)
a5c9511 feat: add hci-converter to package (sabinem)
834a7cd feat: add hci-converter on just file (sabinem)
8aa31e5 feat: add Cargo.toml for hci converter (sabinem)
2087e65 feat: add campaign and campaign wrapper to catplus common types (sabinem)
2dc8c11 feat: add code files to hci converter (sabinem)
3a5650c test: add test to hci wrapper (sabinem)
edb398f feat: update namespaces for hci-converter (sabinem)
15116ff feat: update types (sabinem)
69f83d7 test: add hci-converter tests (sabinem)
6950c30 fix: avoid infinite loop when adding actions (sabinem)
ed78787 fix: update .gitignore to allow adding json examples (sabinem)
5f39e80 format: cargo fmt (cmdoret)
3d05018 chore: sort namespace alphabetically (sabinem)
da96d18 feat: implement into_graph for campaign_wrapper (sabinem)
434f269 refactor: improve prefix map by using macros (sabinem)
7136c91 refactor: unify hci and synth parser to a shared parser (sabinem)
7807bdc tests: adapt tests for unified synth and hci converter (sabinem)
6d9c260 refactor: change name from synth-converter to converter (sabinem)
1665e22 refactor: adapt cargo.toml to name change of converters (sabinem)
1394fca refactor: delete unused hci-converter (sabinem)
0c42273 refactor: name change from synth-converter to converter (sabinem)
23a1ba9 style: applying just fmt (sabinem)
1fbe4a8 chore: change comment in justfile (sabinem)
5f69d07 fix: add enum for serialization format (sabinem)
91f5193 tests: fix tests after change of format to enum (sabinem)
336c823 chore: update README (sabinem)
0a3c81d style: apply cargo fmt (sabinem)
@@ -11,4 +11,3 @@ debug
 #Folders
 data/**
 *.ttl
-*.json
@@ -3,7 +3,7 @@ resolver = "2"

 members = [
 "src/catplus-common",
-"src/synth-converter",
+"src/converter",
 ]

@@ -2,51 +2,51 @@

 ## About

-This repository contains all the Zarr converters for the different data types in the Cat+ project (Agilent, UV, IR, etc.)
-The data types are all in different formats, their data and metadata colluded together. The goal will be to convert the metadata to [an established ontology](https://github.com/sdsc-ordes/catplus-ontology/tree/main), and -as much as data format allow- convert the data in [Zarr array](https://zarr.readthedocs.io/en/stable/index.html).
+This repository contains all the converters for the different data types in the Cat+ project (Agilent, UV, IR, etc.).
+The data types all come in different formats, with data and metadata bundled together. The goal is to convert the metadata to [an established ontology](https://github.com/sdsc-ordes/catplus-ontology/tree/main) and to provide the data in their original files.

 ## Tools

-### synth-converter
-The Synth-converter parses a json input into an rdf graph and serializes the graph to either turtle or jsonld.
-It expects the input to conform to the cat+ ontology and the struct `synth-converter/src/batch.rs`. An example input file is provided in `example/1-Synth.json`.
+### converter
+The converter parses a json input into an rdf graph and serializes the graph to either turtle or jsonld.
+It expects the input to conform to the cat+ ontology and to the structs in `src/catplus-common/src/models/types.rs`. Example input files are provided in the `examples` directory.

 #### Usage

-The `synth-converter` has three parameters:
+The `converter` has four arguments:

+- input_type: currently `synth` (see `examples/1-Synth.json`) or `hci` (see `examples/0-HCI.json`)
 - inputfile: path to input file (relative to top level of the repo or absolute)
 - outputfile: path to output file (relative to top level of the repo or absolute)
-- format: default is "ttl", the other option is jsonld
+- format: rdf output format, currently `turtle` or `jsonld`

-The `synth-converter` turns the inputfile into a rdf graph and serilizes it to either turtle or jsonld. The serialization is written to an outputfile.
+The `converter` turns the inputfile into an rdf graph and serializes it to either turtle or jsonld. The serialization is written to the provided outputfile.

 Examples

 ```
-just run example/1-Synth.json output.ttl
-just run example/1-Synth.json output.json --format jsonld
+just run synth examples/1-Synth.json examples/1-Synth.ttl turtle
+just run hci examples/0-HCI.json examples/0-HCI.jsonld jsonld
 ```

 ### Architecture

-The json input is read with `serde_json`: the transformation of fields is described in the struct `synth-converter/src/batch.rs`

-The graph is build via `synth-converter/src/graph/graph_builder.rs` and uses `sophia_rs`. Besides `rdf` and `xsd` that have build in namespaces in `sophia_rs`, all namespaces and terms are provided in `synth-converter/src/graph/namespaces` as constants. This makes the code more readable and also ensures that the rdf iris and namespaces are controlled and spelt correctly.
-Graph serializers and parsers are provided in `synth-converter/src/rdf`. The turtle serializer there is needed for the test.
-The conversion is done in the public crate `synth-converter/src/convert.rs`
+The json input is read with `serde_json`; the transformation into rdf is done by the `src/catplus-common` library.
+It uses `sophia_rs`. The mapping is driven by the types in `src/catplus-common/src/models/types.rs` and uses the namespaces defined in `src/catplus-common/src/graph/namespaces`.

 ### Shacl Validation

 The rdf graph conforms to the cat+ ontology: https://github.com/sdsc-ordes/catplus-ontology. Currently Rust offers no Shacl validation library; once such a library exists, it would make sense to add Shacl validation.

-TheShacl Validation can be done manually here: https://www.itb.ec.europa.eu/shacl/any/upload
+Shacl validation can be done manually here: https://www.itb.ec.europa.eu/shacl/any/upload

 ## Installation guidelines

 The repo is set up with nix.

 ```
-git clone git@github.com:sdsc-ordes/catplus-zarr-converters.git
-cd catplus-zarr-converters
+git clone git@github.com:sdsc-ordes/catplus-converters.git
+cd catplus-converters
 cargo build
 ```

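The four positional arguments listed in the README can be modelled with two small enums, which is also the idea behind the PR's "add enum for serialization format" commit (`5f69d07`): an unknown format is rejected when the arguments are parsed rather than later in the serializer. The sketch below is std-only and hypothetical (`InputType`, `Format`, `Args` and `parse_args` are illustrative names, not the crate's real API), and it treats all four arguments as required, as in the README examples.

```rust
use std::path::PathBuf;
use std::str::FromStr;

/// Hypothetical: the two input schemas named in the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputType {
    Synth,
    Hci,
}

/// Hypothetical: the two RDF serializations named in the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Turtle,
    Jsonld,
}

impl FromStr for InputType {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "synth" => Ok(InputType::Synth),
            "hci" => Ok(InputType::Hci),
            other => Err(format!("unknown input_type `{other}` (expected synth or hci)")),
        }
    }
}

impl FromStr for Format {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "turtle" => Ok(Format::Turtle),
            "jsonld" => Ok(Format::Jsonld),
            other => Err(format!("unknown format `{other}` (expected turtle or jsonld)")),
        }
    }
}

/// Hypothetical container for the parsed command line.
#[derive(Debug, PartialEq)]
pub struct Args {
    pub input_type: InputType,
    pub input_file: PathBuf,
    pub output_file: PathBuf,
    pub format: Format,
}

/// Parses exactly four positional arguments (program name already stripped).
pub fn parse_args<I, S>(args: I) -> Result<Args, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let v: Vec<String> = args.into_iter().map(|s| s.as_ref().to_string()).collect();
    if v.len() != 4 {
        return Err(format!("expected 4 arguments, got {}", v.len()));
    }
    Ok(Args {
        input_type: v[0].parse()?, // fails fast on anything but synth/hci
        input_file: PathBuf::from(&v[1]),
        output_file: PathBuf::from(&v[2]),
        format: v[3].parse()?, // fails fast on anything but turtle/jsonld
    })
}
```

For example, `parse_args(["hci", "examples/0-HCI.json", "examples/0-HCI.jsonld", "jsonld"])` yields `InputType::Hci` and `Format::Jsonld`, while a typo such as `json-ld` is reported before any file is read.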
@@ -57,17 +57,18 @@ The rust commands can be started via a justfile:
 ```
 just --list
 Available recipes:
-build *args # Build the synth-converter.
-default # Default recipe to list all recipes.
-nix-develop *args # Enter a Nix development shell.
-run input_file output_file *args # Run the synth-converter.
-test *args # Test the synth-converter.
-fmt *arg # Format the synth-converter.
+build *args # Build all crates
+default # Default recipe to list all recipes.
+format *args # Format all crates
+fmt *args # alias for `format`
+nix-develop *args # Enter a Nix development shell.
+run input_type input_file output_file *args # Run the converter.
+test *args # Test all crates
 ```

 ### Tests

-Run the tests with `just test`: only integration tests have been integrated that ensure that the serialized graph in turtle is isomorphic to an expected turtle serialization per valid substructure of the input data: this substructures are action that occur in the synthesis process.
+Run the tests with `just test`. Currently there are only integration tests; they check that the graph serialized in turtle is isomorphic to an expected turtle serialization of the input data.

 ### Contribute

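The PR does not show the test code itself. As a hedged illustration of why the tests compare graphs for isomorphism rather than comparing Turtle text: for graphs without blank nodes, two RDF graphs are isomorphic exactly when their sets of triples are equal, so the order in which statements are serialized does not matter. The sketch below (hypothetical `ground_isomorphic`, triples as plain term strings) covers only that ground case and refuses graphs containing blank nodes, which need a dedicated isomorphism algorithm.

```rust
use std::collections::BTreeSet;

/// Hypothetical: a triple as three N-Triples-style term strings.
pub type Triple = (String, String, String);

fn is_blank(term: &str) -> bool {
    term.starts_with("_:")
}

fn has_blank_node(graph: &[Triple]) -> bool {
    graph.iter().any(|(s, p, o)| is_blank(s) || is_blank(p) || is_blank(o))
}

/// Some(true/false) for ground graphs; None if either graph has a blank node,
/// because set equality is not a valid isomorphism test in that case.
pub fn ground_isomorphic(a: &[Triple], b: &[Triple]) -> Option<bool> {
    if has_blank_node(a) || has_blank_node(b) {
        return None;
    }
    // RDF graphs are sets: ordering and duplicates are irrelevant.
    let set_a: BTreeSet<&Triple> = a.iter().collect();
    let set_b: BTreeSet<&Triple> = b.iter().collect();
    Some(set_a == set_b)
}
```

If the converter's output contains blank nodes (for instance for nested values), this shortcut no longer applies and a full isomorphism check is required.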
@@ -0,0 +1,98 @@
{
  "hasCampaign": {
    "campaignName": "Caffeine Synthesis",
    "description": "1-step N-methylation of theobromine to caffeine",
    "objective": "High caffeine yield at the end",
    "campaignClass": "Standard Research",
    "type": "optimization",
    "reference": "Substitution reaction - SN2",
    "hasBatch": {
      "batchID": "23",
      "batchName": "20240516",
      "reactionType": "N-methylation",
      "reactionName": "Caffeine synthesis",
      "optimizationType": "Yield optimization",
      "link": "https://www.sciencedirect.com/science/article/pii/S0187893X15720926"
    },
    "hasObjective": {
      "criteria": "Yield ≥ 90%",
      "condition": "Reflux in acetone with methyl iodide and potassium carbonate",
      "description": "Optimize reaction conditions to maximize caffeine yield from theobromine using methyl iodide",
      "objectiveName": "Maximize caffeine formation"
    },
    "hasChemical": [
      {
        "chemicalID": "19",
        "chemicalName": "Sodium methoxide",
        "CASNumber": "124-41-4",
        "molecularMass": {
          "value": 54.024,
          "unit": "g/mol"
        },
        "smiles": "C[O-].[Na+]",
        "swissCatNumber": "SwissCAT-10942334",
        "keywords": "optional only in HCI file",
        "Inchi": "InChI=1S/CH3O.Na/c1-2;/h1H3;/q-1;+1",
        "molecularFormula": "CH3NaO",
        "density": {
          "value": 1.3,
          "unit": "g/mL"
        }
      },
      {
        "chemicalID": "36",
        "chemicalName": "theobromine",
        "CASNumber": "83-67-0",
        "molecularMass": {
          "value": 180.160,
          "unit": "g/mol"
        },
        "smiles": "CN1C=NC2=C1C(=O)NC(=O)N2C",
        "swissCatNumber": "SwissCAT-5429",
        "keywords": "optional only in HCI file",
        "Inchi": "InChI=1S/C7H8N4O2/c1-10-3-8-5-4(10)6(12)9-7(13)11(5)2/h3H,1-2H3,(H,9,12,13)",
        "molecularFormula": "C7H8N4O2",
        "density": {
          "value": 1.522,
          "unit": "g/mL"
        }
      },
      {
        "chemicalID": "25",
        "chemicalName": "methyl iodide",
        "CASNumber": "74-88-4",
        "molecularMass": {
          "value": 141.939,
          "unit": "g/mol"
        },
        "smiles": "CI",
        "swissCatNumber": "SwissCAT-6328",
        "keywords": "optional only in HCI file",
        "Inchi": "InChI=1S/CH3I/c1-2/h1H3",
        "molecularFormula": "CH3I",
        "density": {
          "value": 2.28,
          "unit": "g/mL"
        }
      },
      {
        "chemicalID": "79",
        "chemicalName": "methanol",
        "CASNumber": "67-56-1",
        "molecularMass": {
          "value": 32.042,
          "unit": "g/mol"
        },
        "smiles": "CO",
        "swissCatNumber": "SwissCAT-887",
        "keywords": "optional only in HCI file",
        "Inchi": "InChI=1S/CH4O/c1-2/h2H,1H3",
        "molecularFormula": "CH4O",
        "density": {
          "value": 0.79,
          "unit": "g/mL"
        }
      }
    ]
  }
}
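The example file nests a batch, an objective and a list of chemicals under one campaign. A minimal sketch of how such nesting can be flattened into triples, in the spirit of the PR's `into_graph` for the campaign wrapper, is shown below. It is not the converter's mapping: all IRIs use a hypothetical `http://example.org/` namespace instead of the Cat+ ontology terms, the struct and function names are invented for illustration, and only a few of the JSON fields are mapped.

```rust
/// Hypothetical, heavily reduced mirror of the JSON example above.
pub struct Measurement {
    pub value: f64,
    pub unit: String,
}

pub struct Chemical {
    pub id: String,
    pub name: String,
    pub cas: String,
    pub molecular_mass: Measurement,
}

pub struct Batch {
    pub id: String,
    pub name: String,
}

pub struct Campaign {
    pub name: String,
    pub batch: Batch,
    pub chemicals: Vec<Chemical>,
}

/// A triple as three N-Triples-style term strings.
pub type Triple = (String, String, String);

/// Placeholder namespace for illustration only; NOT the Cat+ ontology.
const EX: &str = "http://example.org/";

fn iri(local: &str) -> String {
    format!("<{EX}{local}>")
}

fn lit(s: &str) -> String {
    // Minimal escaping of backslashes and double quotes.
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Flattens the nested campaign: every nested object becomes its own node,
/// linked from its parent; arrays become one link per element.
pub fn campaign_triples(c: &Campaign) -> Vec<Triple> {
    let mut out = Vec::new();
    // A single fixed node; a real converter would mint unique IRIs or blank nodes.
    let campaign = iri("campaign");
    out.push((campaign.clone(), iri("campaignName"), lit(&c.name)));

    let batch = iri(&format!("batch/{}", c.batch.id));
    out.push((campaign.clone(), iri("hasBatch"), batch.clone()));
    out.push((batch, iri("batchName"), lit(&c.batch.name)));

    for chem in &c.chemicals {
        let node = iri(&format!("chemical/{}", chem.id));
        out.push((campaign.clone(), iri("hasChemical"), node.clone()));
        out.push((node.clone(), iri("chemicalName"), lit(&chem.name)));
        out.push((node.clone(), iri("casNumber"), lit(&chem.cas)));
        // The {value, unit} pair is kept as two plain literals in this sketch.
        let mass = &chem.molecular_mass;
        out.push((node.clone(), iri("molecularMassValue"), lit(&mass.value.to_string())));
        out.push((node, iri("molecularMassUnit"), lit(&mass.unit)));
    }
    out
}
```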
@@ -0,0 +1,10 @@
use lazy_static::lazy_static;
use sophia::api::ns::Namespace;
use sophia_api::namespace;
namespace! {
    "http://purl.allotrope.org/ontologies/common#",
    AFC_0000090
}
lazy_static! {
    pub static ref ns: Namespace<&'static str> = Namespace::new(PREFIX.as_str()).unwrap();
}
@@ -0,0 +1,10 @@
use lazy_static::lazy_static;
use sophia::api::ns::Namespace;
use sophia_api::namespace;
namespace! {
    "http://purl.allotrope.org/ontologies/hdf5/1.8#",
    HardLink
}
lazy_static! {
    pub static ref ns: Namespace<&'static str> = Namespace::new(PREFIX.as_str()).unwrap();
}
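Both namespace files follow the same pattern: sophia's `namespace!` macro declares the prefix (`PREFIX`) and a fixed set of terms (`AFC_0000090`, `HardLink`) at compile time, and a lazily built `Namespace` is kept for any other term. The std-only stand-in below only illustrates that idea of "prefix plus validated local name"; it is not sophia's implementation, the type `Ns` is hypothetical, and its checks (prefix ending in `#` or `/`, a handful of forbidden characters) are a simplified assumption rather than the full IRI grammar.

```rust
/// Hypothetical stand-in for a namespace: an IRI prefix plus local names.
#[derive(Debug, Clone, Copy)]
pub struct Ns {
    prefix: &'static str,
}

impl Ns {
    /// Accepts only prefixes ending in '#' or '/', the convention both files follow.
    pub fn new(prefix: &'static str) -> Result<Self, String> {
        if prefix.ends_with('#') || prefix.ends_with('/') {
            Ok(Ns { prefix })
        } else {
            Err(format!("prefix `{prefix}` should end with '#' or '/'"))
        }
    }

    /// Builds the full IRI for a local name, rejecting whitespace and a few
    /// characters that are never allowed in an IRI (a small subset of the rules).
    pub fn get(&self, local: &str) -> Result<String, String> {
        if local.chars().any(|c| c.is_whitespace() || "<>\"{}|^`\\".contains(c)) {
            return Err(format!("invalid local name `{local}`"));
        }
        Ok(format!("{}{}", self.prefix, local))
    }
}
```

Terms declared through the macro are fixed when the crate compiles, so a misspelled constant is a compile error; only ad-hoc lookups like `get` need a runtime check.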
@@ -1,3 +1,5 @@
+pub mod allocom;
+pub mod allohdf;
 pub mod alloproc;
 pub mod alloqual;
 pub mod allores;
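The PR also reworks the prefix map "by using macros" (commit `434f269`), but that code is not part of this view. A hypothetical `prefix_map!` macro in the same spirit is sketched below: it turns `prefix => namespace` pairs into a list, and `compact` uses that list to shorten a full IRI to `prefix:local`, picking the longest matching namespace. The prefix names `allocom` and `allohdf` are borrowed from the module names above and are an assumption; the prefixes the real serializer emits may differ.

```rust
/// Hypothetical macro: builds a list of (prefix, namespace IRI) pairs.
macro_rules! prefix_map {
    ($($prefix:literal => $iri:literal),* $(,)?) => {
        vec![$(($prefix, $iri)),*]
    };
}

/// Shortens a full IRI to `prefix:local` using the longest matching namespace,
/// or returns None if no namespace in the map matches.
pub fn compact(map: &[(&str, &str)], iri: &str) -> Option<String> {
    map.iter()
        .filter(|(_, ns)| iri.starts_with(*ns))
        .max_by_key(|(_, ns)| ns.len())
        .map(|(p, ns)| format!("{}:{}", p, &iri[ns.len()..]))
}
```

Adding a namespace then means adding one `prefix => iri` line, which keeps the map in a single, alphabetically sortable place.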
suggestion: the command is already executed from the root dir, so I don't think we need to prefix individual arguments. Removing those prefixes would also allow specifying external files via absolute paths.
@cmdoret I tried this, but it did not work for me.
The current command is like this:
It works from anywhere in the project's directory tree: it first changes into
{{root_dir}}/src/converter
and relative paths are then resolved from there unless you add {{root_dir}} again. So I'll leave it as it is for now.
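The disagreement comes down to where relative paths are resolved once the recipe has changed into `{{root_dir}}/src/converter`. One hedged alternative, not what the PR merged, is to resolve paths inside the program: join relative arguments onto a base directory (such as the repository root) and leave absolute ones untouched. Rust's `Path::join` already behaves this way, because joining an absolute path replaces the base. The functions `resolve_arg` and `prefix_naively` are hypothetical, and the second one assumes the recipe prefixes each argument as `{{root_dir}}/<arg>`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: resolves a CLI path argument against `base` (e.g. the repo root).
/// Relative paths are joined onto `base`; absolute paths are returned unchanged,
/// because `Path::join` replaces the base when given an absolute path.
pub fn resolve_arg(base: &Path, arg: &str) -> PathBuf {
    base.join(arg)
}

/// What plain string prefixing produces: an absolute argument ends up
/// re-rooted under `base` (POSIX treats `//` like `/`), which is the issue
/// raised in the suggestion above.
pub fn prefix_naively(base: &str, arg: &str) -> String {
    format!("{base}/{arg}")
}
```

With `resolve_arg`, `examples/0-HCI.json` still resolves under the repository root, while an external `/data/run.json` keeps pointing at `/data/run.json`.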