feat: add hci converter #23

Merged 28 commits on Feb 19, 2025
Commits
4d968ba
refactor: rename synth converter command in just file
sabinem Feb 13, 2025
a5c9511
feat: add hci-converter to package
sabinem Feb 13, 2025
834a7cd
feat: add hci-converter on just file
sabinem Feb 13, 2025
8aa31e5
feat: add Cargo.toml for hci converter
sabinem Feb 13, 2025
2087e65
feat: add campaign and campaign wrapper to catplus common types
sabinem Feb 13, 2025
2dc8c11
feat: add code files to hci converter
sabinem Feb 13, 2025
3a5650c
test: add test to hci wrapper
sabinem Feb 13, 2025
edb398f
feat: update namespaces for hci-converter
sabinem Feb 14, 2025
15116ff
feat: update types
sabinem Feb 14, 2025
69f83d7
test: add hci-converter tests
sabinem Feb 14, 2025
6950c30
fix: avoid infinite loop when adding actions
sabinem Feb 14, 2025
ed78787
fix: update .gitignore to allow adding json examples
sabinem Feb 14, 2025
5f39e80
format: cargo fmt
cmdoret Feb 14, 2025
3d05018
chore: sort namespace alphabetically
sabinem Feb 17, 2025
da96d18
feat: implement into_graph for campaign_wrapper
sabinem Feb 17, 2025
434f269
refactor: improve prefix map by using macros
sabinem Feb 18, 2025
7136c91
refactor: unify hci and synth parser to a shared parser
sabinem Feb 18, 2025
7807bdc
tests: adapt tests for unified synth and hci converter
sabinem Feb 18, 2025
6d9c260
refactor: change name from synth-converter to converter
sabinem Feb 18, 2025
1665e22
refactor: adapt cargo.toml to name change of converters
sabinem Feb 18, 2025
1394fca
refactor: delete unused hci-converter
sabinem Feb 18, 2025
0c42273
refactor: name change from synth-converter to converter
sabinem Feb 18, 2025
23a1ba9
style: applying just fmt
sabinem Feb 18, 2025
1fbe4a8
chore: change comment in justfile
sabinem Feb 18, 2025
5f69d07
fix: add enum for serialization format
sabinem Feb 19, 2025
91f5193
tests: fix tests after change of format to enum
sabinem Feb 19, 2025
336c823
chore: update README
sabinem Feb 19, 2025
0a3c81d
style: apply cargo fmt
sabinem Feb 19, 2025
1 change: 0 additions & 1 deletion .gitignore
@@ -11,4 +11,3 @@ debug
#Folders
data/**
*.ttl
*.json
36 changes: 18 additions & 18 deletions Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -3,7 +3,7 @@ resolver = "2"

members = [
"src/catplus-common",
"src/synth-converter",
"src/converter",
]


51 changes: 26 additions & 25 deletions README.md
@@ -2,51 +2,51 @@

## About

This repository contains all the Zarr converters for the different data types in the Cat+ project (Agilent, UV, IR, etc.)
The data types are all in different formats, their data and metadata colluded together. The goal will be to convert the metadata to [an established ontology](https://github.com/sdsc-ordes/catplus-ontology/tree/main), and -as much as data format allow- convert the data in [Zarr array](https://zarr.readthedocs.io/en/stable/index.html).
This repository contains all the converters for the different data types in the Cat+ project (Agilent, UV, IR, etc.)
The data types are all in different formats, their data and metadata colluded together. The goal will be to convert the metadata to [an established ontology](https://github.com/sdsc-ordes/catplus-ontology/tree/main), and provide the data in their original files.

## Tools

### synth-converter
The Synth-converter parses a json input into an rdf graph and serializes the graph to either turtle or jsonld.
It expects the input to conform to the cat+ ontology and the struct `synth-converter/src/batch.rs`. An example input file is provided in `example/1-Synth.json`.
### converter
The converter parses a json input into an rdf graph and serializes the graph to either turtle or jsonld.
It expects the input to conform to the cat+ ontology and to the structs defined in `src/catplus-common/src/models/types.rs`. Example input files are provided in the `examples` directory.

#### Usage

The `synth-converter` has three parameters:
The `converter` has four arguments:

- input_type: currently `synth` (see `examples/1-Synth.json`) or `hci` (see `examples/0-HCI.json`)
- inputfile: path to input file (relative to top level of the repo or absolute)
- outputfile: path to output file (relative to top level of the repo or absolute)
- format: default is "ttl", the other option is jsonld
- format: rdf output format, currently `turtle` or `jsonld`

The `synth-converter` turns the inputfile into a rdf graph and serilizes it to either turtle or jsonld. The serialization is written to an outputfile.
The `converter` turns the inputfile into a rdf graph and serializes it to either turtle or jsonld. The serialization is written to the provided outputfile.

Examples

```
just run example/1-Synth.json output.ttl
just run example/1-Synth.json output.json --format jsonld
just run synth examples/1-Synth.json examples/1-Synth.ttl turtle
just run hci examples/0-HCI.json examples/0-HCI.ttl jsonld
```
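
The four positional arguments described above could be validated along the following lines. This is only a sketch using the standard library: the names `InputType`, `RdfFormat`, `ConverterArgs` and `parse_args` are hypothetical and are not taken from the converter's source, and the real binary may parse its arguments differently.

```rust
use std::str::FromStr;

/// Kind of input document (hypothetical name, mirrors the `input_type` argument).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputType {
    Synth,
    Hci,
}

/// RDF output format (hypothetical name, mirrors the `format` argument).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RdfFormat {
    Turtle,
    Jsonld,
}

impl FromStr for InputType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "synth" => Ok(InputType::Synth),
            "hci" => Ok(InputType::Hci),
            other => Err(format!("unknown input type `{other}`")),
        }
    }
}

impl FromStr for RdfFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "turtle" => Ok(RdfFormat::Turtle),
            "jsonld" => Ok(RdfFormat::Jsonld),
            other => Err(format!("unknown format `{other}`")),
        }
    }
}

/// The four arguments listed in the README, in order (assumed layout).
#[derive(Debug, PartialEq, Eq)]
pub struct ConverterArgs {
    pub input_type: InputType,
    pub input_file: String,
    pub output_file: String,
    pub format: RdfFormat,
}

/// Parse exactly four positional arguments; anything else is an error.
pub fn parse_args<I, S>(args: I) -> Result<ConverterArgs, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let args: Vec<String> = args.into_iter().map(|s| s.as_ref().to_owned()).collect();
    match args.as_slice() {
        [input_type, input_file, output_file, format] => Ok(ConverterArgs {
            input_type: input_type.parse()?,
            input_file: input_file.clone(),
            output_file: output_file.clone(),
            format: format.parse()?,
        }),
        _ => Err(format!("expected 4 arguments, got {}", args.len())),
    }
}
```

Modelling the format as an enum rather than a free string means an unsupported value is rejected when the arguments are parsed, before any file is read.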

### Architecture

The json input is read with `serde_json`: the transformation of fields is described in the struct `synth-converter/src/batch.rs`

The graph is build via `synth-converter/src/graph/graph_builder.rs` and uses `sophia_rs`. Besides `rdf` and `xsd` that have build in namespaces in `sophia_rs`, all namespaces and terms are provided in `synth-converter/src/graph/namespaces` as constants. This makes the code more readable and also ensures that the rdf iris and namespaces are controlled and spelt correctly.
Graph serializers and parsers are provided in `synth-converter/src/rdf`. The turtle serializer there is needed for the test.
The conversion is done in the public crate `synth-converter/src/convert.rs`
The json input is read with `serde_json`: the transformation into rdf is done by the `src/catplus-common` library.
It uses `sophia_rs`. The mapping is triggered by `src/catplus-common/src/models/types.rs` and makes use of the namespaces defined at `src/catplus-common/src/graph/namespaces`.

### Shacl Validation

The rdf graph conforms to the cat+ ontology: https://github.com/sdsc-ordes/catplus-ontology. Currently Rust offers no Shacl Validation library, but once such a library exists, it would make sense to add Shacl Validation.

TheShacl Validation can be done manually here: https://www.itb.ec.europa.eu/shacl/any/upload
The Shacl Validation can be done manually here: https://www.itb.ec.europa.eu/shacl/any/upload

## Installation guidelines

The repo is setup with nix.

```
git clone [email protected]:sdsc-ordes/catplus-zarr-converters.git
cd catplus-zarr-converters
git clone [email protected]:sdsc-ordes/catplus-converters.git
cd catplus-converters
cargo build
```

@@ -57,17 +57,18 @@ The rust commands can be started via a justfile:
```
just --list
Available recipes:
build *args # Build the synth-converter.
default # Default recipe to list all recipes.
nix-develop *args # Enter a Nix development shell.
run input_file output_file *args # Run the synth-converter.
test *args # Test the synth-converter.
fmt *arg # Format the synth-converter.
build *args # Build all crates
default # Default recipe to list all recipes.
format *args # Format all crates
fmt *args # alias for `format`
nix-develop *args # Enter a Nix development shell.
run input_type input_file output_file *args # Run the converter.
test *args # Test all crates
```

### Tests

Run the tests with `just test`: only integration tests have been integrated that ensure that the serialized graph in turtle is isomorphic to an expected turtle serialization per valid substructure of the input data: this substructures are action that occur in the synthesis process.
Run the tests with `just test`. Only integration tests exist so far: they ensure that the graph serialized to turtle is isomorphic to an expected turtle serialization of the input data.

### Contribute

98 changes: 98 additions & 0 deletions examples/0-HCI.json
@@ -0,0 +1,98 @@
{
"hasCampaign": {
"campaignName": "Caffeine Synthesis",
"description": "1-step N-methylation of theobromine to caffeine",
"objective": "High caffeine yield at the end",
"campaignClass": "Standard Research",
"type": "optimization",
"reference": "Substitution reaction - SN2",
"hasBatch": {
"batchID": "23",
"batchName": "20240516",
"reactionType": "N-methylation",
"reactionName": "Caffeine synthesis",
"optimizationType": "Yield optimization",
"link": "https://www.sciencedirect.com/science/article/pii/S0187893X15720926"
},
"hasObjective": {
"criteria": "Yield ≥ 90%",
"condition": "Reflux in acetone with methyl iodide and potassium carbonate",
"description": "Optimize reaction conditions to maximize caffeine yield from theobromine using methyl iodide",
"objectiveName": "Maximize caffeine formation"
},
"hasChemical": [
{
"chemicalID": "19",
"chemicalName": "Sodium methoxide",
"CASNumber": "124-41-4",
"molecularMass": {
"value": 54.024,
"unit": "g/mol"
},
"smiles": "C[O-].[Na+]",
"swissCatNumber": "SwissCAT-10942334",
"keywords": "optional only in HCI file",
"Inchi": "InChI=1S/CH3O.Na/c1-2;/h1H3;/q-1;+1",
"molecularFormula": "CH3NaO",
"density": {
"value": 1.3,
"unit": "g/mL"
}
},
{
"chemicalID": "36",
"chemicalName": "theobromine",
"CASNumber": "83-67-0",
"molecularMass": {
"value": 180.160,
"unit": "g/mol"
},
"smiles": "CN1C=NC2=C1C(=O)NC(=O)N2C",
"swissCatNumber": "SwissCAT-5429",
"keywords": "optional only in HCI file",
"Inchi": "InChI=1S/C7H8N4O2/c1-10-3-8-5-4(10)6(12)9-7(13)11(5)2/h3H,1-2H3,(H,9,12,13)",
"molecularFormula": "C7H8N4O2",
"density": {
"value": 1.522,
"unit": "g/mL"
}
},
{
"chemicalID": "25",
"chemicalName": "methyl iodide",
"CASNumber": "74-88-4",
"molecularMass": {
"value": 141.939,
"unit": "g/mol"
},
"smiles": "CI",
"swissCatNumber": "SwissCAT-6328",
"keywords": "optional only in HCI file",
"Inchi": "InChI=1S/CH3I/c1-2/h1H3",
"molecularFormula": "CH3I",
"density": {
"value": 2.28,
"unit": "g/mL"
}
},
{
"chemicalID": "79",
"chemicalName": "methanol",
"CASNumber": "67-56-1",
"molecularMass": {
"value": 32.042,
"unit": "g/mol"
},
"smiles": "CO",
"swissCatNumber": "SwissCAT-887",
"keywords": "optional only in HCI file",
"Inchi": "InChI=1S/CH4O/c1-2/h2H,1H3",
"molecularFormula": "CH4O",
"density": {
"value": 0.79,
"unit": "g/mL"
}
}
]
}
}
9 changes: 5 additions & 4 deletions justfile
@@ -1,3 +1,4 @@
#!/usr/bin/env bash
set positional-arguments
set shell := ["bash", "-cue"]

@@ -20,10 +21,10 @@ alias fmt := format
format *args:
cargo fmt {{args}}

# Run the synth-converter.
run input_file output_file *args:
cd "{{root_dir}}/src/synth-converter" && \
cargo run --bin synth-converter "{{root_dir}}/{{input_file}}" "{{root_dir}}/{{output_file}}" {{args}}
# Run the converter.
run input_type input_file output_file *args:
cd "{{root_dir}}/src/converter" && \
cargo run --bin converter "{{input_type}}" "{{root_dir}}/{{input_file}}" "{{root_dir}}/{{output_file}}" {{args}}
Review comment (Member):

Suggested change
cargo run --bin converter "{{input_type}}" "{{root_dir}}/{{input_file}}" "{{root_dir}}/{{output_file}}" {{args}}
cargo run --bin converter "{{input_type}}" "{{input_file}}" "{{output_file}}" {{args}}

suggestion: the command is already executed from the root dir, I don't think we need to prefix individual arguments. Removing those prefixes also allows to specify external files via absolute path.

Reply (Member, Author):

@cmdoret I tried this but it did not work for me.

The current command is like this:

run input_type input_file output_file *args:
    cd "{{root_dir}}/src/converter" && \
    cargo run --bin converter "{{input_type}}" "{{root_dir}}/{{input_file}}" "{{root_dir}}/{{output_file}}" {{args}}

It works from anywhere in the project's directory tree. The recipe first changes into {{root_dir}}/src/converter, so relative paths are resolved from there unless {{root_dir}} is prepended again. I am therefore leaving it as it is for now.
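
For reference, one way to support both relative and absolute paths would be to do the resolution inside the binary instead of in the justfile. The sketch below is an assumption, not code from this PR: `resolve_path` and `base_dir` are hypothetical names, and the base directory would have to be passed in (for example the repository root). It relies on the documented behaviour of `Path::join`, which returns the argument unchanged when it is an absolute path.

```rust
use std::path::{Path, PathBuf};

/// Resolve a user-supplied path against a base directory (hypothetical helper).
/// Relative paths are appended to `base_dir`; absolute paths replace it,
/// because `Path::join` discards the base when its argument is absolute.
pub fn resolve_path(base_dir: &Path, user_path: &str) -> PathBuf {
    base_dir.join(user_path)
}
```

With such a helper the recipe could pass `{{input_file}}` and `{{output_file}}` through unprefixed, at the cost of the binary needing to know the base directory.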


# Enter a Nix development shell.
nix-develop *args:
10 changes: 10 additions & 0 deletions src/catplus-common/src/graph/namespaces/allocom.rs
@@ -0,0 +1,10 @@
use lazy_static::lazy_static;
use sophia::api::ns::Namespace;
use sophia_api::namespace;
namespace! {
"http://purl.allotrope.org/ontologies/common#",
AFC_0000090
}
lazy_static! {
pub static ref ns: Namespace<&'static str> = Namespace::new(PREFIX.as_str()).unwrap();
}
10 changes: 10 additions & 0 deletions src/catplus-common/src/graph/namespaces/allohdf.rs
@@ -0,0 +1,10 @@
use lazy_static::lazy_static;
use sophia::api::ns::Namespace;
use sophia_api::namespace;
namespace! {
"http://purl.allotrope.org/ontologies/hdf5/1.8#",
HardLink
}
lazy_static! {
pub static ref ns: Namespace<&'static str> = Namespace::new(PREFIX.as_str()).unwrap();
}
17 changes: 9 additions & 8 deletions src/catplus-common/src/graph/namespaces/allores.rs
@@ -3,18 +3,19 @@ use sophia::api::ns::Namespace;
use sophia_api::namespace;
namespace! {
"http://purl.allotrope.org/ontologies/result#",
AFR_0001606,
AFR_0001723,
AFR_0001952,
AFR_0002036,
AFR_0002240,
AFR_0002296,
AFR_0002295,
AFR_0002294,
AFR_0002295,
AFR_0002296,
AFR_0002423,
AFR_0002464,
AFR_0002764,
AFRE_0000001,
AFX_0000622,
AFR_0002423,
AFR_0001606,
AFR_0001723,
AFR_0001952,
AFR_0002036
AFX_0000622
}
lazy_static! {
pub static ref ns: Namespace<&'static str> = Namespace::new(PREFIX.as_str()).unwrap();
31 changes: 19 additions & 12 deletions src/catplus-common/src/graph/namespaces/cat.rs
@@ -6,34 +6,41 @@ namespace! {
AddAction,
Batch,
Campaign,
ContainerPositionAndQuantity,
Experiment,
FiltrateAction,
Observation,
Sample,
SetPressureAction,
SetTemperatureAction,
SetVacuumAction,
ShakeAction,
speedTumbleStirrerShape,
campaignClass,
campaignType,
casNumber,
chemicalName,
containerBarcode,
containerID,
ContainerPositionAndQuantity,
criteria,
dispenseType,
errorMargin,
expectedDatum,
Experiment,
FiltrateAction,
genericObjective,
hasBatch,
hasCampaign,
hasChemical,
hasContainerPositionAndQuantity,
hasObjective,
hasSample,
hasChemical,
internalBarCode,
measuredQuantity,
Objective,
Observation,
optimizationType,
reactionSubType,
reactionType,
role,
setTemperatureAction,
Sample,
SetPressureAction,
SetTemperatureAction,
SetVacuumAction,
ShakeAction,
speedInRPM,
speedTumbleStirrerShape,
subEquipmentName,
swissCatNumber,
temperatureShakerShape,
2 changes: 2 additions & 0 deletions src/catplus-common/src/graph/namespaces/mod.rs
@@ -1,3 +1,5 @@
pub mod allocom;
pub mod allohdf;
pub mod alloproc;
pub mod alloqual;
pub mod allores;
1 change: 1 addition & 0 deletions src/catplus-common/src/graph/namespaces/obo.rs
@@ -4,6 +4,7 @@ use sophia_api::namespace;
namespace! {
"http://purl.obolibrary.org/obo/",
CHEBI_25367,
IAO_0000005,
PATO_0001019
}
lazy_static! {