Merge pull request #331 from tilfischer/data_publishing
Data publishing
jliermann authored Mar 14, 2024
2 parents 6f28d8c + 663672f commit 1535ef1
Showing 25 changed files with 123 additions and 66 deletions.
6 changes: 3 additions & 3 deletions docs/00_intro/00_intro.mdx
@@ -60,8 +60,8 @@ To enable fully digital workflows in chemistry, the development and provision of

<IntroButton url={"/docs/smartlab"} imgUrl={"/img/nfdi4chem_SmartLab.svg"} text={"Smartlab"} />

### Data Publication
### Data Publishing

In the category of data publication you will find all the important information on the topic of data publication. This includes the motivation to publish research data, paths to publish data, recommendations for research data repositories to be used, best practices and aspects of machine actionability.
In this category on data publishing you will find all the important information on the topic of data publishing. This includes the motivation to publish research data, paths to publish data, recommendations for research data repositories to be used, best practices and aspects of machine actionability.

<IntroButton url={"/docs/data_publication"} imgUrl={"/img/nfdi4chem_Data_Publication.svg"} text={"Data Publication"} />
<IntroButton url={"/docs/data_publishing"} imgUrl={"/img/nfdi4chem_Data_Publication.svg"} text={"Data Publishing"} />
2 changes: 1 addition & 1 deletion docs/00_intro/10_fair.mdx
@@ -132,7 +132,7 @@ In simple terms: metadata include any relevant history. If the dataset is relate

#### R1.3. (meta)data meet domain-relevant community standards

As research data management and, as such, [data publication](/docs/data_publication) becomes more and more prevalent across research areas, [best practices](/docs/best_practice) in the individual communities will arise. This should encompass metadata templates for proper documentation of datasets, how the data should be [organized](/docs/data_organisation), which vocabularies or [ontologies](/docs/ontology) to use, and [file formats](/docs/format_standards). NFDI4Chem is working to establish [metadata and data standards](https://www.nfdi4chem.de/index.php/task-areas/) for the various communities in chemistry.
As research data management and, as such, [data publishing](/docs/data_publishing) becomes more and more prevalent across research areas, [best practices](/docs/best_practice) in the individual communities will arise. This should encompass metadata templates for proper documentation of datasets, how the data should be [organized](/docs/data_organisation), which vocabularies or [ontologies](/docs/ontology) to use, and [file formats](/docs/format_standards). NFDI4Chem is working to establish [metadata and data standards](https://www.nfdi4chem.de/index.php/task-areas/) for the various communities in chemistry.

Where available, community standards and best practices should be followed when those publishing prepare their datasets and relevant metadata for publication. [Repositories](/docs/repositories), especially domain-specific service providers, should adhere to the standards set forth by the community by requiring files and metadata to follow format specifications.
As noted in [I1](#i1-metadata-use-a-formal-accessible-shared-and-broadly-applicable-language-for-knowledge-representation) above, the CIF format represents a community-specific standard associated with the chemical community. Furthermore, [NMReDATA](https://doi.org/10.1002/mrc.4737) represents a possible [standard](/docs/format_standards) for publishing and archiving (meta)data of Nuclear Magnetic Resonance (NMR) experiments.
4 changes: 2 additions & 2 deletions docs/00_intro/20_data_life_cycle.mdx
@@ -43,8 +43,8 @@ Before sharing data, you should check whether the data is subject to **copyright
During this exchange and the associated reflections on the data, you should think about archiving and using the data in scientific publications. If you are not aware of any **criteria for archiving** and no criteria are specified in your working group or institute, decision-making guides such as the [“5 steps to decide what data to keep”](https://www.dcc.ac.uk/guidance/how-guides/five-steps-decide-what-data-keep) outlined by the DCC can help. Based on the established criteria, it is determined which of the collected raw data should be archived and which should be deliberately deleted.

In addition to the criteria, the migration of the data into **suitable [formats](/docs/format_standards) and onto suitable media** is important for archiving the data. In this step, the data should again be enriched with metadata so that it can be understood in the future without further knowledge about the data.
In addition to archiving, the [publication](/docs/data_publication) of the data plays a special role. Many research funders expect the data to be published if there are no special reasons not to do so, such as a non-disclosure agreement or the inclusion of personal data. A **chemistry-specific or chemistry-related [repository](/docs/repositories)** such as the [Chemotion Repository](https://www.chemotion-repository.net/), [NOMAD](https://nomad-lab.eu/services/repo-arch), or [MassBank](https://massbank.eu/MassBank/) is recommended for the publication of data. An overview of repositories can be found, for example, at [re3data.org](https://www.re3data.org/) or [fairsharing.org](https://fairsharing.org/). re3data.org allows you to filter repositories according to certain criteria such as the assignment of a persistent identifier or access.
Data publication often takes place at certain milestones, for example, in combination with a text publication or at the end of a project. The **final version of the data management plan** is also required at the end of a project.
In addition to archiving, the [publication](/docs/data_publishing) of the data plays a special role. Many research funders expect the data to be published if there are no special reasons not to do so, such as a non-disclosure agreement or the inclusion of personal data. A **chemistry-specific or chemistry-related [repository](/docs/repositories)** such as the [Chemotion Repository](https://www.chemotion-repository.net/), [NOMAD](https://nomad-lab.eu/services/repo-arch), or [MassBank](https://massbank.eu/MassBank/) is recommended for the publication of data. An overview of repositories can be found, for example, at [re3data.org](https://www.re3data.org/) or [fairsharing.org](https://fairsharing.org/). re3data.org allows you to filter repositories according to certain criteria such as the assignment of a persistent identifier or access.
Data publishing often takes place at certain milestones, for example, in combination with a text publication or at the end of a project. The **final version of the data management plan** is also required at the end of a project.

## Phase 6: Re-use

8 changes: 4 additions & 4 deletions docs/10_domains/10_analytical_chemistry.mdx
@@ -79,9 +79,9 @@ A typical workflow begins with the conceptualisation of the research question, a
- A detailed view, evaluation and interpretation of results is carried out with the Chemotion ELN features.


## Publication of research data
## Publishing research data

- In addition to a research article in a scientific journal, the underlying research data are [published](/docs/data_publication) in a [repository](/docs/repositories) and linked to the article to realise research data management according to the [FAIR data principles](/docs/fair) ([Best practice examples](/docs/best_practice)).
- Data publication in a repository includes raw and processed data for reuse.
- In addition to a research article in a scientific journal, the underlying research data are [published](/docs/data_publishing) in a [repository](/docs/repositories) and linked to the article to realise research data management according to the [FAIR data principles](/docs/fair) ([Best practice examples](/docs/best_practice)).
- Data publications in repositories include raw and processed data for reuse.
- The use of the [Chemotion ELN](https://www.chemotion.net/chemotionsaurus/index.html) enables a direct transfer of research data and the respective metadata to the [Chemotion Repository](https://www.chemotion-repository.net/welcome). Subsequently, these data are automatically shared with other repositories, e.g. [PubChem](https://pubchem.ncbi.nlm.nih.gov/). For the publication of research data in other discipline-specific repositories, such as the [MassBank](https://massbank.eu/MassBank/) for reference mass spectra, data have to be exported from the Chemotion ELN and submitted to the respective database.
- A [persistent identifier](/docs/pid) (e.g., DOI) is generated for a dataset by a repository (e.g., [DataCite](https://datacite.org/) for the Chemotion Repository), which is given in the journal publication or corresponding supporting information to link the data publication with the manuscript.
- A [persistent identifier](/docs/pid) (e.g., DOI) is generated for a dataset by a repository (e.g., [DataCite](https://datacite.org/) for the Chemotion Repository), which is given in the journal article or corresponding supporting information to link the data publication with the manuscript.
6 changes: 3 additions & 3 deletions docs/10_domains/20_physical_chemistry.mdx
@@ -39,7 +39,7 @@ Physical chemistry is an interdisciplinary science at the frontier between chemi
- obtained unprocessed raw files from measurements are uploaded to [ELN](/docs/eln) in open file formats and attached directly to the respective [ELN](/docs/eln) experiment entry, including metadata with data on the instrument (e.g. manufacturer, type, etc.), measurement conditions & parameters
- [metadata](/docs/metadata) related to the obtained data, such as temperature or solvent of measurement, follow common [metadata standards](/docs/metadata)
- research data are processed, analysed and compared with open non-proprietary software tools
- simultaneously with [publication](/docs/data_publication) as a research article in a scientific journal, the underlying research data is published in an open data [repository](/docs/repositories) and linked to the article (incl. semantically richly annotated raw and processed data in open data formats for reuse)
- simultaneously with [publication](/docs/data_publishing) as a research article in a scientific journal, the underlying research data is published in an open data [repository](/docs/repositories) and linked to the article (incl. semantically richly annotated raw and processed data in open data formats for reuse)
- a unique [persistent identifier](/docs/pid) (e.g. DOI) is generated for each dataset as well as for the journal publication

## Quantum Mechanical (QM) calculations
@@ -62,7 +62,7 @@ Physical chemistry is an interdisciplinary science at the frontier between chemi
- reproducibility of calculations to within numerical accuracy can be ensured by storing the input files and adding the program and its version (ideally even the compiler version and any compiler flags) as metadata. Numerical thresholds are well defined but reproducibility of calculations across different programs and versions is not guaranteed. This warrants the safekeeping of version specific source files for the same time period as the stored data
- data analysis scripts should be uploaded to the repository in open file formats, attached directly to the corresponding data entry and accompanied with appropriate documentation
- if possible, analysis and evaluation of calculations should be conducted with open, non-proprietary software tools
- simultaneously with [publication](/docs/data_publication) as a research article in a scientific journal, the data in the [repository](/docs/repositories) is linked to the article (incl. semantically richly annotated raw and processed data, if possible in open data formats for reuse)
- simultaneously with [publication](/docs/data_publishing) as a research article in a scientific journal, the data in the [repository](/docs/repositories) is linked to the article (incl. semantically richly annotated raw and processed data, if possible in open data formats for reuse)
- a unique [persistent identifier](/docs/pid) (e.g. DOI) is generated for the dataset as well as for the journal publication
- XML and CML (Chemical Markup Language) are used by a few software packages, but this is not common practice

@@ -89,7 +89,7 @@ Physical chemistry is an interdisciplinary science at the frontier between chemi
- [documentation of all research data](/docs/data_documentation) and [metadata](/docs/metadata) is carried out digitally using a suitable repository to store the data
- reproducibility of calculations can be ensured by storing the input file and adding the program and its version (ideally including the compiler and any compiler flags) as metadata
- if possible, analysis and evaluation of calculations should be conducted with open non-proprietary software tools
- simultaneously with [publication](/docs/data_publication) as a research article in a scientific journal, the data in the [repository](/docs/repositories) is linked to the article (incl. semantically richly annotated raw and processed data, if possible in open data formats for reuse)
- simultaneously with [publication](/docs/data_publishing) as a research article in a scientific journal, the data in the [repository](/docs/repositories) is linked to the article (incl. semantically richly annotated raw and processed data, if possible in open data formats for reuse)
- a unique [persistent identifier](/docs/pid) (e.g. DOI) is generated for each dataset as well as for the journal publication

### Challenges to make data FAIR
6 changes: 3 additions & 3 deletions docs/10_domains/40_synthetic_chemistry.mdx
@@ -60,9 +60,9 @@ The main goal of a synthetic organic or inorganic chemist is to synthesise desir
- Optionally, preprocessing of digital data with software of analytical device before data are transferred to the Chemotion ELN (cf. data producing methods).
- A detailed view, evaluation and interpretation of results is carried out with the Chemotion ELN features.

## Publication of research data
## Publishing research data

- In addition to a research article in a scientific journal, the underlying research data are published in a repository and linked to the article to realise research data management according to the [FAIR data principles](/docs/fair) ([Best practice examples](/docs/best_practice)).
- Data publication in a repository includes raw and processed data for reuse.
- Data publications in repositories include raw and processed data for reuse.
- The use of the Chemotion ELN enables a direct transfer of research data and the respective metadata into the Chemotion Repository. Subsequently, these data are automatically shared with other repositories, e.g. [PubChem](https://pubchem.ncbi.nlm.nih.gov/). For the publication of research data in other discipline-specific repositories, such as the [CCDC](https://www.ccdc.cam.ac.uk/) for crystallographic data, data have to be exported from the Chemotion ELN and uploaded into the respective database.
- A [persistent identifier](/docs/pid) (e.g. DOI) is generated for a dataset by a repository (via [DataCite](https://datacite.org/) for the Chemotion Repository), which is given in the journal publication or corresponding supporting information to link the data publication with the manuscript.
- A [persistent identifier](/docs/pid) (e.g. DOI) is generated for a dataset by a repository (via [DataCite](https://datacite.org/) for the Chemotion Repository), which is given in the journal article or corresponding supporting information to link the data publication with the manuscript.
4 changes: 2 additions & 2 deletions docs/20_role/10_research_group_leader.mdx
@@ -13,15 +13,15 @@ This article applies to research group leaders, who plan to organise the RDM of

<img id="datalife" alt="Data LifeCycle" src={useBaseUrl('/img/Intro/DataLifeCycle_KB.svg')} width="500" align="right" vspace="10" hspace="10" />

As research group leader, you are responsible for the [research data organisation](/docs/data_organisation) of your group. Many research institutions and also most funding institutions require or give internal RDM guidelines (e.g. [DFG checklist](https://www.dfg.de/download/pdf/foerderung/grundlagen_dfg_foerderung/forschungsdaten/forschungsdaten_checkliste_de.pdf), BMBF, EU guidelines) and recommend the set-up of [data management plans](/docs/dmp) in order to ensure that the data are archived in a [FAIR](/docs/fair) (**F**indable, **A**ccessible, **I**nteroperable, **R**e-usable) manner. Many funding institutions encourage or even enforce the [publication](/docs/data_publication) of FAIR data.
As research group leader, you are responsible for the [research data organisation](/docs/data_organisation) of your group. Many research institutions and also most funding institutions require or give internal RDM guidelines (e.g. [DFG checklist](https://www.dfg.de/download/pdf/foerderung/grundlagen_dfg_foerderung/forschungsdaten/forschungsdaten_checkliste_de.pdf), BMBF, EU guidelines) and recommend the set-up of [data management plans](/docs/dmp) in order to ensure that the data are archived in a [FAIR](/docs/fair) (**F**indable, **A**ccessible, **I**nteroperable, **R**e-usable) manner. Many funding institutions encourage or even enforce the [publication](/docs/data_publishing) of FAIR data.

In recent years, many new digital tools have been developed to support researchers in their RDM needs. The technical possibilities are briefly outlined below. For more details, please refer to the related chapters, many of which are directly linked.

:::danger Consider
Digitisation of research data only **after** the end of the production process is most tedious and time-consuming.
:::

Therefore, it is more efficient to capture the data and their corresponding [metadata](/docs/metadata) as early as from the planning of the experiment. Here, [electronic lab notebooks (ELN)](/docs/eln) facilitate everyday work considerably: the planning of the experiment, the documentation of experimental procedures, the analysis of the obtained spectroscopic data as well as the peak assignment can all be completed in one digital environment. And even better: complete experiment reports with analytical data (e.g. for the supporting information for [publications](/docs/data_publication)) can be generated automatically by the ELN.
Therefore, it is more efficient to capture the data and their corresponding [metadata](/docs/metadata) as early as from the planning of the experiment. Here, [electronic lab notebooks (ELN)](/docs/eln) facilitate everyday work considerably: the planning of the experiment, the documentation of experimental procedures, the analysis of the obtained spectroscopic data as well as the peak assignment can all be completed in one digital environment. And even better: complete experiment reports with analytical data (e.g. for the supporting information for [publications](/docs/data_publishing)) can be generated automatically by the ELN.

The time invested to set up the ELN and to organise the experiments thus pays off in numerous ways: In addition to facilitating [documentation](/docs/data_documentation), the storage of the produced data in a FAIR format in a [repository](/docs/repositories) is simplified. Also, the [data organisation](/docs/data_organisation) for the research group can be improved by setting up an internal database. This is invaluable for growing working groups.
