Merge pull request #406 from Ensembl/jalvarez/docs_update
Documentation, badges and version updates
JAlvarezJarreta authored Jul 18, 2024
2 parents 3549b83 + ae513dc commit bc83a1b
Showing 6 changed files with 59 additions and 84 deletions.
45 changes: 24 additions & 21 deletions README.md
@@ -1,28 +1,28 @@
# Ensembl GenomIO
[![GitHub license](https://img.shields.io/github/license/Ensembl/ensembl-genomio)](https://github.com/Ensembl/ensembl-genomio/blob/main/LICENSE)
[![Documentation deploy](https://github.com/Ensembl/ensembl-genomio/actions/workflows/mkdocs_docs_generation.yml/badge.svg)](https://ensembl.github.io/ensembl-genomio)
[![coverage report](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/coverage-badge.svg)](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/)

[![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](https://github.com/Ensembl/ensembl-genomio/blob/main/LICENSE)
[![Coverage](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/coverage-badge.svg)](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/)
[![CI](https://gitlab.ebi.ac.uk/vectorbase/ensembl-genomio/badges/main/pipeline.svg)](https://gitlab.ebi.ac.uk/vectorbase/ensembl-genomio/-/pipelines)
[![Release](https://img.shields.io/pypi/v/ensembl-genomio)](https://pypi.org/project/ensembl-genomio)

Pipelines to turn basic genomic data into Ensembl cores and back.

This is a multilanguage (Perl, Python) repo providing eHive pipelines.
and various scripts (see below) to prepare genomic data and load it as [Ensembl core database](http://www.ensembl.org/info/docs/api/core/index.html) or to dump such core databases as file bundles.
This is a multilanguage (Perl, Python) repo providing eHive pipelines and various scripts (see below) to prepare genomic data and load it as [Ensembl core database](http://www.ensembl.org/info/docs/api/core/index.html) or to dump such core databases as file bundles.

Bundles themselves consist of genomic data in various formats (e.g. fasta, gff3, json) and should follow the corresponding [specification](docs/BRC4_genome_loader.md#input-data).


## Installation and configuration
This repository can be easily installed by running the following:

This repository is publicly available in [PyPI](https://pypi.org), so it can be easily installed with your preferred Python package manager, e.g.:

```bash
git clone https://github.com/Ensembl/ensembl-genomio.git
cd ensembl-genomio
pip install -e .
pip install ensembl-genomio
```

### Prerequisites
Pipelines are intended to be run inside the Ensembl production environment.
Please, make sure you have all the proper credential, keys, etc. set up.

Pipelines are intended to be run inside the Ensembl production environment. Please, make sure you have all the proper credential, keys, etc. set up.

### Get repo and install

@@ -34,9 +34,8 @@ git clone git@github.com:Ensembl/ensembl-genomio.git
Install the python part (of the pipelines) and test it:
```
pip install ./ensembl-genomio
# test
python -c 'import ensembl.brc4.runnable.read_json'
# And test it has been installed correctly
python -c 'import ensembl.io.genomio'
```

Update your perl envs (if you need to)
@@ -46,20 +45,21 @@ export PATH=$(pwd)/ensembl-genomio/scripts:$PATH
```

### Optional installation
If you need to install "editable" python package use '-e' option

If you need to install "editable" Python package use '-e' option
```
pip install -e ./ensembl-genomio
```

To install additional dependencies (e.g. `[docs]` or `[cicd]`) provide `[<tag>]` string. I.e.
To install additional dependencies (e.g. `[docs]` or `[cicd]`) provide `[<tag>]` string, e.g.:
```
pip install -e ./ensembl-genomio[cicd]
```

For the list of tags see `[project.optional-dependencies]` in [pyproject.toml](./pyproject.toml).


### Additional steps to use automated generation of the documentation

- Install python part with the `[docs]` tag
- Change into repo dir
- Run `mkdocs build` command
@@ -72,6 +72,7 @@ mkdocs build
```

### Nextflow installation

Please, refer to the "Installation" section of the [Nextflow pipelines document](docs/nextflow.md#installation).

## Pipelines
@@ -137,13 +138,15 @@ $LOOP_CMD 2> $OUT_DIR/loop.stderr 1> $OUT_DIR/loop.stdout
* [trf_split_run.bash](scripts/trf_split_run.bash) -- a trf wrapper with chunking support to be used with [ensembl-production-imported DNAFeatures pipeline](https://github.com/Ensembl/ensembl-production-imported/tree/main/src/perl/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeatures_conf.pm) (see [docs](docs/trf_split_run.md))

## CI/CD bits
As for now some [Gitlab CI](https://docs.gitlab.com/ee/ci/) pipelines introduced to keep things in shape.
Though, this bit is in constant development. Some documentation can be found in [docs for GitLab CI/CD](docs/cicd_gitlab.md)

As for now some [Gitlab CI](https://docs.gitlab.com/ee/ci/) pipelines introduced to keep things in shape. Though, this bit is in constant development. Some documentation can be found in [docs for GitLab CI/CD](docs/cicd_gitlab.md)

## Various docs

See [docs](docs)

## Unit testing

The Python part of the codebase has now unit tests available to test each module. Make sure you have installed this repository's `[cicd]` dependencies (via `pip install ensembl-genomio[cicd]`) before continuing.

Running all the tests in one go is as easy as running `pytest` **from the root of the repository**. If you also want to measure, collect and report the code coverage, you can do:
@@ -158,7 +161,7 @@ pytest lib/python/tests/test_schema.py
```

## Acknowledgements
Some of this code and documentation is inherited from the [EnsemblGenomes](https://github.com/EnsemblGenomes) and other [Ensembl](https://github.com/Ensembl) projects.
We appreciate the effort and time spent by developers of the [EnsemblGenomes](https://github.com/EnsemblGenomes) and [Ensembl](https://github.com/Ensembl) projects.

Some of this code and documentation is inherited from the [EnsemblGenomes](https://github.com/EnsemblGenomes) and other [Ensembl](https://github.com/Ensembl) projects. We appreciate the effort and time spent by developers of the [EnsemblGenomes](https://github.com/EnsemblGenomes) and [Ensembl](https://github.com/Ensembl) projects.

Thank you!
17 changes: 8 additions & 9 deletions docs/index.md
@@ -1,14 +1,12 @@
# [Ensembl GenomIO](https://github.com/Ensembl/ensembl-genomio)

*Ensembl GenomIO Base Library Documentation*
# Ensembl GenomIO

A repository dedicated to pipelines used to turn basic genomic data into formatted
Ensembl core databases. Also allow users to dump core databases into various formats.

File formats handled : FastA, GFF3, JSON (*following BRC4 specifications*).

Contents
--------
## Contents

Check out [installation](install.md) section for further information on how
to install the project.

@@ -17,18 +15,19 @@ to install the project.
3. [Code of Conduct](code_of_conduct.md)
4. [Code reference](reference/)

Ehive pipelines
-------------------------------------------
## Ehive pipelines

Check out the [usage](usage.md) section for further information of requirements to
run ensembl-genomio pipelines.

1. __Genome loader__: Creates an Ensembl core database from a set of flat files.
2. __Genome dumper__: Dumps flat files from an Ensembl core database.

Nextflow pipelines
-------------------------------------------
## Nextflow pipelines

1. __Additional seq prepare__: BRC/Ensembl metazoa pipeline. Preparation of genome data loading files for new sequence(s) to existing species databases.
2. __Genome Prepare__: BRC/Ensembl metazoa pipeline. Retrieve data for genome(s), obtained from INSDC and RefSeq, validate and prepare GFF3, FASTA, JSON files for each genome accession.

## License

Software as part of [Ensembl GenomIO](https://github.com/Ensembl/ensembl-genomio) is distributed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
61 changes: 19 additions & 42 deletions docs/install.md
@@ -1,39 +1,15 @@
API Setup and installation
===========================

Requirements
--------------

An Ensembl API checkout including:

- [ensembl-genomio](https://github.com/Ensembl/ensembl-genomio) (export /src/perl into PERL5LIB)
- [ensembl-hive](https://github.com/Ensembl/ensembl-hive)
- [ensembl-production](https://github.com/Ensembl/ensembl-production)
- [ensembl-analysis](https://github.com/Ensembl/ensembl-analysis/tree/dev/hive_master) (on dev/hive_master branch)
- [ensembl-taxonomy](https://github.com/Ensembl/ensembl-taxonomy)
- [ensembl-orm](https://github.com/Ensembl/ensembl-orm)

Software
--------------

- Python 3.10+
- Perl 5.26
- Bioperl 1.6.9+

Python Modules
--------------
- bcbio-gff
- biopython
- jsonschema
- intervaltree
- mysql-connector-python
- python-redmine
- requests


## Installation
--------------
### Directly from GitHub:
# Installation

GenomIO is now publicly available in [PyPI](https://pypi.org), so you can easily install it with your preferred Python package manager, e.g.

```bash
pip install ensembl-genomio
```

## API setup and installation

To run also the pipelines present in the repository, you will need to have Perl 5.26 and Bioperl 1.6.9+ installed in your system, and then clone the GitHub repository together with some other repositories:

```
git clone https://github.com/Ensembl/ensembl-genomio
git clone https://github.com/Ensembl/ensembl-analysis -b dev/hive_master
@@ -43,11 +19,12 @@ git clone https://github.com/Ensembl/ensembl-taxonomy
git clone https://github.com/Ensembl/ensembl-orm
```


### Documentation

Documentation for Ensembl-genomio generated using _mkdocs_. For full information visit [mkdocs.org](https://www.mkdocs.org).
#### Commands
* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.

Useful commands:
* `mkdocs new [dir-name]` - Create a new project
* `mkdocs serve` - Start the live-reloading docs server
* `mkdocs build` - Build the documentation site
* `mkdocs -h` - Print help message and exit
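
As a quick sanity check after `pip install ensembl-genomio`, the import test shown in the README (`python -c 'import ensembl.io.genomio'`) could also be written as a small Python helper. This is only a sketch: the helper name `is_importable` is hypothetical and not part of GenomIO, and it relies solely on the standard library.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current Python path.

    Hypothetical helper for illustration; not part of ensembl-genomio.
    """
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `ensembl`) is missing altogether.
        return False


if __name__ == "__main__":
    # Module path taken from the README's own import test.
    print("ensembl.io.genomio importable:", is_importable("ensembl.io.genomio"))
```

Locating the module rather than importing it keeps the check cheap, at the cost of not catching errors that only appear when the module body runs.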
10 changes: 4 additions & 6 deletions docs/nextflow.md
@@ -1,7 +1,7 @@
# Nextflow related documentation

## Installation
If you don't have an installed environment or you don't have nextflow itself, here's one of the ways to install it.
If you do not have an installed environment or you don't have nextflow itself, here is one of the ways to install it.

Define [`NXF_HOME` env variable](https://www.nextflow.io/docs/latest/config.html#environment-variables) to use a nextflow home location instead of the default one (`$HOME/.nextflow`).
Everything else is unchanged from the default Nextflow installation instructions on [https://www.nextflow.io/index.html#GetStarted](https://www.nextflow.io/index.html#GetStarted).
@@ -20,7 +20,7 @@ cat nextflow.install.bash | bash -i 2>&1 | tee nextflow.install.log
./nextflow run hello
```

Configure the environment you're using if you haven't done so yet.
Configure the environment you are using if you have not done so yet.
Don't forget to add `NXF_HOME`, patch `PATH` and export them.
```
# fix env variables, i.e.:
@@ -60,8 +60,7 @@ Try to invoke pipelines with `--help` option to get insight on how to run them.

### Channel is not forked, only one operation on stream is allowed
#### Symptoms:
When running a stage or a subworkflow on a channel with a single element
we expect stream to be forked, allowing us to seed several task at a time.
When running a stage or a subworkflow on a channel with a single element we expect stream to be forked, allowing us to seed several task at a time.
```
// create that channel with a single element
// calls read_json(...) in turn, see below
@@ -75,8 +74,7 @@ Instead pipeline dies with
```
Caused by: Cannot load from object array because "this.keys" is null
```
and when printing this object (`dbs` in this case, with `println "db: ${db}"`),
we see it dict surrounded by the curly brackets like this
and when printing this object (`dbs` in this case, with `println "db: ${db}"`), we see it dict surrounded by the curly brackets like this
```
{..., "db_name":"some_db_name", ...}
```
8 changes: 3 additions & 5 deletions docs/usage.md
@@ -1,13 +1,11 @@
Usage
========
# Usage

For full details on python modules and Ensembl API repositories required see *install* section.

Environment setup
-----------------
## Environment setup

```
python3.7 -m venv path/to/virtual_env
python3.10 -m venv path/to/virtual_env
. path/to/virtual_env/bin/activate
pip install -e .
```
2 changes: 1 addition & 1 deletion src/python/ensembl/io/genomio/__init__.py
@@ -14,4 +14,4 @@
# limitations under the License.
"""Genome Input/Output (GenomIO) handling library."""

__version__ = "1.1.1"
__version__ = "1.2.0"
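
To confirm which release is present in an environment, the installed distribution's version can be read from its package metadata. The sketch below assumes the distribution name is `ensembl-genomio` (as used in the `pip install` command in this commit) and that the published metadata version tracks `__version__`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is absent.

    Hypothetical helper for illustration; not part of ensembl-genomio.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    # Distribution name assumed from `pip install ensembl-genomio`.
    print("ensembl-genomio version:", installed_version("ensembl-genomio"))
```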
