Merge pull request #406 from Ensembl/jalvarez/docs_update

Documentation, badges and version updates
Ensembl · Jul 18, 2024 · bc83a1b
2 parents 3549b83 + ae513dc
commit bc83a1b

Showing 6 changed files with 59 additions and 84 deletions.
diff --git a/README.md b/README.md
@@ -1,28 +1,28 @@
 # Ensembl GenomIO
-[![GitHub license](https://img.shields.io/github/license/Ensembl/ensembl-genomio)](https://github.com/Ensembl/ensembl-genomio/blob/main/LICENSE)
-[![Documentation deploy](https://github.com/Ensembl/ensembl-genomio/actions/workflows/mkdocs_docs_generation.yml/badge.svg)](https://ensembl.github.io/ensembl-genomio)
-[![coverage report](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/coverage-badge.svg)](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/)
+
+[![License](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](https://github.com/Ensembl/ensembl-genomio/blob/main/LICENSE)
+[![Coverage](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/coverage-badge.svg)](https://vectorbase.gitdocs.ebi.ac.uk/ensembl-genomio/)
+[![CI](https://gitlab.ebi.ac.uk/vectorbase/ensembl-genomio/badges/main/pipeline.svg)](https://gitlab.ebi.ac.uk/vectorbase/ensembl-genomio/-/pipelines)
+[![Release](https://img.shields.io/pypi/v/ensembl-genomio)](https://pypi.org/project/ensembl-genomio)
 
 Pipelines to turn basic genomic data into Ensembl cores and back.
 
-This is a multilanguage (Perl, Python) repo providing eHive pipelines.
-and various scripts (see below) to prepare genomic data and load it as [Ensembl core database](http://www.ensembl.org/info/docs/api/core/index.html) or to dump such core databases as file bundles.
+This is a multilanguage (Perl, Python) repo providing eHive pipelines and various scripts (see below) to prepare genomic data and load it into [Ensembl core databases](http://www.ensembl.org/info/docs/api/core/index.html) or to dump such core databases as file bundles.
 
 Bundles themselves consist of genomic data in various formats (e.g. fasta, gff3, json) and should follow the corresponding [specification](docs/BRC4_genome_loader.md#input-data).
 
 
 ## Installation and configuration
-This repository can be easily installed by running the following:
+
+This repository is publicly available in [PyPI](https://pypi.org), so it can be easily installed with your preferred Python package manager, e.g.:
 
 ```bash
-git clone https://github.com/Ensembl/ensembl-genomio.git
-cd ensembl-genomio
-pip install -e .
+pip install ensembl-genomio
 ```
 
 ### Prerequisites
-Pipelines are intended to be run inside the Ensembl production environment.
-Please, make sure you have all the proper credential, keys, etc. set up.
+
+Pipelines are intended to be run inside the Ensembl production environment. Please make sure you have all the proper credentials, keys, etc. set up.
 
 ### Get repo and install
 
@@ -34,9 +34,8 @@ git clone git@github.com:Ensembl/ensembl-genomio.git
 Install the python part (of the pipelines) and test it:
 ```
 pip install ./ensembl-genomio
-
-# test
-python -c 'import ensembl.brc4.runnable.read_json'
+# And test it has been installed correctly
+python -c 'import ensembl.io.genomio'
 ```
 
 Update your perl envs (if you need to)
@@ -46,20 +45,21 @@ export PATH=$(pwd)/ensembl-genomio/scripts:$PATH
 ```
 
 ### Optional installation
-If you need to install "editable" python package use '-e' option
+
+If you need to install an "editable" Python package, use the `-e` option:
 ```
 pip install -e ./ensembl-genomio
 ```
 
-To install additional dependencies (e.g. `[docs]` or `[cicd]`) provide `[<tag>]` string. I.e.
+To install additional dependencies (e.g. `[docs]` or `[cicd]`), append the corresponding `[<tag>]` string to the package path, e.g.:
 ```
 pip install -e ./ensembl-genomio[cicd]
 ```
 
 For the list of tags see `[project.optional-dependencies]` in [pyproject.toml](./pyproject.toml). 
 
-
 ### Additional steps to use automated generation of the documentation
+
 - Install python part with the `[docs]` tag
 - Change into repo dir
 - Run `mkdocs build` command
@@ -72,6 +72,7 @@ mkdocs build
 ```
 
 ###  Nextflow installation
+
 Please, refer to the "Installation" section of the [Nextflow pipelines document](docs/nextflow.md#installation).
 
 ## Pipelines
@@ -137,13 +138,15 @@ $LOOP_CMD 2> $OUT_DIR/loop.stderr 1> $OUT_DIR/loop.stdout
 * [trf_split_run.bash](scripts/trf_split_run.bash) -- a trf wrapper with chunking support to be used with [ensembl-production-imported DNAFeatures pipeline](https://github.com/Ensembl/ensembl-production-imported/tree/main/src/perl/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeatures_conf.pm) (see [docs](docs/trf_split_run.md))
 
 ## CI/CD bits
-As for now some [Gitlab CI](https://docs.gitlab.com/ee/ci/) pipelines introduced to keep things in shape.
-Though, this bit is in constant development. Some documentation can be found in [docs for GitLab CI/CD](docs/cicd_gitlab.md)
+
+For now, some [GitLab CI](https://docs.gitlab.com/ee/ci/) pipelines have been introduced to keep things in shape, though this part is still under active development. Some documentation can be found in [docs for GitLab CI/CD](docs/cicd_gitlab.md).
 
 ## Various docs
+
 See [docs](docs)
 
 ## Unit testing
+
 The Python part of the codebase now has unit tests available for each module. Make sure you have installed this repository's `[cicd]` dependencies (via `pip install ensembl-genomio[cicd]`) before continuing.
 
 Running all the tests in one go is as easy as running `pytest` **from the root of the repository**. If you also want to measure, collect and report the code coverage, you can do:
@@ -158,7 +161,7 @@ pytest lib/python/tests/test_schema.py
 ```
 
 ## Acknowledgements
-Some of this code and documentation is inherited from the [EnsemblGenomes](https://github.com/EnsemblGenomes) and other [Ensembl](https://github.com/Ensembl) projects.
-We appreciate the effort and time spent by developers of the [EnsemblGenomes](https://github.com/EnsemblGenomes) and [Ensembl](https://github.com/Ensembl) projects. 
+
+Some of this code and documentation is inherited from the [EnsemblGenomes](https://github.com/EnsemblGenomes) and other [Ensembl](https://github.com/Ensembl) projects. We appreciate the effort and time spent by developers of the [EnsemblGenomes](https://github.com/EnsemblGenomes) and [Ensembl](https://github.com/Ensembl) projects. 
 
 Thank you!
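
Not part of this commit: a minimal sketch of a check that could flag malformed absolute link targets in Markdown files such as `README.md`, for instance a badge link written as `https:pypi.org/...` without the `//` after the scheme. `malformed_links` is a hypothetical helper, not an ensembl-genomio API, and its regular expression only covers simple inline `[text](target)` links (no reference-style links or targets containing parentheses).

```python
import re
from urllib.parse import urlparse

# Captures the target of a simple inline Markdown link: [text](target)
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def malformed_links(markdown: str) -> list[str]:
    """Return http(s) link targets that have no network location (host)."""
    bad = []
    for target in LINK_TARGET.findall(markdown):
        parsed = urlparse(target)
        # "https:pypi.org/x" parses with scheme "https" but an empty netloc,
        # because urlparse only fills netloc when "//" follows the scheme.
        if parsed.scheme in ("http", "https") and not parsed.netloc:
            bad.append(target)
    return bad
```

Relative targets such as `docs/nextflow.md#installation` have no scheme and are ignored, so the check could be run over a whole file, e.g. `malformed_links(Path("README.md").read_text())`.
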
diff --git a/docs/index.md b/docs/index.md
@@ -1,14 +1,12 @@
-# [Ensembl GenomIO](https://github.com/Ensembl/ensembl-genomio)
-
-*Ensembl GenomIO Base Library Documentation*
+# Ensembl GenomIO
 
 A repository dedicated to pipelines used to turn basic genomic data into formatted 
 Ensembl core databases. It also allows users to dump core databases into various formats.
 
 File formats handled: FASTA, GFF3, JSON (*following BRC4 specifications*).
 
-Contents
---------
+## Contents
+
 Check out [installation](install.md) section for further information on how 
 to install the project.
 
@@ -17,18 +15,19 @@ to install the project.
 3. [Code of Conduct](code_of_conduct.md)
 4. [Code reference](reference/)
 
-Ehive pipelines
--------------------------------------------
+## eHive pipelines
+
 Check out the [usage](usage.md) section for further information on the requirements to
 run ensembl-genomio pipelines.
 
 1. __Genome loader__: Creates an Ensembl core database from a set of flat files.
 2. __Genome dumper__: Dumps flat files from an Ensembl core database.
 
-Nextflow pipelines
--------------------------------------------
+## Nextflow pipelines
+
 1. __Additional seq prepare__: BRC/Ensembl metazoa pipeline. Preparation of genome data loading files for new sequence(s) to existing species databases.  
 2. __Genome Prepare__: BRC/Ensembl metazoa pipeline. Retrieve data for genome(s), obtained from INSDC and RefSeq, validate and prepare GFF3, FASTA, JSON files for each genome accession.
 
 ## License
+
 Software as part of [Ensembl GenomIO](https://github.com/Ensembl/ensembl-genomio) is distributed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
diff --git a/docs/install.md b/docs/install.md
@@ -1,39 +1,15 @@
-API Setup and installation
-===========================
-
-Requirements
---------------
-
-An Ensembl API checkout including:
-
-- [ensembl-genomio](https://github.com/Ensembl/ensembl-genomio)  (export /src/perl into PERL5LIB)
-- [ensembl-hive](https://github.com/Ensembl/ensembl-hive)
-- [ensembl-production](https://github.com/Ensembl/ensembl-production)
-- [ensembl-analysis](https://github.com/Ensembl/ensembl-analysis/tree/dev/hive_master) (on dev/hive_master branch)
-- [ensembl-taxonomy](https://github.com/Ensembl/ensembl-taxonomy)
-- [ensembl-orm](https://github.com/Ensembl/ensembl-orm)
-
-Software
---------------
-
-- Python 3.10+
-- Perl 5.26
-- Bioperl 1.6.9+
-
-Python Modules
---------------
-- bcbio-gff
-- biopython
-- jsonschema
-- intervaltree
-- mysql-connector-python
-- python-redmine
-- requests
-
-
-## Installation
---------------
-### Directly from GitHub:
+# Installation
+
+GenomIO is now publicly available in [PyPI](https://pypi.org), so you can easily install it with your preferred Python package manager, e.g.:
+
+```bash
+pip install ensembl-genomio
+```
+
+## API setup and installation
+
+To also run the pipelines in the repository, you will need Perl 5.26 and BioPerl 1.6.9+ installed on your system. Then clone the GitHub repository together with some other required repositories:
+
 ```
 git clone https://github.com/Ensembl/ensembl-genomio
 git clone https://github.com/Ensembl/ensembl-analysis -b dev/hive_master
@@ -43,11 +19,12 @@ git clone https://github.com/Ensembl/ensembl-taxonomy
 git clone https://github.com/Ensembl/ensembl-orm
 ```
 
-
 ### Documentation
+
 Documentation for Ensembl-genomio is generated using _mkdocs_. For full information, visit [mkdocs.org](https://www.mkdocs.org).
-#### Commands
-* `mkdocs new [dir-name]` - Create a new project.
-* `mkdocs serve` - Start the live-reloading docs server.
-* `mkdocs build` - Build the documentation site.
-* `mkdocs -h` - Print help message and exit.
+
+Useful commands:
+* `mkdocs new [dir-name]` - Create a new project
+* `mkdocs serve` - Start the live-reloading docs server
+* `mkdocs build` - Build the documentation site
+* `mkdocs -h` - Print help message and exit
diff --git a/docs/nextflow.md b/docs/nextflow.md
@@ -1,7 +1,7 @@
 # Nextflow related documentation
 
 ## Installation
-If you don't have an installed environment or you don't have nextflow itself, here's one of the ways to install it.
+If you do not have an installed environment or you do not have Nextflow itself, here is one of the ways to install it.
 
 Define [`NXF_HOME` env variable](https://www.nextflow.io/docs/latest/config.html#environment-variables) to use a nextflow home location instead of the default one (`$HOME/.nextflow`).
 Everything else is unchanged from the default Nextflow installation instructions on [https://www.nextflow.io/index.html#GetStarted](https://www.nextflow.io/index.html#GetStarted).
@@ -20,7 +20,7 @@ cat nextflow.install.bash | bash -i 2>&1 | tee nextflow.install.log
 ./nextflow run hello
 ```
 
-Configure the environment you're using if you haven't done so yet.
+Configure the environment you are using if you have not done so yet.
 Don't forget to add `NXF_HOME`, patch `PATH` and export them.
 ```
 # fix env variables, i.e.:
@@ -60,8 +60,7 @@ Try to invoke pipelines with `--help` option to get insight on how to run them.
 
 ### Channel is not forked, only one operation on stream is allowed
 #### Symptoms:
-When running a stage or a subworkflow on a channel with a single element
-we expect stream to be forked, allowing us to seed several task at a time.
+When running a stage or a subworkflow on a channel with a single element, we expect the stream to be forked, allowing us to seed several tasks at a time.
 ```
 // create that channel with a single element
 //   calls read_json(...) in turn, see below
@@ -75,8 +74,7 @@ Instead pipeline dies with
 ```
 Caused by: Cannot load from object array because "this.keys" is null
 ```
-and when printing this object (`dbs` in this case, with `println "db: ${db}"`),
-we see it dict surrounded by the curly brackets like this
+and when printing this object (`dbs` in this case, with `println "db: ${db}"`), we see a dict surrounded by curly brackets, like this:
 ```
 {..., "db_name":"some_db_name", ...}
 ```

diff --git a/docs/usage.md b/docs/usage.md
@@ -1,13 +1,11 @@
-Usage
-========
+# Usage
 
 For full details on the Python modules and Ensembl API repositories required, see the *install* section.
 
-Environment setup
------------------
+## Environment setup
 
 ```
-python3.7 -m venv path/to/virtual_env
+python3.10 -m venv path/to/virtual_env
 . path/to/virtual_env/bin/activate
 pip install -e .
 ```
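
The environment setup now creates the virtual environment with `python3.10`. As a hedged sketch (the helper names are hypothetical and not part of ensembl-genomio), the interpreter can confirm from inside Python that it meets that minimum and that a `venv` is active; comparing `sys.prefix` with `sys.base_prefix` is the standard way to detect environments created with the `venv` module.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside an environment created with `python -m venv`."""
    # venv points sys.prefix at the environment; sys.base_prefix keeps the base install
    return sys.prefix != sys.base_prefix


def check_environment(min_version: tuple[int, int] = (3, 10)) -> list[str]:
    """Return a list of problems with the current interpreter (empty if none)."""
    problems = []
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```
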

diff --git a/src/python/ensembl/io/genomio/__init__.py b/src/python/ensembl/io/genomio/__init__.py
@@ -14,4 +14,4 @@
 # limitations under the License.
 """Genome Input/Output (GenomIO) handling library."""
 
-__version__ = "1.1.1"
+__version__ = "1.2.0"
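
Under semantic versioning, moving `__version__` from `1.1.1` to `1.2.0` would read as a minor bump. This commit does not state which versioning policy the project follows, so the following is only a sketch assuming plain `MAJOR.MINOR.PATCH` strings; `bump_kind` is a hypothetical helper and does not handle pre-release or local suffixes such as `1.2.0rc1`.

```python
def bump_kind(old: str, new: str) -> str:
    """Classify a MAJOR.MINOR.PATCH version change as 'major', 'minor' or 'patch'."""
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected plain MAJOR.MINOR.PATCH versions")
    # Tuples compare element by element, so (1, 2, 0) > (1, 1, 1)
    if new_parts <= old_parts:
        raise ValueError(f"{new} is not newer than {old}")
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


print(bump_kind("1.1.1", "1.2.0"))  # minor
```
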