Add ensembl-utils and update documentation generation #380

Merged
merged 10 commits into from
Jun 4, 2024
50 changes: 17 additions & 33 deletions .github/workflows/mkdocs_docs_generation.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,50 +25,34 @@ on:
- 'src/python/ensembl/**'
- 'docs/**'
- '.github/workflows/mkdocs_docs_generation.yml'
permissions:
contents: write

workflow_dispatch:

env:
PYTHON_VERSION: "3.9"

jobs:
Mkdocs_Doc_generation:
name: Deploy documentation to GitHub Pages
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest
permissions:
contents: write

steps:
- name: Checkout Repository
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v5
with:
python-version: 3.8
allow-prereleases: false
cache: 'pip'
cache-dependency-path: '**/pip'
run: echo '${{ steps.cp38.outputs.cache-hit }}'
python-version: ${{ env.PYTHON_VERSION }}

- name: Set pip cache directory path
id: pip-cache-dir-path
- name: Install dependencies
run: |
echo "PIPCACHE=(`pip cache dir`)" >> "$GITHUB_OUTPUT"

- name: Get pip cache dir
env:
PIPCACHE: ${{ steps.pip-cache-dir-path.outputs.PIPCACHE }}
run: echo "The pip cache dir located is $PIPCACHE"
python -m pip install --upgrade pip
pip install -e .[docs]

- name: Install Dependencies
- name: Run mkdocs
run: |
pip install -e .[doc]

- name: mkdocs deploy
run: |
mkdocs build

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3
if: ${{ github.event_name == 'pull_request' && github.ref == 'refs/heads/main' }}
with:
publish_branch: mkdocs
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./site
force_orphan: true
mkdocs gh-deploy --force

41 changes: 19 additions & 22 deletions README.md
Expand Up @@ -12,7 +12,13 @@ Bundles themselves consist of genomic data in various formats (e.g. fasta, gff3,


## Installation and configuration
This repo
This repository can be easily installed by running the following:

```bash
git clone https://github.com/Ensembl/ensembl-genomio.git
cd ensembl-genomio
pip install -e .
```

### Prerequisites
Pipelines are intended to be run inside the Ensembl production environment.
Expand Down Expand Up @@ -45,30 +51,27 @@ If you need to install "editable" python package use '-e' option
pip install -e ./ensembl-genomio
```

To install additional dependencies (e.g. `[doc]` or `[cicd]`) provide `[<tag>]` string. I.e.
To install additional dependencies (e.g. `[docs]` or `[cicd]`) provide `[<tag>]` string. I.e.
```
pip install -e ./ensembl-genomio[doc]
pip install -e ./ensembl-genomio[cicd]
```

For the list of tags see `[project.optional-dependencies]` in [pyproject.toml](./pyproject.toml).


### Additional steps to use automated generation of the documentation (part of it)
Install python part with the `[doc]` tag.
Change into repo dir
Run doc build script.
### Additional steps to use automated generation of the documentation
- Install python part with the `[docs]` tag
- Change into repo dir
- Run `mkdocs build` command

```
git clone git@github.com:Ensembl/ensembl-genomio.git
pip install -e ./ensembl-genomio[doc]

git clone git@github.com:Ensembl/ensembl-genomio.git
cd ./ensembl-genomio

# build docs
./scripts/setup/docs/build_sphinx_docs.sh
pip install -e .[docs]
mkdocs build
```

### Nexflow installation
### Nextflow installation
Please, refer to the "Installation" section of the [Nextflow pipelines document](docs/nextflow.md#installation).

## Pipelines
Expand Down Expand Up @@ -131,11 +134,11 @@ $LOOP_CMD 2> $OUT_DIR/loop.stderr 1> $OUT_DIR/loop.stdout

### Scripts

* [trf_split_run.bash](scripts/trf_split_run.bash) -- a trf wrapper with chunking support to be used with [ensembl-production-imported DNAFeatures pipeline](https://github.com/Ensembl/ensembl-production-imported/tree/main/src/perl/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeatures_conf.pm) (see [doc](docs/trf_split_run.md))
* [trf_split_run.bash](scripts/trf_split_run.bash) -- a trf wrapper with chunking support to be used with [ensembl-production-imported DNAFeatures pipeline](https://github.com/Ensembl/ensembl-production-imported/tree/main/src/perl/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeatures_conf.pm) (see [docs](docs/trf_split_run.md))

## CI/CD bits
As for now some [Gitlab CI](https://docs.gitlab.com/ee/ci/) pipelines introduced to keep things in shape.
Though, this bit is in constant development. Some documentatin can be found in [docs for GitLab CI/CD](docs/cicd_gitlab.md)
Though, this bit is in constant development. Some documentation can be found in [docs for GitLab CI/CD](docs/cicd_gitlab.md)

## Various docs
See [docs](docs)
Expand All @@ -159,9 +162,3 @@ Some of this code and documentation is inherited from the [EnsemblGenomes](https
We appreciate the effort and time spent by developers of the [EnsemblGenomes](https://github.com/EnsemblGenomes) and [Ensembl](https://github.com/Ensembl) projects.

Thank you!






9 changes: 6 additions & 3 deletions cicd/gitlab/dot.gitlab-ci.yml
Expand Up @@ -21,11 +21,14 @@ workflow:
$CI_EXTERNAL_PULL_REQUEST_SOURCE_REPOSITORY == $CI_EXTERNAL_PULL_REQUEST_TARGET_REPOSITORY &&
( $CI_EXTERNAL_PULL_REQUEST_TARGET_BRANCH_NAME == $CI_DEFAULT_BRANCH ||
$CI_EXTERNAL_PULL_REQUEST_TARGET_BRANCH_NAME =~ /hackathon\/.+/ )
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
- if: $CI_COMMIT_BRANCH =~ /hackathon\/.+/
when: always
- if: $CI_PIPELINE_SOURCE == "push" &&
( $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH || $CI_COMMIT_BRANCH !~ /hackathon\/.+/ )
when: never
- when: always

variables:
PYTHON_IMAGE: python:3.8
PYTHON_IMAGE: python:3.9
RUN_DIR: ./cicd/runtime

default:
Expand Down
2 changes: 1 addition & 1 deletion docs/install.md
Expand Up @@ -16,7 +16,7 @@ An Ensembl API checkout including:
Software
--------------

- Python 3.8+
- Python 3.9+
- Perl 5.26
- Bioperl 1.6.9+

Expand Down
File renamed without changes.
10 changes: 5 additions & 5 deletions docs/trf_split_run.md
@@ -1,13 +1,13 @@
# [trf_split_run.bash](scripts/trf_split_run.bash)

A trf wrapper with chunking support to be used with
[ensembl-production-imported DNAFeatures pipeline](https://github.com/Ensembl/ensembl-production-imported/tree/main/src/perl/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeatures_conf.pm) (see [doc](docs/trf_split_run.md))
[ensembl-production-imported DNAFeatures pipeline](https://github.com/Ensembl/ensembl-production-imported/tree/main/src/perl/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeatures_conf.pm) (see [docs](docs/trf_split_run.md))
Compatible compatible with input/output format of trf invocation from [Bio::EnsEMBL::Analysis::Runnable::TRF](https://github.com/Ensembl/ensembl-analysis/blob/main/modules/Bio/EnsEMBL/Analysis/Runnable/TRF.pm).
And can be used as a hack to allow TRF stage to be accomplished at the cost of splitting
long repeat into several adjacent ones (with possible losses).

## Prerequisites
You should have [Biopython](https://biopython.org) installed and available in your environmnent.
You should have [Biopython](https://biopython.org) installed and available in your environment.
You may check this with
```
python -c 'from Bio import SeqIO' || echo "no biopython" >> /dev/stderr
Expand All @@ -19,8 +19,8 @@ Use environment variable to control scipt run
* `DNA_FEATURES_TRF_SPLIT_NO_TRF` -- set to `YES` to skip trf stage
* `DNA_FEATURES_TRF_SPLIT_SPLITTER_CHUNK_SIZE` -- chunk size [`1_000_000`]
* `DNA_FEATURES_TRF_SPLIT_SPLITTER_OPTIONS` -- for a finer control [`--n_seq 1 --chunk_tolerance 10 --chunk_size ${DNA_FEATURES_TRF_SPLIT_SPLITTER_CHUNK_SIZE}`]
* `DNA_FEATURES_TRF_SPLIT_TRF_EXE` -- trf executrable (or abs path to be used) [`trf`]
* `DNA_FEATURES_TRF_SPLIT_TRF_OPTIONS` -- addtitional options for TRF (like `-l 10`) []
* `DNA_FEATURES_TRF_SPLIT_TRF_EXE` -- trf executable (or abs path to be used) [`trf`]
* `DNA_FEATURES_TRF_SPLIT_TRF_OPTIONS` -- additional options for TRF (like `-l 10`) []
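
The way these variables fall back to their bracketed defaults can be sketched in Python. This is only an illustration of the documented behaviour for the variables visible in this hunk, not the wrapper's actual bash logic: the function name `resolve_trf_split_settings` and the returned dictionary keys are hypothetical, and treating only an unset variable (rather than an empty one) as "use the default" is an assumption.

```python
import os

# Defaults copied from the bracketed values in the list above.
DEFAULT_CHUNK_SIZE = "1_000_000"
DEFAULT_TRF_EXE = "trf"


def resolve_trf_split_settings(env=None):
    """Return the effective settings for a hypothetical Python port of the wrapper."""
    env = os.environ if env is None else env
    chunk_size = env.get("DNA_FEATURES_TRF_SPLIT_SPLITTER_CHUNK_SIZE", DEFAULT_CHUNK_SIZE)
    # The splitter options default embeds the chunk size, as in the documented default.
    splitter_options = env.get(
        "DNA_FEATURES_TRF_SPLIT_SPLITTER_OPTIONS",
        f"--n_seq 1 --chunk_tolerance 10 --chunk_size {chunk_size}",
    )
    return {
        "skip_trf": env.get("DNA_FEATURES_TRF_SPLIT_NO_TRF", "") == "YES",
        "chunk_size": int(chunk_size),  # int() accepts underscore separators
        "splitter_options": splitter_options,
        "trf_exe": env.get("DNA_FEATURES_TRF_SPLIT_TRF_EXE", DEFAULT_TRF_EXE),
        "trf_options": env.get("DNA_FEATURES_TRF_SPLIT_TRF_OPTIONS", ""),
    }
```

Passing a plain dictionary instead of relying on `os.environ` keeps the sketch easy to exercise, for example `resolve_trf_split_settings({"DNA_FEATURES_TRF_SPLIT_NO_TRF": "YES"})`.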

## Usage examples
### A standalone run
Expand All @@ -34,7 +34,7 @@ trf_split_run.bash /writable/path_to/dna.fasta 2 5 7 80 10 40 500 -d -h
tweak_pipeline.pl -url "$DNA_FEATURES_EHIVE_DB_URL" -tweak 'analysis[TRF].param[parameters_hash]={program=>"'${ENSEMBL_ROOT_DIR}'/ensembl-genomio/scripts/trf_split_run.bash"}'
```
```
# set envitonment variables if you need to, i.e.
# set environment variables if you need to, i.e.
export DNA_FEATURES_TRF_SPLIT_TRF_EXE=trf.4.09.1
export DNA_FEATURES_TRF_SPLIT_TRF_OPTIONS='-l 10' # N.B. "l" correlated with the chunk size (-l chunk_size / 10^6)
export DNA_FEATURES_TRF_SPLIT_SPLITTER_CHUNK_SIZE=10_000_000'
Expand Down
16 changes: 13 additions & 3 deletions mkdocs.yml
Expand Up @@ -31,6 +31,15 @@ theme:
font:
text: Lato
code: IBM Plex Mono
features:
- content.tooltips
- navigation.expand
- navigation.tabs
- navigation.tabs.sticky
- navigation.top
- search.highlight
- search.suggest
- toc.follow
extra_css:
- stylesheets/extra.css

Expand All @@ -42,7 +51,7 @@ plugins:
- search
- gen-files:
scripts:
- docs/gen_ref_pages.py
- docs/scripts/gen_ref_pages.py
- literate-nav:
nav_file: summary.md
- section-index
Expand All @@ -51,9 +60,10 @@ plugins:
default_handler: python
handlers:
python:
paths: [src]
options:
inherited_members: true
members: true
filters: ["!^_"]
show_if_no_docstring: true
show_root_heading: true
show_source: true
show_symbol_type_heading: true
Expand Down
17 changes: 11 additions & 6 deletions pyproject.toml
Expand Up @@ -20,7 +20,7 @@ name = "ensembl-genomio"
dynamic = [
"version",
]
requires-python = ">= 3.8"
requires-python = ">= 3.9"
description = "Ensembl GenomIO -- pipelines to convert basic genomic data into Ensembl cores and back to flatfile"
readme = "README.md"
authors = [
Expand All @@ -35,22 +35,27 @@ keywords = [
"setup",
]
classifiers = [
"Development Status :: 3 - Alpha",
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Natural Language :: English",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Topic :: Scientific/Engineering :: Bio-Informatics",
"Topic :: Software Development :: Libraries :: Python Modules",
]
dependencies = [
"bcbio-gff == 0.7.1",
"biopython == 1.81",
"ensembl-hive @ git+https://github.com/Ensembl/ensembl-hive.git",
"ensembl-py @ git+https://github.com/Ensembl/ensembl-py.git", # minimum v1.3.0
"ensembl-py @ git+https://github.com/Ensembl/ensembl-py.git", # minimum v2.0.0
"ensembl-utils >= 0.1.3",
"jsonschema >= 4.6.0",
"importlib_resources", # not needed from Python 3.9+
"intervaltree >= 3.1.0",
"mysql-connector-python >= 8.0.29",
"python-redmine >= 2.3.0",
Expand All @@ -72,7 +77,7 @@ cicd = [
"pytest-workflow >= 2.1.0",
"types-requests",
]
doc = [
docs = [
"mkdocs >= 1.5.3",
"mkdocs-autorefs",
"mkdocs-gen-files",
Expand Down
2 changes: 1 addition & 1 deletion src/python/ensembl/io/genomio/__init__.py
Expand Up @@ -14,4 +14,4 @@
# limitations under the License.
"""Genome Input/Output (GenomIO) handling library."""

__version__ = "0.1"
__version__ = "0.1.0"
3 changes: 1 addition & 2 deletions src/python/ensembl/io/genomio/assembly/download.py
Expand Up @@ -298,8 +298,7 @@ def retrieve_assembly_data(
download_dir = Path(download_dir)

# Set and create dedicated dir for download
if not download_dir.is_dir():
download_dir.mkdir(parents=True)
download_dir.mkdir(parents=True, exist_ok=True)

# Download if files don't exist or fail checksum
if not md5_files(download_dir, None):
Expand Down
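
The hunk above swaps a check-then-create pair for a single `mkdir(parents=True, exist_ok=True)` call. A minimal standalone sketch of that behaviour follows; the helper name `ensure_dir` and the directory names are hypothetical and do not come from `download.py`.

```python
import tempfile
from pathlib import Path


def ensure_dir(download_dir):
    """Create the directory (and any parents) if missing; do nothing if it exists."""
    path = Path(download_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "assemblies" / "example_accession"  # hypothetical layout
    ensure_dir(target)
    ensure_dir(target)  # second call does not raise FileExistsError
    created = target.is_dir()
```

Besides being shorter, the single call avoids the window between `is_dir()` and `mkdir()` in which another process could create the directory. Note that `exist_ok=True` still raises `FileExistsError` when the path exists but is not a directory.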
15 changes: 15 additions & 0 deletions src/python/ensembl/io/genomio/data/gff3/__init__.py
@@ -0,0 +1,15 @@
# See the NOTICE file distributed with this work for additional information
# regarding copyright ownership.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""GFF3-related data files."""
15 changes: 15 additions & 0 deletions src/python/ensembl/io/genomio/data/schemas/__init__.py
@@ -0,0 +1,15 @@
# See the NOTICE file distributed with this work for additional information
# regarding copyright ownership.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Schema-related data files."""
8 changes: 4 additions & 4 deletions src/python/ensembl/io/genomio/gff3/gene_merger.py
Expand Up @@ -18,14 +18,13 @@
"GFFGeneMerger",
]

from importlib.resources import as_file, files
import logging
from os import PathLike
from pathlib import Path
import re
from typing import List

from importlib_resources import files

import ensembl.io.genomio.data.gff3
from ensembl.io.genomio.utils.json_utils import get_json

Expand All @@ -34,8 +33,9 @@ class GFFGeneMerger:
"""Specialized class to merge split genes in a GFF3 file, prior to further parsing."""

def __init__(self) -> None:
biotypes_json = files(ensembl.io.genomio.data.gff3) / "biotypes.json"
self._biotypes = get_json(biotypes_json)
source = files(ensembl.io.genomio.data.gff3).joinpath("biotypes.json")
with as_file(source) as biotypes_json:
self._biotypes = get_json(biotypes_json)

def merge(self, in_gff_path: PathLike, out_gff_path: PathLike) -> List[str]:
"""
Expand Down
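
The `files(...).joinpath(...)` plus `as_file(...)` pattern adopted in `GFFGeneMerger.__init__` can be tried in isolation. The sketch below assumes nothing about `ensembl-genomio` itself: it builds a throwaway package (the name `demo_gff3_data`, the helper `load_packaged_json`, and the JSON content are all hypothetical) and reads a JSON resource from it with the standard-library `importlib.resources` API available from Python 3.9, using plain `json.load` in place of the project's `get_json`.

```python
import importlib
import json
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path


def load_packaged_json(package, resource_name):
    """Load a JSON resource shipped inside ``package`` (a module or dotted name)."""
    source = files(package).joinpath(resource_name)
    # as_file() yields a real filesystem path, extracting the resource to a
    # temporary file first if the package is not stored as plain files.
    with as_file(source) as resource_path:
        with open(resource_path, encoding="utf-8") as handle:
            return json.load(handle)


# Build a throwaway package so the sketch runs without ensembl-genomio installed.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_gff3_data"  # hypothetical package name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text('"""Demo data package."""\n')
    # Placeholder content, not the real biotypes.json.
    (pkg_dir / "biotypes.json").write_text(json.dumps({"example_biotype": "placeholder"}))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        biotypes = load_packaged_json("demo_gff3_data", "biotypes.json")
    finally:
        sys.path.remove(tmp)
```

Wrapping the resource in `as_file` matters when the package is imported from a zip archive or another non-filesystem loader, where a bare traversable cannot be handed to code that expects a path.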