
Commit

Merge pull request #118 from dockstore/feature/missingDocs
Feature/missing docs
denis-yuen authored Jun 25, 2021
2 parents 61d9c48 + d61c5fa commit 79cbc52
Showing 11 changed files with 164 additions and 88 deletions.
@@ -14,7 +14,7 @@ Version Control Best Practices
- `Git Skills for New and Prospective Maintainers <https://www.youtube.com/watch?v=uvWhSYBkZJ0>`_
- Git repositories offer great tools for peer review, including `issues <https://blog.zenhub.com/best-practices-for-github-issues/>`_, `labels <https://robinpowered.com/blog/best-practice-system-for-organizing-and-tagging-github-issues/>`_, and `pull requests <https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/about-pull-requests>`_.

- Create an organization on a git repository and have your collaborators publish their peer reviewed tools or workflows within the organization. (`Here <https://docs.github.com/en/github/setting-up-and-managing-organizations-and-teams/creating-a-new-organization-from-scratch>`_ are instructions for GitHub).
- Create an organization on a git repository and have your collaborators publish their peer-reviewed tools or workflows within the organization. (`Here <https://docs.github.com/en/github/setting-up-and-managing-organizations-and-teams/creating-a-new-organization-from-scratch>`_ are instructions for GitHub).

- Organizations can centralize your work and help to foster a culture of peer review through Pull Requests.
- Submitting to an organization rather than hosting on an individual account provides a fallback for others if you become inactive on the git repository site.
@@ -32,34 +32,35 @@ Version Control Best Practices

- There should always be at least one ‘main’ branch that points to the most stable copy of your workflow.

- Any new development of features, optimizations, etc. should be created on a new branch/version that diverges from the main branch.
- Any new development of features, optimizations, etc., should be created on a new branch/version that diverges from the main branch.

- If developing multiple new features simultaneously or if multiple people are creating content, work should be split into separate branches.
- It’s best to split into branches by independent feature units, ex: “add-QC-before-alignment”.
- It’s best to split into branches by independent feature units, ex: “add-QC-before-alignment.”
- Once your feature is stable, create a pull request to merge the branch into your main branch. Once merged, you can delete the development branch if it is no longer needed.
- Note on GitHub repository and Docker image versioning: Many workflow repositories contain both a Dockerfile, with instructions for building the Docker image, and the workflow descriptor file(s) (e.g., .cwl, .wdl, .nf, etc.). This adds complexity when tags for Docker images mirror tags for the GitHub repository (as is possible using quay.io, for example). On a development branch, you may want the task to refer to a development version of the Docker image (e.g., quay.io/my_account/my_image:develop). This means that a perfectly functioning development branch commit could become "incorrect" after being merged into the main branch, because the descriptor file task(s) will refer to the development Docker image version rather than an immutable version. The best current solution is to update the descriptor file just prior to (or during) the pull request so that the tasks reference the Docker image by digest (e.g., quay.io/my_account/my_image@sha256:f63e020c4062e0be80831a50de8640...).

- Publish releases of workflow to save your work at a stable version for publication and citation. On GitHub these are ‘tags’ (`learn how to manage tags <https://docs.github.com/en/free-pro-team@latest/desktop/contributing-and-collaborating-using-github-desktop/managing-tags>`_). Below, we discuss how such releases can become immutable when synced with the snapshots feature on Dockstore.
- Publish releases of your workflow to save your work at a stable version for publication and citation. On GitHub, these are ‘tags’ (`learn how to manage tags <https://docs.github.com/en/free-pro-team@latest/desktop/contributing-and-collaborating-using-github-desktop/managing-tags>`_). Below, we discuss how such releases can become immutable when synced with the snapshots feature on Dockstore.
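The digest-pinning step recommended in the note above (replacing a mutable tag such as ``:develop`` with an immutable digest before merging) can be sketched in Python. This is a minimal illustration, not Dockstore tooling: ``pin_to_digest`` and ``repository_of`` are hypothetical helper names, and the digest is assumed to have been obtained separately, for example from the ``Digest:`` line that ``docker pull`` prints.

```python
import re

# A Docker content digest: "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def repository_of(ref: str) -> str:
    """Return an image reference with any ":tag" or "@digest" suffix removed."""
    name = ref.split("@", 1)[0]
    # A tag separator is a ":" after the last "/", so a registry port
    # such as "localhost:5000/tool" is not mistaken for a tag.
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name = name[:colon]
    return name


def pin_to_digest(ref: str, digest: str) -> str:
    """Rewrite an image reference as an immutable "repository@sha256:..." reference."""
    if not DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repository_of(ref)}@{digest}"


if __name__ == "__main__":
    example_digest = "sha256:" + "0" * 64  # placeholder digest, for illustration only
    print(pin_to_digest("quay.io/my_account/my_image:develop", example_digest))
```

Pinning by digest rather than by tag is what makes the reference immutable: a tag can be moved to point at a new image, while a digest identifies exactly one image manifest.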


.. _image-container-best-practices:

Image / Container Best Practices
---------------------------------

- Because anyone can publish an image in a public repository (Docker Hub, Quay, etc.), you should be cautious of third-party containers because they may contain malware or insecure software, or may have insecure settings. These may result in `cryptojacking <https://sysdig.com/blog/detecting-cryptojacking/>`_. See an example of a malicious image in `this GitHub repo <https://github.com/docker/hub-feedback/issues/1570>`_.
- Because anyone can publish an image in a public repository (Docker Hub, Quay, etc.), you should be cautious of third-party containers because they may contain malware or insecure software or may have insecure settings. These may result in `cryptojacking <https://sysdig.com/blog/detecting-cryptojacking/>`_. See an example of a malicious image in `this GitHub repo <https://github.com/docker/hub-feedback/issues/1570>`_.
- When creating custom images, we recommend starting with `official images <https://docs.docker.com/docker-hub/official_images/>`_. This way you start from a well-maintained base, since official images are regularly updated to address known vulnerabilities.
- You may find helpful images from sources such as BioContainers, which maintains `images for 1K+ bioinformatics tools <https://biocontainers.pro/#/registry>`_. We cannot guarantee that BioContainers images are secure, so we recommend that you scan all non-official images for vulnerabilities. Tools such as `Snyk <https://support.snyk.io/hc/en-us/articles/360014875297-Getting-started-with-Snyk-Open-Source>`_ and `Trivy <https://github.com/aquasecurity/trivy>`_ scan containers for security concerns.
- If you detect a vulnerability in a container you are interested in, we suggest you 1) contact the maintainer to update the image, or 2) if there is a Dockerfile, use it as a template to update the image yourself. Try inspecting the Dockerfile and only include those parts you feel are trustworthy. Consider upgrading versions of packages as they may be a source of vulnerabilities.

- Use Dockerfiles to describe and configure images:

- See `Best Practices from Docker <https://www.docker.com/blog/intro-guide-to-dockerfile-best-practices/>`_ and `10 Simple Rules for Writing Dockerfiles for Reproducible Analysis <https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316>`_ .
- See `Best Practices from Docker <https://www.docker.com/blog/intro-guide-to-dockerfile-best-practices/>`_ and `10 Simple Rules for Writing Dockerfiles for Reproducible Analysis <https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316>`_.
- Use a source control site (as mentioned above in :ref:`version-control-best-practices`) for versioning Dockerfiles (see a `simple example for versioning your Docker Image <https://medium.com/better-programming/how-to-version-your-docker-images-1d5c577ebf54>`_).

- Keep images light:

- More packages increases risks; try to avoid installing unnecessary packages in your images. That being said, starting with a very bare image (such as Alpine) may lead to a long setup, or difficulties in debugging.
- Images tagged with "-slim" contain the minimum components needed to run, without being as strict as Alpine-based images. They can often provide a happy medium between a reduced size, enhanced security, and usability.
- More packages increase risks; try to avoid installing unnecessary packages in your images. That being said, starting with a very bare image (such as Alpine) may lead to a long setup or difficulties in debugging.
- Images tagged with "-slim" contain the minimum components needed to run without being as strict as Alpine-based images. They can often provide a happy medium between a reduced size, enhanced security, and usability.

- Some helpful starting images are suggested below:

@@ -72,19 +73,18 @@ Image / Container Best Practices
- A good rule of thumb is that each image should have a specific purpose. Avoid installing all of the software you need for an entire analysis in one container; instead, use multiple containers.
- Don’t include test data inside the image. Recommendations for hosting test data alongside your workflow can be found in the section below titled :ref:`accessible`.

- Publish your pre-built image in an open source container registry (such as DockerHub or Quay.io):
- Publish your pre-built image in an open-source container registry (such as DockerHub or Quay.io):

- Automate builds using an image registry that is configured to trigger a build whenever a change is pushed to the Dockerfile source control repository.
- Similar to our suggestion to publish your workflow under a GitHub organization, publish your images in an organization on a container registry. Additionally, this may make it easier for your institute to pay for a group plan to ensure your images never expire.

- Limitation on and expiration of images: At the time of writing this, DockerHub has announced some new policies around pull limits as well as their intention to expire DockerHub images from free accounts that haven't been pulled for some defined period of time (update: `this policy is delayed <https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates/>`_). For example, this could mean that a workflow that hasn't been run in one year may no longer be reproducible if the image has been removed.
- Limitation on and expiration of images: DockerHub has announced policies around pull limits as well as its intention to expire DockerHub images that haven't been pulled for some defined period of time (at the time of writing, DockerHub has delayed `this policy <https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates/>`_). For example, this could mean that a workflow that hasn't been run in some period of time may no longer be reproducible if the image has been removed.

- Alternative options include:

- Using images from paid organizations on DockerHub
- Paying for a DockerHub account (this may be more cost effective if you’re able to create an organization with multiple accounts)
- DockerHub offers exceptions to some open source projects that you may be able to get depending on your use case
- Hosting the image on a different repository such as Google Container Repository, Quay.io, GitHub Packages, AWS ECR, etc.
- Hosting the image on a different repository such as Google Container Repository, Quay.io, GitHub Packages, AWS ECR, etc.
- Using images from paid organizations on DockerHub.
- Paying for a DockerHub account (this may be more cost-effective if you’re able to create an organization with multiple accounts).
- DockerHub offers exemptions to some open-source projects, which you may qualify for depending on your use case.
- Migrating images to another repository to mitigate the impact of DockerHub pull rate limits (`see example <https://www.openshift.com/blog/mitigate-impact-of-docker-hub-pull-request-limits>`_).


@@ -99,9 +99,9 @@ Findable

- Naming:

- Keep the workflow name short
- Keep the workflow name short.

- Use all lowercase letters for compatibility with other platforms such as DockerHub
- Use all lowercase letters for compatibility with other platforms such as DockerHub.

- Authorship, contact information, and description:

@@ -176,7 +176,7 @@ Reusable
}

- The examples below show **how not to reference a container** in a workflow task. These exmaple formats can change and cause the workflow to no longer be reproducible.
- The examples below show **how not to reference a container** in a workflow task. These formats can change and cause the workflow to no longer be reproducible.

Do not reference parameterized images:

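The container-reference anti-patterns this section warns against (parameterized images, untagged references, and ``latest``) can be flagged with a simple lint-style check. The sketch below is illustrative only: ``classify_image_reference`` and ``follows_reference_guidance`` are hypothetical helpers, and they recognize only the common WDL (``${...}``, ``~{...}``) and CWL (``$(...)``) interpolation forms, so this is not a complete parser for either language.

```python
import re

# Interpolation syntax that makes an image depend on a runtime input:
# WDL "${...}" / "~{...}" and CWL "$(...)". Other forms are not detected.
PARAM_RE = re.compile(r"\$\{|~\{|\$\(")
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def classify_image_reference(ref: str) -> str:
    """Classify a reference as parameterized, digest, malformed, untagged, latest, or tagged."""
    ref = ref.strip()
    if PARAM_RE.search(ref):
        return "parameterized"
    if "@" in ref:
        return "digest" if DIGEST_RE.search(ref) else "malformed"
    colon = ref.rfind(":")
    if colon <= ref.rfind("/"):
        return "untagged"  # no tag at all: the registry resolves it to "latest"
    return "latest" if ref[colon + 1:] == "latest" else "tagged"


def follows_reference_guidance(ref: str) -> bool:
    """True for explicit tags and digests; digests remain the stronger (immutable) choice."""
    return classify_image_reference(ref) in {"tagged", "digest"}
```

A bare WDL variable name (for example ``docker: docker_image``) has no interpolation syntax, so this sketch reports it as ``untagged``; it is still flagged, just under a different label.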
@@ -226,16 +226,17 @@ Do not use untagged or “latest”.

- As mentioned in :ref:`image-container-best-practices`, test data should be hosted outside of the container.

- GitHub can host small files such as csv or tsv (for example: trait data)
- GitHub can host small files such as csv or tsv (for example: trait data).

- Broad’s Terra platform hosts multiple genomic files in this `open access Google bucket <https://console.cloud.google.com/storage/browser/terra-featured-workspaces>`_
- Broad’s Terra platform hosts multiple genomic files in this `open access Google bucket <https://console.cloud.google.com/storage/browser/terra-featured-workspaces>`_.

- Consider providing both a full sample run and a small down-sampled development test.

- A small development dataset is necessary for checker workflows. It also helps others explore your workflow without incurring heavy resource/computational costs.

- A full-sized sample is helpful for benchmarking your workflow and providing end-users with realistic compute and cost requirements.

- When writing your descriptor files, do not import remote descriptors using HTTP(s), nor use scripts outside of the container as input files. These practices decrease reusability and increase security risks.
- Provide a permissive license such as the `MIT License <https://choosealicense.com/licenses/mit/>`_, or `choose a license <https://choosealicense.com/>`_ that best fits your needs. It can be a text file in the git repository where the workflow is published (see `this example <https://github.com/nf-core/rnaseq/blob/master/LICENSE>`_).

- Provide a thorough README in the git repository. Here is an example of thorough documentation.
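The advice above against importing remote descriptors over HTTP(S) can be checked before publishing. The following sketch covers WDL only and is an illustration under assumptions, not a Dockstore feature: ``remote_wdl_imports`` is a hypothetical helper that finds ``import`` statements whose target is an ``http://`` or ``https://`` URL. CWL ``run:`` or ``$import`` references would need a separate check.

```python
import re
from typing import List

# WDL import statements whose target is an absolute http:// or https:// URL, e.g.
#   import "https://example.org/tasks.wdl" as tasks
# Both double- and single-quoted strings are matched; relative imports are ignored.
REMOTE_IMPORT_RE = re.compile(
    r"""^\s*import\s+(["'])(https?://[^"']+)\1""", re.MULTILINE
)


def remote_wdl_imports(descriptor_text: str) -> List[str]:
    """Return the URLs of remote (HTTP/HTTPS) imports found in a WDL descriptor."""
    return [m.group(2) for m in REMOTE_IMPORT_RE.finditer(descriptor_text)]


if __name__ == "__main__":
    wdl = (
        "version 1.0\n"
        'import "tasks/align.wdl" as align\n'
        'import "https://example.org/qc.wdl" as qc\n'
    )
    for url in remote_wdl_imports(wdl):
        print("remote import:", url)
```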
34 changes: 34 additions & 0 deletions docs/advanced-topics/checksum-support.rst
@@ -19,6 +19,20 @@ More specifically, the endpoints that contain checksums for files are as follows

The id parameter used in the endpoints above can be found on an entry's public page; underneath the Info tab, look for the bolded words **TRS**.

After gathering the checksum using the method above, you can verify a descriptor's checksum using the shasum terminal application.
This is done by requesting the PLAIN_WDL descriptor and piping the output to shasum.

::

    trsid=%23workflow%2Fgithub.com%2Fdockstore-testing%2Fdockstore-workflow-md5sum-unified%2Fwdl
    version=1.2.0
    curl -s "https://dockstore.org/api/ga4gh/trs/v2/tools/$trsid/versions/$version/PLAIN-WDL/descriptor" | shasum

The resulting checksum should match what was provided by the API above.

If you use the Dockstore CLI client, descriptor checksums are verified before the descriptors are sent to the execution engine.
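The ``curl ... | shasum`` check above can also be scripted, for example in a test that runs before a workflow is launched. The sketch below uses hypothetical helpers, not part of the Dockstore CLI: it builds the same TRS descriptor URL as the shell example (percent-encoding the TRS id, which is why ``#`` and ``/`` become ``%23`` and ``%2F``) and compares a SHA-1 digest, the default algorithm of ``shasum``, with the checksum reported by the API, assuming the API reports it as a hex string. No network call is made unless ``fetch_descriptor`` is called.

```python
import hashlib
import urllib.request
from urllib.parse import quote

TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2/tools"


def plain_wdl_descriptor_url(trs_id: str, version: str) -> str:
    """Build the PLAIN-WDL descriptor URL used in the curl example."""
    # safe="" percent-encodes "/" and "#" too, matching the trsid in the shell example.
    return f"{TRS_BASE}/{quote(trs_id, safe='')}/versions/{quote(version, safe='')}/PLAIN-WDL/descriptor"


def fetch_descriptor(url: str) -> bytes:
    """Download the raw descriptor bytes (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def sha1_matches(data: bytes, expected_checksum: str) -> bool:
    """Compare the SHA-1 of the descriptor bytes with the checksum reported by the API."""
    return hashlib.sha1(data).hexdigest() == expected_checksum.strip().lower()
```

For the example entry, ``plain_wdl_descriptor_url("#workflow/github.com/dockstore-testing/dockstore-workflow-md5sum-unified/wdl", "1.2.0")`` yields the same URL as the shell variables above, and its result can be passed to ``fetch_descriptor`` and then ``sha1_matches``.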


CLI Descriptor Validation Support
------------------------------------------
By default, when launching tools or workflows from the CLI, primary and secondary descriptors will be validated using their SHA-1 checksums. Checksums are
@@ -52,6 +66,26 @@ Descriptions for the two endpoints of note are as follows:

Just like the file endpoints, the id parameter used in the endpoints above can be found on an entry's public page; underneath the Info tab, look for the bolded words **TRS**.

To verify that a checksum reported by the Dockstore API matches what you download from the Docker registry, first find the checksum
and image path for the image you would like to verify using one of the methods above. Then download the image using the
Docker CLI client.

::

    docker pull quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4

When the download has completed, a Digest line is printed in the terminal output. This digest should match the checksum provided
by the Dockstore API.
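Comparing the two values by eye is error-prone, so the comparison can be scripted. This is a sketch under assumptions: ``digest_from_pull_output``, ``normalize_digest``, and ``digest_matches`` are hypothetical helpers, the ``docker pull`` output is assumed to contain a line of the form ``Digest: sha256:<hex>``, and because it is not stated here whether the value from the Dockstore API carries the ``sha256:`` prefix, both forms are accepted.

```python
import re

# `docker pull` prints a line such as "Digest: sha256:<64 hex characters>".
PULL_DIGEST_RE = re.compile(r"^Digest:\s*(sha256:[0-9a-f]{64})\s*$", re.MULTILINE)


def digest_from_pull_output(output: str) -> str:
    """Extract the "sha256:..." digest from captured `docker pull` output."""
    match = PULL_DIGEST_RE.search(output)
    if match is None:
        raise ValueError("no Digest line found in docker pull output")
    return match.group(1)


def normalize_digest(value: str) -> str:
    """Accept a checksum with or without the "sha256:" prefix, in any letter case."""
    value = value.strip().lower()
    return value if value.startswith("sha256:") else "sha256:" + value


def digest_matches(pull_output: str, api_checksum: str) -> bool:
    """True if the pulled image's digest equals the checksum reported by the API."""
    return digest_from_pull_output(pull_output) == normalize_digest(api_checksum)
```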

Verifying the image checksum gives you a stronger guarantee that the image has not changed since the workflow was published to Dockstore.
However, in some cases the image checksum may diverge, for example, if the image was built from a git branch that has since
been updated. For best results, and to avoid your Docker image being deleted because of a registry's retention policy,
use Docker images referred to by a tagged version or digest. The verification features available may vary between execution engines.

For more information on Docker registry retention policies, see posts from `Docker <https://www.docker.com/blog/scaling-dockers-business-to-serve-millions-more-developers-storage/>`_,
`AWS <https://aws.amazon.com/blogs/compute/clean-up-your-container-images-with-amazon-ecr-lifecycle-policies/>`_,
or `Azure <https://docs.microsoft.com/en-us/azure/container-registry/container-registry-retention-policy>`_.

Tools
-----
As noted in the table above, Docker image checksums are grabbed on refresh and should work as long as the image is from Quay.io, Docker Hub,
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -20,7 +20,7 @@
# -- Project information -----------------------------------------------------

project = u'Dockstore'
copyright = u'2020, OICR'
copyright = u'2021, OICR, and UCSC'
author = u'OICR, UCSC'

# The short X.Y version
18 changes: 18 additions & 0 deletions docs/faq.rst
@@ -1,6 +1,24 @@
FAQ
===

What happens if I rename my GitHub repository?
----------------------------------------------

When you have registered a tool or workflow from GitHub in Dockstore, the
link to the repository for your tool or workflow is displayed on the
Info tab next to 'Source Code'. If you then `rename <https://docs.github.com/en/github/administering-a-repository/renaming-a-repository>`__
the GitHub repository, the Source Code link will still display the original name,
but it will resolve to the correct GitHub location when you click on it, because
GitHub redirects the old repository name to the new one.

Another side effect is that you can register the workflow
again in Dockstore under the new GitHub name; if you do, the same
workflow will effectively be registered twice.

Please note the GitHub warning: If you create a new repository under
your account in the future, do not reuse the original name of the renamed
repository. If you do, redirects to the renamed repository will break.


How does launching with Dockstore CLI compare with cwltool?
-----------------------------------------------------------

25 changes: 25 additions & 0 deletions docs/index.rst
@@ -11,6 +11,11 @@ If this is your first time learning about Dockstore, we recommend starting with
you to the core concepts of Dockstore, leaving you with a good understanding of the platform. However, if you are simply looking to launch tools and workflows, we recommend
going straight to the :doc:`End User Topics <end-user-topics/end-user-topics>` or our `quickstart guide <https://dockstore.org/quick-start>`_.

.. toctree::
   :maxdepth: 1

   Go to Dockstore <https://dockstore.org>

.. toctree::
   :caption: About
   :maxdepth: 1
@@ -34,6 +39,12 @@ going straight to the :doc:`End User Topics <end-user-topics/end-user-topics>` o
   getting-started/getting-started-with-services
   getting-started/github-apps/github-apps-landing-page

.. toctree::
   :caption: Videos (Tutorials & Presentations)
   :maxdepth: 1

   videos

.. toctree::
   :caption: Launch
   :maxdepth: 1
@@ -103,6 +114,12 @@ going straight to the :doc:`End User Topics <end-user-topics/end-user-topics>` o

   faq

.. toctree::
   :caption: Roadmap
   :maxdepth: 1

   roadmap

.. toctree::
   :caption: Changelog
   :maxdepth: 1
@@ -119,6 +136,12 @@ going straight to the :doc:`End User Topics <end-user-topics/end-user-topics>` o
   news
   news/*

.. toctree::
   :caption: System Status
   :maxdepth: 1

   systemstatus

.. centered:: In Affiliation with

.. centered:: |CollabLink|_ |imagespace| |OicrLink|_ |imagespace| |Ga4ghLink|_ |imagespace| |UcscLink|_
@@ -183,3 +206,5 @@ going straight to the :doc:`End User Topics <end-user-topics/end-user-topics>` o
.. _TerraLink: https://terra.bio/

.. |imagespace| unicode:: U+00A0 U+00A0 U+00A0 U+00A0 U+00A0 .. non-breaking spaces between logo images


10 changes: 0 additions & 10 deletions docs/news/2016-11-24-Upload-tutorial-video.rst

This file was deleted.

