The biobox Dockerfiles could have less boilerplate #131

Open
michaelbarton opened this issue Apr 28, 2015 · 12 comments

Comments

@michaelbarton
Contributor

I showed @accopeland what we've been doing with bioboxes recently and he gave
some useful feedback. He mentioned that the bioboxes/velvet Dockerfile contains
a lot of code whose purpose is not immediately obvious to a developer. The
lines in question are where jq, yaml2json and bioboxes-validate-file are
downloaded. This ends up being boilerplate code that will have to be copied
from Dockerfile to Dockerfile.

We might want to consider simplifying this code so that biobox authors can
instead focus more on setting up their assembler inside the image. A couple of
ideas:

  • Bundle all the tools into a single tar.
  • Maintain our own .deb files, which would allow the required tools to be
    installed through the Debian package manager along with other requirements.
@snewhouse

@michaelbarton another simple solution is to have a base image with all the required dependencies installed and pushed to Docker Hub.

The user then just builds their own assembler image from this image, e.g.:

FROM bioboxes/base
RUN wget foo &&  make foo
... 

@fungs
Member

fungs commented Apr 28, 2015

I think both are good ideas. For the long term, I would put the container code into a repository and try to build Linux packages for all kinds of distributions using a simple mechanism like the OpenSUSE Build Service. Since most users choose Debian or Ubuntu as container distributions, we could also provide a Debian base image with container code installed as we have done here: https://github.com/CAMI-challenge/docker-interface/blob/master/Dockerfile

@michaelbarton
Contributor Author

I merged in PR bioboxes/file-validator#25 today. This builds a Debian package
of the file validator and pushes it to S3 after a successful build on the
master branch.

I've subsequently created a branch on the bioboxes/velvet assembler. This uses
the Debian package manager to install the file validator. Please see the diff
for this change. It removes the several lines needed to install from a tar and
instead adds two lines to install via apt-get. This could be further simplified
to one line if authentication is added to the Debian repo.

If this seems like a reasonable approach, we could consider Debian packages as
the primary medium for distributing software to biobox developers.

@michaelbarton
Contributor Author

@snewhouse Thank you for the suggestion. This is a possible avenue we could use
to share tools, where the bioboxes base image is used in the FROM directive.

My preference is not to go down this route because it means we have to maintain
an image in addition to the individual tools. I would also prefer to let users
flexibly compose images from our tools or others', rather than inheriting a
base image from us, which forces them to accept any choices we may make in
that image.

@michaelbarton
Contributor Author

@fungs I like the idea of using the OpenSUSE Build Service. I'm not familiar
with it, but I think using it in the longer term would be a good idea.

@pbelmann
Member

Thanks Michael for creating .deb packages. Is there any advantage in using the OpenSUSE Build Service over a self-maintained Debian package source? Maybe a faster download, but that is not important for our tiny validator package.

If there are no other advantages I will start updating our tutorial and the available bioboxes, so that we can close this issue.

@michaelbarton
Contributor Author

I think the advantage of the OpenSUSE Build Service is that a package is automatically built for all flavors of Linux, e.g. Red Hat and so forth. This would be useful if someone wants to give it a try. Otherwise I agree we can continue with the S3 Debian repo for now.

@michaelbarton
Contributor Author

There's also the problem of installing unsigned packages which I haven't had time to look at yet.

@fungs
Member

fungs commented Jun 30, 2015

In the meantime, until there is a separate guest tool package for bioboxes that runs inside the container (like the VirtualBox Guest Additions) and is packaged for all kinds of container Linux distributions, I have implemented a Debian base image with some client scripts added. For now, you can get and test it from the Docker registry: fungs/bbx-base:latest. It translates the YAML input into easy-to-use environment variables using a slick Python parser and provides a simple Bourne shell run init system, which also detects the number of CPUs available to the container. Adding tasks or run options is as easy as creating a single text file with the corresponding name. More features that might be useful for running programs inside the containers can be added in the future.
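
To make the YAML-to-environment-variable idea concrete, here is a minimal sketch of one way such a translation could work. It is not the bbx-base parser: the `BBX` prefix, the naming scheme and the example input are all assumptions made for illustration, and the input is given as an already-parsed dictionary so that no YAML library is required.

```python
def flatten_to_env(node, prefix="BBX"):
    """Flatten nested dicts and lists into a flat {NAME: value} mapping.

    The naming scheme (upper-cased keys and list indices joined by
    underscores) is a hypothetical choice, not the one used by bbx-base.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        # Leaf value: emit it under the name accumulated so far.
        return {prefix: str(node)}

    env = {}
    for key, child in items:
        env.update(flatten_to_env(child, f"{prefix}_{str(key).upper()}"))
    return env


# Illustrative input only. A real biobox.yaml would first be parsed
# with a YAML library; the paths and ids here are made up.
example = {
    "version": "0.9.0",
    "arguments": [
        {
            "fastq": [
                {"id": "reads_1", "value": "/bbx/input/reads_1.fq.gz", "type": "paired"},
                {"id": "reads_2", "value": "/bbx/input/reads_2.fq.gz", "type": "paired"},
            ]
        },
    ],
}

if __name__ == "__main__":
    # Print the variables in a form a shell script could source.
    for name, value in sorted(flatten_to_env(example).items()):
        print(f"{name}={value}")
```

With two input files the names already carry list indices (for example `BBX_ARGUMENTS_0_FASTQ_1_VALUE`), which shows how quickly a flat representation grows as the input gets richer.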

@avilella

This sounds really useful and flexible. Is there an example depicting how
it works?


@michaelbarton
Contributor Author

michaelbarton commented Jul 1, 2015 via email

@fungs
Member

fungs commented Jul 1, 2015

Hi,

here are some more details. I just didn't want to push it to the public in its undocumented and experimental stage.

https://github.com/fungs/bbx-base/

You mentioned setting environment variables. We did discuss using these in great detail at the start of bioboxes, and we agreed that their use could become too complicated. An example is when there are multiple files and parameters for each file. This is why we settled on the biobox.yaml file format.

This is just a one-way translation, which is likely sufficient for most or all of the current implementations. After all, it is just an alternative way to access the information in the YAML file. I would really like to add a file system YAML representation to the container as a second alternative.

Considering YAML vs. environment variables, I would argue that if a container or task accepts a single input YAML schema and the input is guaranteed to be valid (e.g. by validating it prior to calling the container), then the implementor can use the variables to access precisely the information he or she needs, instead of the YAML input file, without having to implement a YAML parser or even be aware of it. For bioboxes and the bioboxes tools (calling, validation -> see issue #163) it is, however, important to be able to specify and validate the input, the output and the containers.
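
As a rough illustration of the implementor's side of this argument, the sketch below reads input file paths straight from flat environment variables, with no YAML parser involved. The variable names (`BBX_ARGUMENTS_0_FASTQ_<n>_VALUE`) are hypothetical and only stand in for whatever scheme a base image might export. The code assumes the input was validated before the container was called, so it only checks that at least one path is present.

```python
import os


def read_task_inputs(environ=os.environ, prefix="BBX"):
    """Collect FASTQ paths from consecutively numbered variables.

    Assumes a hypothetical naming scheme in which the n-th input file
    is exported as <prefix>_ARGUMENTS_0_FASTQ_<n>_VALUE, starting at 0.
    """
    paths = []
    index = 0
    while True:
        name = f"{prefix}_ARGUMENTS_0_FASTQ_{index}_VALUE"
        if name not in environ:
            # Stop at the first gap in the numbering.
            break
        paths.append(environ[name])
        index += 1

    if not paths:
        raise KeyError(f"no {prefix}_ARGUMENTS_0_FASTQ_<n>_VALUE variables set")
    return paths


if __name__ == "__main__":
    # Stand-in for the environment a base image would provide.
    fake_environ = {
        "BBX_ARGUMENTS_0_FASTQ_0_VALUE": "/bbx/input/reads_1.fq.gz",
        "BBX_ARGUMENTS_0_FASTQ_1_VALUE": "/bbx/input/reads_2.fq.gz",
    }
    for path in read_task_inputs(fake_environ):
        print(path)
```

Passing the environment in as a parameter keeps the function easy to test outside a container. The trade-off is that the task now depends on the naming scheme rather than on the YAML schema itself.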
