
Streamlining the docker image #35

Open
nsheff opened this issue Nov 5, 2020 · 16 comments


nsheff commented Nov 5, 2020

Right now, building the bedhost docker image takes several minutes and produces an image that is almost 1 GB, which then takes a long time to upload and deploy to servers.

Just glancing at the Dockerfile, I see a lot of stuff in there: for example, openssh, gcc, and lots of development tools. It's fine for now since it works, but at some point we should take a close look and streamline this container, as it will make dev/ops iteration much faster. It seems to take several minutes just to install pandas; not sure what we can do about that.


nsheff commented Nov 5, 2020

One reason: we're basing the image on Alpine and then installing pandas.

See: https://stackoverflow.com/questions/49037742/why-does-it-take-ages-to-install-pandas-on-alpine-linux

If we really need all this stuff, it's more effective to start from a bigger (non-Alpine) base image.


nsheff commented Nov 5, 2020

Try switching:

FROM tiangolo/uvicorn-gunicorn:python3.8-alpine3.10

to

FROM tiangolo/uvicorn-gunicorn:python3.8

Also, there are some others on there, like tiangolo/uvicorn-gunicorn-fastapi -- why not use that one instead, with FastAPI already installed? Or at least tiangolo/uvicorn-gunicorn-starlette?
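A minimal sketch of what using the FastAPI base image could look like (the file paths and requirements handling here are illustrative, not bedhost's actual Dockerfile):

```dockerfile
# Hypothetical sketch: base image that already ships FastAPI and the
# uvicorn/gunicorn server setup, so the app layer stays thin.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8

# Install only the app-specific requirements on top of the base.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# The tiangolo images expect the application under /app.
COPY ./app /app
```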


nsheff commented Nov 5, 2020

It builds much faster like that... and I could remove all the apk installs from the Dockerfile. But the image is 1.2 GB. So, this may take a bit of testing to figure out the optimal balance.


stolarczyk commented Nov 5, 2020

yeah, this is an issue. I made a local "dependencies" image with everything but the bedhost package to streamline installation for testing, and have been importing from that.

In fact, I was planning to look at the bedhost code to see if we really need pandas. I presume we could do without this dependency.


nsheff commented Nov 5, 2020

Look:

tiangolo/uvicorn-gunicorn:python3.8-alpine3.10   142 MB
tiangolo/uvicorn-gunicorn:python3.8              965 MB

So, using the alpine one, we're adding ~700 MB of stuff on top; using the normal one, we're only adding ~200 MB. So even though the normal one is bigger overall, more of the content lives in the base image, which means less pushing/pulling when we update.

The goal should be to get as much as possible into the base image so that updates are fast, even at the cost of slightly bigger images. The best bet is probably tiangolo/uvicorn-gunicorn-fastapi -- not sure if there was a reason we stepped away from that one; there may have been (version issues or something).
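The tradeoff can be made concrete with the sizes reported in this thread (approximate; "almost 1 GB" is taken as ~1000 MB):

```python
# Rough layer arithmetic from the sizes reported above, in MB.
# What gets pushed/pulled on an app update is roughly total - base,
# since the base image layers are cached by the registry and hosts.
base_alpine = 142    # tiangolo/uvicorn-gunicorn:python3.8-alpine3.10
base_full   = 965    # tiangolo/uvicorn-gunicorn:python3.8
total_alpine = 1000  # "almost 1 GB" bedhost image on the alpine base
total_full   = 1200  # "1.2 GB" bedhost image on the full base

added_alpine = total_alpine - base_alpine  # ~858 MB re-pushed per update
added_full   = total_full - base_full      # ~235 MB re-pushed per update

print(added_alpine, added_full)
```

So the bigger base image actually makes each update cycle cheaper.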


nsheff commented Nov 5, 2020

There are also python3.8-slim variants, which could be the best compromise... so tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim may be the best bet...


nsheff commented Nov 5, 2020

With python3.8-slim and a slightly modified Dockerfile, I got it to:

tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim   278 MB
databio/bedhost                                    667 MB

So, still probably worth having a prereqs container.
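A prereqs container could look something like the two-Dockerfile split below (file and image names are hypothetical; the deps image is rebuilt only when requirements change, so routine releases only push the thin app layer):

```dockerfile
# Dockerfile.deps -- hypothetical prerequisites image, rebuilt rarely.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Dockerfile -- hypothetical app image; only this layer changes per release.
# FROM databio/bedhost-deps:latest
# COPY . /app
# RUN pip install --no-cache-dir /app
```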

nsheff added a commit that referenced this issue Nov 5, 2020
@stolarczyk

FYI, I eliminated the pandas dependency


nsheff commented Nov 9, 2020

wow, great! that probably helps a lot


nsheff commented Mar 14, 2024

A lot has changed since this issue, but it looks like the v0.3.0 image is still about 1 GB.

It is probably worth spending some time to look at this and see if there are any dependencies that can be eliminated for the deployed version.
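One low-tech way to find candidates for elimination is to rank installed packages by on-disk size inside the container. A sketch (run with the container's Python; this just walks site-packages, it is not bedhost-specific):

```python
import os
import sysconfig


def dir_size_mb(path: str) -> float:
    """Total size of all files under path, in MB."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # skip broken symlinks etc.
    return total / 1e6


if __name__ == "__main__":
    site = sysconfig.get_paths()["purelib"]
    sizes = {
        entry: dir_size_mb(os.path.join(site, entry))
        for entry in os.listdir(site)
        if os.path.isdir(os.path.join(site, entry))
    }
    # Print the ten largest installed packages.
    for entry, mb in sorted(sizes.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{mb:8.1f} MB  {entry}")
```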

@nsheff nsheff added this to the v0.4.0 milestone Mar 14, 2024
@khoroshevskyi

The new image is 2 GB or even more. This is due to the Geniml models; I guess we can't change that.
Additionally, @nleroy917 and I cleaned up and streamlined the Docker image.

@khoroshevskyi

  • the sentence transformer model is downloaded on launch; should it instead be moved into the container (pre-cached)?

@khoroshevskyi khoroshevskyi modified the milestones: v0.4.0, v0.5.0 Apr 9, 2024

nsheff commented May 24, 2024

Actually, now that I think about it, I think this is probably a good idea. In fact, we could move this into the container image itself, which could make startup even faster... is it worth it?
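Pre-caching at build time could be as simple as downloading the model in a RUN step, so it lands in an image layer instead of being fetched on every container start. A sketch (the model name below is a placeholder, not necessarily the one bedhost actually loads):

```dockerfile
# Hypothetical: bake the sentence-transformers model into the image.
# "all-MiniLM-L6-v2" is a placeholder; substitute the model bedhost uses.
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```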


nsheff commented Oct 9, 2024

BEDhost images: https://hub.docker.com/r/databio/bedhost/tags

Even up to v0.5.0, we have only 1 GB images.

But the new dev image went up to 3.7 GB.


nsheff commented Oct 9, 2024

A related issue here: pepkit/pephub#109

Maybe pin torch to a +cpu version?


nsheff commented Oct 9, 2024

Related: https://github.com/databio/geniml_dev/issues/112

Here:

RUN pip install torch==2.1.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
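A CPU-only build can also be selected via PyTorch's dedicated CPU wheel index, which avoids pulling in the bundled CUDA libraries that account for most of torch's size (version shown is the one mentioned above; check it exists on the CPU index before pinning):

```dockerfile
# CPU-only torch wheel; skips the bundled CUDA libraries.
RUN pip install --no-cache-dir torch==2.1.0 \
    --index-url https://download.pytorch.org/whl/cpu
```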
