Streamlining the docker image #35
One reason is: we're basing on alpine, and then installing pandas. See: https://stackoverflow.com/questions/49037742/why-does-it-take-ages-to-install-pandas-on-alpine-linux. If we really need all this stuff, it's more effective to start out with a bigger image (not alpine).
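(Editor's note: the root cause here is that PyPI's prebuilt manylinux wheels target glibc, so on musl-based Alpine pip falls back to compiling pandas from source. A minimal sketch of the base-image swap, using the image tags quoted later in this thread; which exact tag fits this project is an assumption:)

```dockerfile
# Before: an alpine base forces pip to compile pandas from source,
# because PyPI's prebuilt manylinux wheels target glibc, not musl.
# FROM tiangolo/uvicorn-gunicorn:python3.8-alpine3.10

# After: a glibc-based (debian slim) image pulls the prebuilt wheel in seconds.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
RUN pip install --no-cache-dir pandas
```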
Try switching:

to

Also there are some others on there, like

It builds much faster like that... and I could move all the
yeah, this is an issue. I made a local "dependencies" image with everything but the bedhost package to streamline installation for testing, and have been importing from that. In fact, I was planning to look at the bedhost code to see if we really need pandas. I presume we could do without this dependency.
Look: tiangolo/uvicorn-gunicorn:python3.8-alpine3.10 is 142MB. So, using the alpine one, we're adding 700MB of stuff; using the normal one, we're only adding 200MB. So even though it's bigger, we get more of it into the base image, so there's less pushing/pulling when we update. The goal should be to get as much stuff into the base image as possible, so that updates are fast, even at the cost of slightly bigger images. Probably the best bet is to use
There are also
With python3.8-slim and a slightly modified Dockerfile, I got it down to: tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim at 278MB. So, still probably worth having a prereqs container.
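(Editor's note: one way the prereqs-container idea could look; the `databio/bedhost-deps` name and the `requirements.txt` layout are hypothetical. The heavy, rarely-changing dependencies get baked into a base image, so day-to-day builds and pushes only touch the thin application layer:)

```dockerfile
# Dockerfile.deps -- hypothetical prereqs image; rebuilt only when requirements change,
# then pushed once as, e.g., databio/bedhost-deps
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```

The application Dockerfile would then start `FROM databio/bedhost-deps` and only copy in the bedhost package itself, so routine rebuilds skip dependency installation entirely.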
FYI, I eliminated the pandas dependency

wow, great! that probably helps a lot
A lot has changed since this issue, but it looks like the v0.3.0 image is still about 1GB. It is probably worth spending some time to look at this and see if there are any dependencies that can be eliminated for the deployed version.
The new image is 2GB or even more. This is due to the Geniml models. I guess we can't change it.
Actually, now that I think about it, I think this is probably a good idea. In fact, we could move this into a containing image, which could be even faster... is it worth it?
BEDhost images: https://hub.docker.com/r/databio/bedhost/tags Even up to v0.5.0, we have only 1GB images. But the new dev image went up to 3.7GB.
A related issue here: pepkit/pephub#109. Maybe pin torch to a +cpu version?
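(Editor's note: on the torch pinning suggestion — PyTorch publishes CPU-only wheels on its own package index, which omit the bundled CUDA libraries that account for most of the size. A sketch of what that could look like in the Dockerfile, assuming the deployed service doesn't need a GPU:)

```dockerfile
# CPU-only torch wheel: no bundled CUDA runtime, typically saving well over 1GB
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
```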
Related: https://github.com/databio/geniml_dev/issues/112. Here: line 30 in fb9266f
Right now, building the bedhost docker image takes several minutes and produces an image that is almost 1GB, which also takes a long time to upload and deploy to servers.

Just glancing at the Dockerfile I see lots of stuff in there; for example, openssh, gcc, and lots of development tools. It's fine for now as it works, but at some point we should take a close look at this and try to streamline this container, as it will make dev/ops iterate much faster. It seems to take several minutes just to install pandas. Not sure what we can do about that.