Simplify and optimize Docker image #264

Draft
wants to merge 13 commits into
base: main
4 changes: 4 additions & 0 deletions .dockerignore
@@ -0,0 +1,4 @@
.git
slides/
.DS_Store
__pycache__/
37 changes: 37 additions & 0 deletions .github/workflows/docker-build.yml
@@ -0,0 +1,37 @@
---
name: Build Tutorial Container

on:
  push:
    branches:
      - main
    paths-ignore:
      - "*.md"
      - slides/**
      - images/**
      - .gitignore
  workflow_dispatch:

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build the Docker image
        run: |
          docker build -t ghcr.io/${{ github.repository }}:latest .

      - name: Push the Docker image
        run: |
          docker push ghcr.io/${{ github.repository }}:latest
edoardob90 marked this conversation as resolved.
51 changes: 51 additions & 0 deletions Dockerfile
@@ -0,0 +1,51 @@
# Use the jupyter/minimal-notebook as the base image
FROM quay.io/jupyter/minimal-notebook:latest

# Metadata labels
LABEL org.opencontainers.image.title="Python Tutorial"
LABEL org.opencontainers.image.description="A containerized Python tutorial environment with Jupyter Lab."
LABEL org.opencontainers.image.authors="Empa Scientific IT <[email protected]>"
LABEL org.opencontainers.image.url="https://github.com/empa-scientific-it/python-tutorial"
LABEL org.opencontainers.image.source="https://github.com/empa-scientific-it/python-tutorial"
LABEL org.opencontainers.image.version="1.0.0"
LABEL org.opencontainers.image.licenses="MIT"

# Set environment variables for the tutorial and repository
ENV BASENAME="python-tutorial"
ENV REPO=${HOME}/${BASENAME}
ENV IPYTHONDIR="${HOME}/.ipython"

# Switch to root user to install additional dependencies
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        gcc \
        g++ \
        libffi-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Switch back to the default notebook user
USER ${NB_UID}

# Set up the Conda environment
COPY docker/environment.yml /tmp/environment.yml
RUN mamba env update -n base -f /tmp/environment.yml && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# Prepare IPython configuration (move earlier in the build)
RUN mkdir -p ${HOME}/.ipython/profile_default
COPY --chown=${NB_UID}:${NB_GID} binder/ipython_config.py ${HOME}/.ipython/profile_default/

# Copy the repository late in the build process
RUN mkdir -p ${REPO}
COPY --chown=${NB_UID}:${NB_GID} . ${REPO}/
Member

This part is a bit tricky. The user should either start the same container (so the readme needs to be updated), or we should mount the folder to the container.

In any case, the readme should be synchronized with the new approach.

Member Author

@edoardob90 Nov 25, 2024

I think the step is correct, but it only makes sense when we're building the image via GitHub Actions. The user can still mount a local copy of the repository into the container, and nothing changes if there are no mismatches between that copy and the one in the image. Otherwise, they will have their local repository inside the container.
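
As an illustration of the mounting option mentioned above, here is a minimal sketch that assembles a `docker run` command bind-mounting a local clone over the path the Dockerfile calls `${REPO}`. Several values are assumptions rather than anything stated in this PR: the image tag is inferred from the workflow's `ghcr.io/<owner>/<repo>:latest` pattern, the container home is taken to be `/home/jovyan` (the Jupyter Docker Stacks default user), and Jupyter is assumed to listen on port 8888. The helper name `docker_run_command` is hypothetical.

```python
import shlex
from pathlib import Path

# Assumptions (not stated in the PR): image tag inferred from the workflow,
# default Jupyter Docker Stacks user "jovyan", Jupyter on port 8888.
IMAGE = "ghcr.io/empa-scientific-it/python-tutorial:latest"
CONTAINER_REPO = "/home/jovyan/python-tutorial"


def docker_run_command(local_repo: Path, port: int = 8888) -> list[str]:
    """Build a `docker run` argv that bind-mounts a local clone over the repo path."""
    return [
        "docker", "run", "--rm", "-it",
        # Publish the (assumed) Jupyter port on the host.
        "-p", f"{port}:8888",
        # A bind mount hides whatever the image already contains at this path,
        # so the container sees the local clone instead of the baked-in copy.
        "-v", f"{local_repo.resolve()}:{CONTAINER_REPO}",
        IMAGE,
    ]


if __name__ == "__main__":
    # Print the command for the current directory so it can be copied into a shell.
    print(shlex.join(docker_run_command(Path("."))))
```

Because the bind mount shadows the image's copy at that path, the `COPY . ${REPO}/` step only matters when the container is started without a mount.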

Member Author

It might be good to create a docker volume, but I have to think about it carefully.

Member Author

I did a bit of research. It seems that those lines are not needed. I added them to make sure the repository material is available inside the container, but:

  1. If the user is mounting a local folder with the repository, that's redundant
  2. GitHub Codespaces, which can use a prebuilt Docker image to speed up the startup, also clones the repository by default, so the lines are again redundant

If you agree that this is the case, we can remove them and speed up building the image.

Member

I suggest we have a quick chat about this today during the meeting, and then we can decide.


# Set the working directory to the repository
WORKDIR ${REPO}

# Use the default ENTRYPOINT from the base image to start Jupyter Lab
ENTRYPOINT ["tini", "-g", "--", "start.sh"]
2 changes: 2 additions & 0 deletions docker/activate-custom-env.sh
@@ -0,0 +1,2 @@
#!/bin/bash
conda activate python-tutorial
26 changes: 26 additions & 0 deletions docker/environment.yml
@@ -0,0 +1,26 @@
---
name: base
channels:
  - conda-forge
dependencies:
  - pip
  - pip:
      - numpy
      - matplotlib
      - pandas
      - ipywidgets
      - ipynbname
      - jupyterlab
      - pytest
      - pytest-timeout
      - markdown
      - pre-commit
      - geostatspy
      - gstools
      - scikit-learn
      - attrs
      - multiprocess
      - openai
      - tenacity
      - markdown2
      - python-dotenv
5 changes: 5 additions & 0 deletions docker/post-build.sh
edoardob90 marked this conversation as resolved.
@@ -0,0 +1,5 @@
#!/bin/bash
set -e

mkdir -p ${HOME}/.ipython/profile_default
cp binder/ipython_config.py ${HOME}/.ipython/profile_default/
52 changes: 52 additions & 0 deletions docker/setup_custom_env.py
edoardob90 marked this conversation as resolved.
@@ -0,0 +1,52 @@
#!/usr/bin/env python3
import json
import os
import sys
from pathlib import Path

# Retrieve the environment name from the command-line arguments
env_name = sys.argv[1]

# Get the Conda directory from the environment variables
CONDA_DIR = os.environ["CONDA_DIR"]

# Define the path to the kernel.json file
kernel_dir = Path.home() / f".local/share/jupyter/kernels/{env_name}"
kernel_file = kernel_dir / "kernel.json"

# Ensure the kernel directory exists
kernel_dir.mkdir(parents=True, exist_ok=True)

# Define default kernel.json content
default_content = {
    "argv": [
        f"{CONDA_DIR}/envs/{env_name}/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}",
    ],
    "display_name": f"Python ({env_name})",
    "language": "python",
}

# If the kernel.json file doesn't exist, create it with default content
if not kernel_file.exists():
    kernel_file.write_text(json.dumps(default_content, indent=1))

# Read the existing kernel.json content
content = json.loads(kernel_file.read_text())

# Add the environment variables to the kernel configuration
content["env"] = {
    "XML_CATALOG_FILES": "",
    "PATH": f"{CONDA_DIR}/envs/{env_name}/bin:$PATH",
    "CONDA_PREFIX": f"{CONDA_DIR}/envs/{env_name}",
    "CONDA_PROMPT_MODIFIER": f"({env_name}) ",
    "CONDA_SHLVL": "2",
    "CONDA_DEFAULT_ENV": env_name,
    "CONDA_PREFIX_1": CONDA_DIR,
}

# Write the updated content back to the kernel.json file
kernel_file.write_text(json.dumps(content, indent=1))
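
The create-then-update logic of the script above can be exercised without touching a real Jupyter installation. The following is a rough sketch, not the PR's code: it wraps the same steps in a hypothetical function `write_kernel_spec` that takes the kernels directory and Conda prefix as parameters, sets only a subset of the environment variables, and runs against a temporary directory. The `/opt/conda` prefix and the `python-tutorial` environment name are assumptions used for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_kernel_spec(kernels_root: Path, conda_dir: str, env_name: str) -> Path:
    """Create or update kernel.json for env_name and return the file's path."""
    kernel_dir = kernels_root / env_name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    kernel_file = kernel_dir / "kernel.json"
    prefix = f"{conda_dir}/envs/{env_name}"

    if kernel_file.exists():
        # Keep whatever an earlier install step already wrote.
        content = json.loads(kernel_file.read_text())
    else:
        # Fall back to a default spec, mirroring the script's default content.
        content = {
            "argv": [
                f"{prefix}/bin/python",
                "-m",
                "ipykernel_launcher",
                "-f",
                "{connection_file}",
            ],
            "display_name": f"Python ({env_name})",
            "language": "python",
        }

    # Only a subset of the variables the script sets, to keep the sketch short.
    content["env"] = {
        "PATH": f"{prefix}/bin:$PATH",
        "CONDA_PREFIX": prefix,
        "CONDA_DEFAULT_ENV": env_name,
    }
    kernel_file.write_text(json.dumps(content, indent=1))
    return kernel_file


# Usage: write a spec into a throwaway directory and inspect the result.
with tempfile.TemporaryDirectory() as tmp:
    spec_path = write_kernel_spec(Path(tmp), "/opt/conda", "python-tutorial")
    spec = json.loads(spec_path.read_text())
    print(spec["display_name"])  # prints "Python (python-tutorial)"
```

Passing the directories as arguments, instead of reading `CONDA_DIR` and `Path.home()` at import time as the script does, is a design choice made here only so the logic can be tested in isolation.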