Add dvc pull command #99

Merged
merged 6 commits into from
Feb 3, 2025
18 changes: 17 additions & 1 deletion CHANGELOG.md
@@ -8,10 +8,26 @@ and this project adheres to [Semantic Versioning][].
[keep a changelog]: https://keepachangelog.com/en/1.0.0/
[semantic versioning]: https://semver.org/spec/v2.0.0.html

## [Unreleased]
## v0.12.0

### Migration advice

The default pre-commit configuration has been reworked. To update it, navigate to the root of your project and run

```bash
rm .pre-commit-config.yaml
dso init .
```

`dso init` re-adds all files from the project template that are missing from your project. Existing files are not touched.
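The "add missing files, never overwrite" behaviour can be illustrated with a short Python sketch. This is not `dso init`'s actual implementation: the helper name `add_missing_template_files` is hypothetical, and any template rendering dso may perform is ignored.

```python
import shutil
from pathlib import Path


def add_missing_template_files(template_dir: Path, project_dir: Path) -> list[Path]:
    """Copy template files that are missing from project_dir (hypothetical helper).

    Files that already exist in project_dir are left untouched.
    Returns the relative paths of the files that were added.
    """
    added: list[Path] = []
    for src in sorted(template_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(template_dir)
        dest = project_dir / rel
        if dest.exists():
            continue  # never overwrite an existing file
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        added.append(rel)
    return added
```

Under these semantics, deleting `.pre-commit-config.yaml` first is what makes the updated template version count as "missing", so it gets re-added.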

### Template updates

- Update `.pre-commit-config.yaml`, removing unnecessary hooks ([#99](https://github.com/Boehringer-Ingelheim/dso/pull/99)).

### New features

- Add `dso pull` command, a wrapper around `dso compile-config` + `dvc pull` ([#99](https://github.com/Boehringer-Ingelheim/dso/pull/99)).
- Add templates for Python stages (`quarto_py`, `quarto_ipynb`) ([#98](https://github.com/Boehringer-Ingelheim/dso/pull/98)).

### Documentation
8 changes: 8 additions & 0 deletions docs/cli_command_reference.md
@@ -80,6 +80,14 @@
   invoke(dso, args=["lint", "--help"])
```

## dso pull

```{eval-rst}
.. click:run::
   from dso.cli import dso
   invoke(dso, args=["pull", "--help"])
```

## dso repro

```{eval-rst}
19 changes: 18 additions & 1 deletion docs/getting_started.md
@@ -203,7 +203,7 @@ dvc add <directoryname/filename>
dvc add metadata/external_clinical_annotation.csv
```

### Syncing data with a remote
### Pushing data to a remote

To ensure your collaborators can access files you added or results you generated, you need to sync
data with a [dvc remote](https://dvc.org/doc/user-guide/data-management/remote-storage#remote-storage). DVC supports
@@ -235,3 +235,20 @@ git commit -m "Descriptive commit message"
# Push committed changes to the remote Git repository
git push
```

Pushing to the dvc remote can be automated using the [pre-commit integration](user_guide/pre_commit.md).

### Pulling changes from remote

Whenever you switch to a different state of your git repository (e.g. by using `git pull` to retrieve changes
from a collaborator, or by using `git switch` to move to another branch), you need to make sure the
corresponding data tracked by `dvc` is checked out as well.

The most convenient way to do so is to run

```bash
dso pull
```

This first compiles the configuration and then calls `dvc pull` internally. Running `dvc pull` without
compiling the configuration files first may fail.
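If you switch branches from a script, the two steps can be chained so that data is only pulled when the switch succeeded. The sketch below is hypothetical: `switch_and_pull` and its `run` parameter are illustrative names, and the function simply shells out to `git switch` and `dso pull`.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def switch_and_pull(branch: str, run: Callable[[Sequence[str]], int] = _run) -> int:
    """Switch to `branch`, then fetch the matching dvc-tracked data (hypothetical helper).

    Stops at the first failing command and returns its exit code.
    """
    for cmd in (["git", "switch", branch], ["dso", "pull"]):
        code = run(cmd)
        if code != 0:
            return code
    return 0
```

Passing a fake `run` callable makes the sequencing easy to test without touching a real repository.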
101 changes: 99 additions & 2 deletions docs/user_guide/pre_commit.md
@@ -1,5 +1,102 @@
# pre-commit integration

Here we describe the pre-commit hooks used in DSO projects
[Pre-commit](https://pre-commit.com/) hooks are small scripts that perform consistency checks on
files in the repository before actually performing a git commit (or push).
If the checks fail, the commit is aborted and needs to be retried once the problems have been
fixed. Some issues are fixed automatically by the hooks.

TODO
Pre-commit hooks are defined in a `.pre-commit-config.yaml` file at the root
of a repository. The [DSO project template](templates.md) comes with a default configuration that is detailed below.

To activate pre-commit integration, the hooks need to be installed in each repository separately. The DSO CLI
will ask the user whether to do so. You can also install the hooks manually by running

```bash
pre-commit install
```

This will write the hooks into the `.git` directory.
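To check from Python whether the hooks are installed, one option is to look for the hook scripts, one file per hook type. This sketch assumes git's default hooks location (`.git/hooks`, i.e. `core.hooksPath` is not set); the function name `installed_hooks` and its default tuple of hook types are assumptions for illustration. The inactive `*.sample` files git creates in that directory are not matched.

```python
from pathlib import Path


def installed_hooks(
    repo_root: Path,
    hook_types: tuple[str, ...] = ("pre-commit", "pre-push"),
) -> dict[str, bool]:
    """Report which git hook scripts exist (hypothetical helper).

    Assumes hooks live in the default `.git/hooks` directory.
    """
    hooks_dir = repo_root / ".git" / "hooks"
    return {hook: (hooks_dir / hook).is_file() for hook in hook_types}
```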

## Example

Let's assume we made some changes that are already in the git staging area (`git add`) and are about
to be committed.

```console
$> git commit
detect private key.......................................................Passed
check python ast.....................................(no files to check)Skipped
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing params.in.yaml

mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
check for case conflicts.................................................Passed
check for merge conflicts................................................Passed
nbstripout...........................................(no files to check)Skipped
Run dso lint.............................................................Passed
```

The check failed because the `end-of-file-fixer` hook modified one file. This modification now shows up as an "unstaged"
change in `git status`:

```console
$> git status
[...]
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: params.in.yaml
```

We need to stage those changes and then redo the commit:

```console
$> git add params.in.yaml
$> git commit
detect private key.......................................................Passed
check python ast.....................................(no files to check)Skipped
fix end of files.........................................................Passed
mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
check for case conflicts.................................................Passed
check for merge conflicts................................................Passed
nbstripout...........................................(no files to check)Skipped
Run dso lint.............................................................Passed
[master d60efd3] update params
1 file changed, 1 insertion(+)
```

Now all checks have passed and the commit has been created.
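What `end-of-file-fixer` did to `params.in.yaml` can be approximated as follows. This is a simplified sketch, not the hook's source: it assumes LF line endings (which the `mixed-line-ending --fix=lf` hook in this configuration enforces) and operates on a string rather than a file.

```python
def fix_end_of_file(text: str) -> str:
    """Ensure text ends with exactly one newline; leave empty content empty."""
    body = text.rstrip("\n")
    if not body:
        return ""  # empty or newline-only content becomes empty
    return body + "\n"


def needs_fix(text: str) -> bool:
    """pre-commit reports a hook as "Failed" when the hook modified the file."""
    return fix_end_of_file(text) != text
```

This mirrors the example above: the first commit attempt fails because the content changed, and the second attempt passes because the fixed file is already in its final form.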

## pre-commit hooks

The following pre-commit hooks are included in the default configuration file:

- [detect-private-key](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#detect-private-key): Scans for private keys within files to prevent accidental exposure.
- [check-ast](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#check-ast): Parses Python files to catch syntax errors before commits.
- [end-of-file-fixer](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#end-of-file-fixer): Ensures every file ends with a newline.
- [mixed-line-ending](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#mixed-line-ending): Normalizes line endings to avoid mixing CRLF and LF.
- [trailing-whitespace](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#trailing-whitespace): Removes trailing whitespace from files.
- [check-case-conflict](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#check-case-conflict): Detects potential conflicts arising from case-insensitive filenames.
- [check-merge-conflict](https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#check-merge-conflict): Scans files for merge conflict markers.
- [nbstripout](https://github.com/kynan/nbstripout): Removes outputs from Jupyter notebooks (outputs should not be committed to git; instead, the rendered HTML file is tracked by dvc).
- dso lint: Runs the [dso linter](linting.md) on all files.
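The core idea behind `nbstripout` can be sketched with the standard library alone. This is a simplified illustration, not nbstripout's implementation (which also handles metadata and further options); the function name `strip_notebook_outputs` is hypothetical.

```python
import json


def strip_notebook_outputs(notebook_json: str) -> str:
    """Clear outputs and execution counts of all code cells in an .ipynb document."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Jupyter conventionally writes notebooks with a 1-space indent
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
```

Only the cell sources remain in git; the rendered results live in the HTML report tracked by dvc.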

The following hooks are included, but commented out. These hooks are autoformatters for Python, R, Markdown and other files. We do recommend enabling them, but we acknowledge that not everyone may like them.

- [Prettier](https://prettier.io/): formats Markdown, JSON, CSS, JS and others
- [Ruff](https://astral.sh/ruff): Linter and formatter for Python (including Jupyter notebooks)
- [Styler](https://styler.r-lib.org/): Formatter for R and R Markdown notebooks.

## pre-push hooks

As the name suggests, `pre-push` hooks run before `git push`. The following hook is included in the default
configuration file:

- dvc push: Run [dvc push](https://dvc.org/doc/command-reference/push#push) to ensure all data files are
synced to the remote at the same time the code changes are pushed to the git remote.
49 changes: 28 additions & 21 deletions src/dso/cli/__init__.py
@@ -144,25 +144,6 @@ def dso_lint(args, skip_compile: bool = False):
lint(paths)


@click.command(
    name="repro",
    context_settings={"ignore_unknown_options": True},
)
@click.argument("args", nargs=-1, type=click.UNPROCESSED)
def dso_repro(args):
    """Wrapper around dvc repro, compiling configuration before running."""
    from dso._compile_config import compile_all_configs
    from dso._util import check_ask_pre_commit

    check_ask_pre_commit(Path.cwd())
    compile_all_configs([get_project_root(Path.cwd())])
    os.environ["DSO_SKIP_COMPILE"] = "1"
    cmd = ["dvc", "repro", *args]
    log.info(f"Running `{' '.join(cmd)}`")
    res = subprocess.run(cmd)
    sys.exit(res.returncode)


@click.command(name="watermark")
@click.argument("input_image", type=Path)
@click.argument("output_image", type=Path)
@@ -189,7 +170,7 @@ def dso_watermark(input_image, output_image, text, **kwargs):
Watermarker.add_watermark(input_image, output_image, text=text, **kwargs)


@click.group()
@click.group(invoke_without_command=True)
@click.option(
"-q",
"--quiet",
@@ -224,11 +205,37 @@ def dso(quiet: int, verbose: bool):
os.environ["DSO_VERBOSE"] = "1"


def _dvc_wrapper(command: str):
    @click.command(
        name=command,
        help=f"Wrapper around `dvc {command}`, compiling configuration before running.",
        context_settings={"ignore_unknown_options": True},
    )
    @click.argument("args", nargs=-1, type=click.UNPROCESSED)
    def command_wrapper(args):
        """Wrapper around any dvc command, compiling configuration before running."""
        from dso._compile_config import compile_all_configs
        from dso._util import check_ask_pre_commit

        check_ask_pre_commit(Path.cwd())
        compile_all_configs([get_project_root(Path.cwd())])
        os.environ["DSO_SKIP_COMPILE"] = "1"
        # use `python -m dvc` syntax to ensure we are using dvc from the same venv
        cmd = [sys.executable, "-m", "dvc", command, *args]
        log.debug(f"Running `{' '.join(cmd)}`")
        res = subprocess.run(cmd)
        sys.exit(res.returncode)

    return command_wrapper


dso.add_command(dso_create)
dso.add_command(dso_init)
dso.add_command(dso_compile_config)
dso.add_command(dso_repro)
dso.add_command(dso_exec)
dso.add_command(dso_lint)
dso.add_command(dso_get_config)
dso.add_command(dso_watermark)

for command in ["repro", "pull"]:
    dso.add_command(_dvc_wrapper(command))
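Building the `repro` and `pull` commands through the `_dvc_wrapper` factory, rather than defining the callback directly inside the `for` loop, matters because Python closures look up free variables when they are called, not when they are defined. The stdlib-only sketch below (no click; the names `naive` and `make_wrapper` are illustrative) shows the pitfall and the factory fix by building the command lists the wrappers would run.

```python
import sys
from typing import Callable, Sequence

Builder = Callable[[Sequence[str]], list[str]]

# Pitfall: each lambda reads `command` when it is *called*, after the loop has finished.
naive: dict[str, Builder] = {}
for command in ["repro", "pull"]:
    naive[command] = lambda args: [sys.executable, "-m", "dvc", command, *args]


def make_wrapper(command: str) -> Builder:
    """Factory: `command` is bound per call, so every wrapper keeps its own value."""

    def build(args: Sequence[str]) -> list[str]:
        return [sys.executable, "-m", "dvc", command, *args]

    return build


wrappers: dict[str, Builder] = {name: make_wrapper(name) for name in ["repro", "pull"]}
```

With click, `name=command` in the decorator is evaluated at decoration time, so a naive loop would still register correctly named commands; only the callback body would run the wrong dvc subcommand, which makes the bug easy to miss.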
88 changes: 41 additions & 47 deletions src/dso/templates/init/default/.pre-commit-config.yaml
@@ -8,29 +8,8 @@ default_install_hook_types:
- pre-push
- post-checkout
minimum_pre_commit_version: 2.16.0
exclude: '(params\.yaml$|dvc\.lock$)' # these are auto-generated and potentially conflicting with autoformatting
exclude: '(dvc\.lock$|.*\.dvc$)' # these are auto-generated and potentially conflicting with autoformatting
repos:
# These are hooks for automated formatting - we recommend them but don't enable them by default.
# - repo: https://github.com/pre-commit/mirrors-prettier
# rev: v4.0.0-alpha.8
# hooks:
# - id: prettier
# exclude: 'params\.yaml$' # these are auto-generated and potentially conflicting with prettier
# - repo: https://github.com/astral-sh/ruff-pre-commit
# rev: v0.4.7
# hooks:
# - id: ruff
# types_or: [python, pyi, jupyter]
# args: [--fix, --exit-non-zero-on-fix]
# - id: ruff-format
# types_or: [python, pyi, jupyter]

# for ipynb files in `src` directories: we never want to commit any output as rendered output files
# are tracked by dvc
- repo: https://github.com/kynan/nbstripout
rev: "0.8.1"
hooks:
- id: nbstripout
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
hooks:
@@ -41,37 +20,52 @@ repos:
args: [--fix=lf]
- id: trailing-whitespace
- id: check-case-conflict
# Check that there are no merge conflicts (could be generated by template sync)
- id: check-merge-conflict
args: [--assume-in-merge]
- repo: https://github.com/iterative/dvc
rev: "3.51.2"

# for ipynb files in `src` directories: we never want to commit any output as rendered output files
# are tracked by dvc
- repo: https://github.com/kynan/nbstripout
rev: "0.8.1"
hooks:
- id: dvc-pre-commit
language_version: python3
stages:
- pre-commit
- id: dvc-pre-push
additional_dependencies: ["s3"]
language_version: python3
stages:
- pre-push
verbose: true
- id: dvc-post-checkout
always_run: true
language_version: python3
verbose: true
stages:
- post-checkout
- id: nbstripout

- repo: local
hooks:
- id: dso-compile-config
name: Run dso compile-config
entry: dso compile-config
language: system
stages: [pre-commit]
- id: lint
name: Run dso lint
entry: dso lint --skip-compile
entry: dso lint
language: system
stages: [pre-commit]
- id: push
name: Run dvc push
entry: dvc git-hook pre-push
require_serial: true
language: system
verbose: true
always_run: true
stages: [pre-push]

# These are hooks for automated formatting - we recommend them but don't enable them by default.

# Prettier formats Markdown, JSON, CSS, JS and others
# - repo: https://github.com/rbubley/mirrors-prettier
# rev: 'v3.4.2'
# hooks:
# - id: prettier

# Ruff formats Python files
# - repo: https://github.com/astral-sh/ruff-pre-commit
# rev: 'v0.9.4'
# hooks:
# - id: ruff
# types_or: [python, pyi, jupyter]
# args: [--fix, --exit-non-zero-on-fix]
# - id: ruff-format
# types_or: [python, pyi, jupyter]

# Styler formats R files
# - repo: https://github.com/lorenzwalthert/precommit
# rev: 'v0.4.3'
# hooks:
# - id: style-files
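pre-commit's top-level `exclude` is a Python regular expression searched against each file path. The snippet below compares the old and new patterns from this diff on a few illustrative paths; it checks only the regexes, not pre-commit itself, and the helper `excluded` is hypothetical.

```python
import re

OLD_EXCLUDE = re.compile(r"(params\.yaml$|dvc\.lock$)")
NEW_EXCLUDE = re.compile(r"(dvc\.lock$|.*\.dvc$)")


def excluded(pattern: re.Pattern, path: str) -> bool:
    """A file is skipped if the pattern matches anywhere in its path (re.search semantics)."""
    return pattern.search(path) is not None


for path in ["params.yaml", "dvc.lock", "data/raw.csv.dvc", "dvc.yaml", "src/analysis.py"]:
    print(path, "old:", excluded(OLD_EXCLUDE, path), "new:", excluded(NEW_EXCLUDE, path))
```

With the new pattern, `params.yaml` is no longer excluded from the hooks, while `.dvc` pointer files now are; `dvc.lock` stays excluded under both.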