cmd-ref: document data:status command (#3812)
* cmd-ref: document data:status command

* Update content/docs/sidebar.json

* Apply suggestions from code review

Co-authored-by: Dave Berenbaum <[email protected]>
Co-authored-by: David de la Iglesia Castro <[email protected]>

* apply suggestions

* apply suggestions

* fix options

* rewrite description

* Apply suggestions from code review

Co-authored-by: Dave Berenbaum <[email protected]>

* Restyled by prettier (#3901)

Co-authored-by: Restyled.io <[email protected]>

* reword granular

* Restyled by prettier (#3912)

Co-authored-by: Restyled.io <[email protected]>

* fix review suggestions

* change data index.md

* data: remove index

* sync to latest dvc-data-status outputs

Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Dave Berenbaum <[email protected]>
Co-authored-by: David de la Iglesia Castro <[email protected]>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
6 people authored Sep 2, 2022
1 parent 5ed2617 commit 62a5c97
Showing 2 changed files with 146 additions and 0 deletions.
135 changes: 135 additions & 0 deletions content/docs/command-reference/data/status.md
@@ -0,0 +1,135 @@
# data status

Show changes in the data tracked by DVC in the workspace.

## Synopsis

```usage
usage: dvc data status [-h] [-q | -v]
[--granular] [--unchanged]
[--untracked-files [{no,all}]]
[--json]
```

## Description

The `dvc data status` command displays the state of the working directory and
its changes with respect to the last Git commit (`HEAD`). It shows which changes
have been committed to DVC, which have not been committed yet, which files are
tracked by neither DVC nor Git, and which files are missing from the
<abbr>cache</abbr>.

The `dvc data status` command only outputs information; it does not modify
anything in your working directory. It is good practice to check the state of
your repository before running `dvc commit` or `git commit`, so that you don't
accidentally commit something you did not mean to.

Example output might look like the following:

```dvc
$ dvc data status
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
```

As shown above, `dvc data status` displays changes in multiple categories:

- _Not in cache_ indicates that the hashes for these files are recorded in
`dvc.lock` and `.dvc` files but the corresponding cache files are missing.
- _DVC committed changes_ indicates changes that have been `dvc commit`-ed but
differ from the last Git commit. Each entry may show a more detailed state
describing how the file changed: _added_, _modified_, _deleted_ or _unknown_.
- _DVC uncommitted changes_ indicates changes in the working directory that
have not been `dvc commit`-ed yet. As with _DVC committed changes_, each entry
may show a more detailed state describing how the file changed.
- _Untracked files_ shows files that are tracked by neither DVC nor Git. These
are hidden by default and shown only when
[`--untracked-files`](#--untracked-files) is specified.
- _DVC Unchanged files_ shows files that have not changed. These are hidden by
default and shown only when [`--unchanged`](#--unchanged) is specified.

By default, `dvc data status` does not show individual changes inside tracked
directories. Use the [`--granular`](#--granular) option to show them.
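
The categories above can be illustrated by grouping the example output into a
dictionary. This is only a minimal sketch: `parse_status` and `SAMPLE` are
hypothetical names, the parsing rules are inferred from the example output on
this page rather than from DVC itself, and a path that ends in `:` would be
misread as a heading. For real scripting, `--json` is likely the better choice;
its schema is not shown here.

```python
# Example output copied from the description above.
SAMPLE = """\
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
"""

# Detailed states named in the category descriptions.
STATES = ("added", "modified", "deleted", "unknown")


def parse_status(text):
    """Group entries of the human-readable output by category heading.

    Assumption: a heading is any non-hint line that ends with a colon.
    """
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("("):
            continue  # skip blank lines and parenthesized hints
        if line.endswith(":"):
            current = line[:-1]
            result[current] = []
            continue
        if current is None:
            continue  # entry before any heading; ignore
        state, sep, path = line.partition(": ")
        if sep and state in STATES:
            result[current].append((state, path.strip()))
        else:
            result[current].append((None, line))  # entry without a state
    return result


if __name__ == "__main__":
    for category, entries in parse_status(SAMPLE).items():
        print(category, entries)
```

Entries under _Not in cache_ carry no state, so they are stored with `None` in
place of one.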

## Options

- `--granular` - show granular, file-level information about the changes in
DVC-tracked directories. By default, `dvc data status` does not show
individual changes for files inside tracked directories.

- `--untracked-files` - show files that are tracked by neither DVC nor Git. As
the synopsis shows, it accepts an optional value, `no` or `all`.

- `--unchanged` - show unchanged DVC-tracked files.

- `--json` - print the command's output in easily parsable JSON format instead
of human-readable output.

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - display detailed tracing information.
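
For scripted use, the options above can be assembled into an argument list. The
following is a sketch, not part of DVC: `build_command` and `run_status` are
hypothetical helper names, and only flags listed in the synopsis are used.
Running it requires `dvc` on the `PATH` inside a DVC repository.

```python
import subprocess


def build_command(granular=False, unchanged=False, untracked_files=None,
                  as_json=False):
    """Build the argument list for `dvc data status` (hypothetical helper)."""
    cmd = ["dvc", "data", "status"]
    if granular:
        cmd.append("--granular")
    if unchanged:
        cmd.append("--unchanged")
    if untracked_files is not None:
        # The synopsis lists `no` and `all` as the accepted values.
        if untracked_files not in ("no", "all"):
            raise ValueError("untracked_files must be 'no' or 'all'")
        cmd += ["--untracked-files", untracked_files]
    if as_json:
        cmd.append("--json")
    return cmd


def run_status(**options):
    """Run the command and return its standard output as text."""
    completed = subprocess.run(
        build_command(**options), capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    print(" ".join(build_command(granular=True, untracked_files="all")))
```

Keeping command construction separate from execution makes the flag handling
easy to check without a repository at hand.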

## Examples

```dvc
$ dvc data status
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
```

This shows that `data/data.xml` is missing from the cache; that
`data/features/`, a directory, has changes that are tracked by DVC but not yet
committed to Git; and that the file `model.pkl` has been deleted from the
workspace. The `data/features/` directory is reported as modified, but there
are no further details about what changed inside it. The `--granular` option
can provide more information on that.

## Example: Granular output

Following on from the above example, using `--granular` will show file-level
information for the changes:

```dvc
$ dvc data status --granular
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
added: data/features/foo
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
```

Now _DVC committed changes_ gives more information about the changes in
`data/features`. The output shows that a new file, `data/features/foo`, has
been added to `data/features`.
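
The relationship between the two outputs can be sketched by collapsing
file-level entries back into directory-level ones. This is an illustration
only: `roll_up` is a hypothetical function, and the rule that any change inside
a tracked directory is reported as _modified_ is an assumption drawn from the
two examples on this page, not a description of DVC's implementation.

```python
def roll_up(entries, tracked_dirs):
    """Collapse (state, path) file entries into directory-level entries.

    `tracked_dirs` holds directory paths that end with a slash.
    Assumption: any change inside a tracked directory shows as "modified".
    """
    rolled = []
    seen = set()
    for state, path in entries:
        parent = next((d for d in tracked_dirs if path.startswith(d)), None)
        if parent is None:
            rolled.append((state, path))  # not inside a tracked directory
        elif parent not in seen:
            seen.add(parent)
            rolled.append(("modified", parent))  # one entry per directory
    return rolled


if __name__ == "__main__":
    granular = [("added", "data/features/foo"), ("deleted", "model.pkl")]
    print(roll_up(granular, ["data/features/"]))
```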
11 changes: 11 additions & 0 deletions content/docs/sidebar.json
@@ -215,6 +215,17 @@
"label": "dag",
"slug": "dag"
},
{
"label": "data",
"slug": "data",
"source": false,
"children": [
{
"label": "data status",
"slug": "status"
}
]
},
{
"label": "destroy",
"slug": "destroy"