Commit

Change details on the template gitignore
Signed-off-by: Laura Couto <[email protected]>
lrcouto committed Feb 11, 2025
1 parent 9d8e886 commit 2ab8b5d
Showing 1 changed file with 17 additions and 7 deletions.
24 changes: 17 additions & 7 deletions docs/source/data/kedro_dvc_versioning.md
@@ -19,7 +19,7 @@ To use DVC as a Python library, install it using `pip` or `conda`, for example:

Since DVC works alongside Git to track data changes, initialise the Kedro project as a git repository: `git init`.

-Then, initialize DVC in the project: `dvc init`.
+Then, initialise DVC in the project: `dvc init`. This creates the `.dvc` directory inside the project.

You should see a message such as:

@@ -51,17 +51,27 @@ companies:
filepath: data/01_raw/companies.csv
```
-You will have to do some changes to the the `.gitignore` file provided by the template to allow DVC to track the dataset files. Since the spaceflights-pandas starter's `.gitignore` ignores everything under the `data/` directory by default, you will have to update it by removing the following lines from it:
+Since we initialised a new Git repository with `git init` in the previous step, we can now make an initial commit containing all the files in the project:

```bash
-# ignore everything in the following folders
-data/**
+git add .
+git commit -m "First commit, initial structure from the starter"
+```

-# except their sub-folders
-!data/**/
+The dataset files live under `data/` in the project template, so their `.dvc` files end up there too. Make sure the following line is present in the project's `.gitignore` file so that the `.dvc` files can be committed:

+```bash
+!*.dvc
```

-Then, use `dvc add` to start tracking a dataset file:
+We want DVC to track and version our dataset file, so we stop tracking it with Git (the file itself stays on disk) and commit the change:

+```bash
+git rm -r --cached 'data/01_raw/companies.csv'
+git commit -m "Stop tracking data/01_raw/companies.csv"
+```

+Then, we start tracking it with DVC:

```bash
dvc add data/01_raw/companies.csv
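
The `!*.dvc` rule only works together with the template's other ignore patterns. This is a minimal sketch and assumes `git` is installed. It recreates, in a throwaway repository, the `data/**` and `!data/**/` lines that the removed text attributes to the spaceflights-pandas starter's `.gitignore`. It then adds `!*.dvc` and asks `git check-ignore` which files Git would skip. The placeholder CSV and the empty `companies.csv.dvc` file are stand-ins for the dataset and for the metafile that `dvc add` writes next to it. DVC itself is not run here.

```bash
#!/usr/bin/env bash
# Sketch: which data files does a template-style .gitignore let Git see?
# Assumes git is on PATH; all work happens in a temporary directory.
set -euo pipefail

repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Patterns quoted from the starter's .gitignore, plus the !*.dvc rule.
cat > .gitignore <<'EOF'
# ignore everything in the following folders
data/**

# except their sub-folders
!data/**/

!*.dvc
EOF

# Placeholder dataset and a stand-in for the metafile `dvc add` writes beside it.
mkdir -p data/01_raw
printf 'placeholder\n' > data/01_raw/companies.csv
touch data/01_raw/companies.csv.dvc

# Report "ignored" or "not ignored" from git check-ignore's exit status.
status() {
  if git check-ignore -q "$1"; then
    echo "ignored"
  else
    echo "not ignored"
  fi
}

csv_status="$(status data/01_raw/companies.csv)"
dvc_status="$(status data/01_raw/companies.csv.dvc)"

# For comparison, drop the !*.dvc line and check the metafile again.
grep -v '^!\*\.dvc$' .gitignore > .gitignore.tmp
mv .gitignore.tmp .gitignore
without_rule="$(status data/01_raw/companies.csv.dvc)"

echo "companies.csv:               $csv_status"
echo "companies.csv.dvc:           $dvc_status"
echo "companies.csv.dvc (no rule): $without_rule"
```

In `.gitignore`, the last matching pattern wins. A file also cannot be re-included while one of its parent directories is excluded. `!data/**/` keeps the sub-folders themselves un-excluded, which is what lets `!*.dvc` re-include the metafile while the CSV stays ignored.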

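
`git rm --cached` removes the file only from Git's index. The CSV itself stays in the working tree for `dvc add` to pick up. The following sketch assumes `git` is installed and replays the two commands from the diff in a throwaway repository. The placeholder CSV contents and the `example` committer identity are stand-ins, not part of the guide.

```bash
#!/usr/bin/env bash
# Sketch: `git rm --cached` stops tracking a file but leaves it on disk.
# Assumes git is on PATH; runs in a throwaway repository.
set -euo pipefail

repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Commit a placeholder dataset file, as if it were part of the first commit.
mkdir -p data/01_raw
printf 'placeholder\n' > data/01_raw/companies.csv
git add data/01_raw/companies.csv
git -c user.name=example -c user.email=example@example.com \
  commit -q -m "First commit"

# Same commands as in the guide: untrack the file and commit the change.
git rm -q -r --cached 'data/01_raw/companies.csv'
git -c user.name=example -c user.email=example@example.com \
  commit -q -m "Stop tracking data/01_raw/companies.csv"

# git ls-files prints nothing once the path is no longer in the index.
tracked="$(git ls-files data/01_raw/companies.csv)"
if [ -f data/01_raw/companies.csv ]; then on_disk="yes"; else on_disk="no"; fi

echo "tracked by Git: ${tracked:-no}"
echo "still on disk:  $on_disk"
```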