Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd-reference: document import-db #5033

Merged
merged 11 commits into from
Jan 11, 2024
Merged

cmd-reference: document import-db #5033

merged 11 commits into from
Jan 11, 2024

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Dec 5, 2023

No description provided.

@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj December 5, 2023 11:45 Inactive
Copy link
Contributor

github-actions bot commented Dec 5, 2023

Link Check Report

There were no links to check!

| **SQL Server** | `pyodbc` | `mssql+pyodbc://{username}:{password}@{hostname}:{port}/{database_name}` |
| **Trino** | `trino` | `trino://{username}:{password}@{hostname}:{port}/{catalog}` |

DVC uses sqlalchemy internally. So DVC should support any SQL databases that
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only tested with sqlite, postgres, and snowflake. But it should probably work with all mentioned here (and more).

@dberenbaum
Copy link
Contributor

@skshetry What's the status of this PR?

@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 2, 2024 12:31 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 2, 2024 12:49 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 2, 2024 12:50 Inactive
@shcheklein shcheklein had a problem deploying to dvc-org-import-db-teaqlzcbolyj January 2, 2024 12:53 Failure
@skshetry skshetry changed the title [WIP] cmd-reference: document import-db cmd-reference: document import-db Jan 2, 2024
@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 2, 2024 13:00 Inactive
@dberenbaum
Copy link
Contributor

Can we note that this is experimental and subject to breaking changes?

@dberenbaum dberenbaum self-requested a review January 2, 2024 20:58
Copy link
Contributor

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments on the doc, but it raised a bigger product question for me that I missed in iterative/dvc#10125. Unlike iterative/dvc#10040, --sql is required here, right? It feels artificial to be limited to a sql query written in the command line. Users likely either want to pull a table, which makes select * seem redundant, or the results of a more complex query.

@skshetry
Copy link
Member Author

skshetry commented Jan 3, 2024

Left some comments on the doc, but it raised a bigger product question for me that I missed in iterative/dvc#10125. Unlike iterative/dvc#10040, --sql is required here, right?

We discussed this privately around the time of iterative/dvc#10040 about snapshotting a table or a database, but at that point --sql was not the main focus. It should be easy to add a --table here.

@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 4, 2024 13:07 Inactive
@dberenbaum dberenbaum added the ⌛ status: wait-core-merge Waiting for related product PR merge/release label Jan 4, 2024
@dberenbaum
Copy link
Contributor

Depends on iterative/dvc#10219

@dberenbaum
Copy link
Contributor

Can we note that this is experimental and subject to breaking changes?

@skshetry Can we add something like this? Otherwise it LGTM.

@skshetry
Copy link
Member Author

skshetry commented Jan 4, 2024

@skshetry Can we add something like this? Otherwise it LGTM.

Should we consider it stable now?

@dberenbaum
Copy link
Contributor

I would rather play it safe here, especially considering the discussions in iterative/dvc#10164 and related issues that could impact this. We don't do this so often that it's hard to maintain, so I don't see much downside.

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dberenbaum
Copy link
Contributor

https://dvc.org/doc/user-guide/project-structure/dvc-files - do we need to update some of these files?

also this https://dvc.org/doc/command-reference/update

Yes, good catch

@skshetry
Copy link
Member Author

dvc.org/doc/user-guide/project-structure/dvc-files - do we need to update some of these files?

also this dvc.org/doc/command-reference/update

Since we consider this as experimental, I'd avoid updating other files for now.

@dberenbaum
Copy link
Contributor

Since we consider this as experimental, I'd avoid updating other files for now.

That's fine with me as long as we have an issue to track it so we don't forget.

@dberenbaum
Copy link
Contributor

@skshetry Could you resolve the rest of the comments so we can merge?

@shcheklein
Copy link
Member

Since we consider this as experimental, I'd avoid updating other files for now.

could you folks share some light on this idea? what is the expectation to make a proper feature?

also, can we just put a note / admon on the top of the cmd ref pages? it would be more explicit I guess

(not a blocker - just the first time I see that we are being conservative with updates to docs and just curios about the procedure you have in mind)

@skshetry
Copy link
Member Author

could you folks share some light on this idea? what is the expectation to make a proper feature?

See #5033 (comment).

We want to see if we can fit this into datasets or datasets-like interface, since users are likely to read from their database in their scripts rather than snapshotting into a csv file.
But I don't think it is either-or here. (But snapshotting users may be smaller than streaming).

@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 11, 2024 05:13 Inactive

```usage
usage: dvc import-db [-h] [-q | -v]
[--sql SQL | --table TABLE] [--conn CONN]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this should be:

Suggested change
[--sql SQL | --table TABLE] [--conn CONN]
(--sql SQL | --table TABLE) --conn CONN

But auto-linking seems to be broken with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change it in the DVC itself also. We don't use this style. I mean all uppercase CONN or TABLE. Pleas check other commands.

If this (--sql SQL | --table TABLE) is correct (why?) - we should also fix first on the DVC side - and adjust auto-linker here, I hope it's not that complicated

but let's check if we do this in some other DVC commands

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are a mutually-exclusive argument, and one of them is required. I can change the casings of TABLE/SQL with metavar (uppercased name is just default, and we do those in many places when we are lazy).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed this in the docs. Will handle this in DVC separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we do those in many places when we are lazy

are there examples? I checked a few commands and did't find it.

anyways, I think we need to update it on the DVC side for all the commands

They are a mutually-exclusive argument, and one of them is required.

good point, then let's change on the DVC side first and we can fix the website later ... DVC is more important I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage: dvc experiments run [-h] [-q | -v] [-f] [-i] [-s] [-p] [-P] [-R] [--downstream] [--force-downstream] [--pull]
                           [--allow-missing] [--dry] [-k] [--ignore-errors] [-n <name>] [-S [<filename>:]<param_name>=<param_value>]
                           [--queue] [--run-all] [-j <number>] [--temp] [-C COPY_PATHS] [-m MESSAGE]
                           [targets ...]

See [-m MESSAGE] and [-c COPY_PATHS]. It's just the default from argparse, so it's easy to miss here. There are other commands like du/artifacts get/exp save/fetch/ls-url/ls/get/import, etc that does this. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this in dvc by iterative/dvc#10226.

@skshetry skshetry mentioned this pull request Jan 11, 2024
@skshetry
Copy link
Member Author

That's fine with me as long as we have an issue to track it so we don't forget.

Created #5087.

@shcheklein shcheklein temporarily deployed to dvc-org-import-db-teaqlzcbolyj January 11, 2024 05:26 Inactive
Copy link
Contributor

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @skshetry! LGTM.

@dberenbaum dberenbaum merged commit 9a72c90 into main Jan 11, 2024
5 checks passed
@dberenbaum dberenbaum deleted the import-db branch January 11, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⌛ status: wait-core-merge Waiting for related product PR merge/release
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants