Feature Request: Compatibility with databases via dbplyr #378

Closed

kenkomodo opened this issue Jan 17, 2025 · 3 comments

@kenkomodo
What's the feature?

It would be really great to have compatibility with dbplyr summary functions, so that users could easily create ARDs, or generate tables via gtsummary through the cards dependency, when the data to be summarized are larger than memory, whether on a local machine or in a cloud-based RDBMS.

Since dbplyr has many verbs analogous to dplyr's, and a good portion of functions can be translated by dbplyr, my guess is it wouldn't be a giant lift (although I could easily be wrong). I just haven't had time to get familiar enough with the cards source code to know exactly which pieces would need to change.
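To illustrate, here is a rough sketch of the call pattern I have in mind. This is purely hypothetical: cards does not currently accept lazy tables, and the DuckDB connection and "adsl" table name are just examples.

```r
library(dplyr)
library(cards)

# Hypothetical: connect to a database and reference a table lazily via dbplyr
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "trial.duckdb")  # example backend
adsl_db <- tbl(con, "adsl")  # lazy table; no data pulled into R yet

# Hypothetical: ard_continuous() recognizing the lazy table and pushing the
# summary calculations down to the database before returning an ARD
ard_continuous(adsl_db, variables = AGE, by = ARM)
```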

@ddsjoberg (Collaborator)

Thanks for the post @kenkomodo ! That is an interesting point you bring up.

In dbplyr, all the calculations occur in the database, correct? I'm not sure we'd be able to generalize cards to work well in that setting.

Our simple tabulations are processed through base::table(), which would require a rather large refactor to remove. And ard_continuous() is generalized to utilize any R function that executes on a vector, and those won't have SQL equivalents. The last thing that comes to mind is that the cardx package implements many complex statistical methods that would also not be available in a database.
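To make the translation concern concrete, here is a small illustration using dbplyr's in-memory SQLite helper; geo_mean() stands in for any arbitrary R summary function with no SQL counterpart.

```r
library(dplyr)
library(dbplyr)

db <- memdb_frame(grp = c("a", "a", "b"), x = c(1, 5, 9))

# Known aggregates have SQL translations, so this works on the lazy table
db |>
  group_by(grp) |>
  summarise(mean_x = mean(x, na.rm = TRUE)) |>
  show_query()

# An arbitrary R function has no translation; dbplyr passes the call to the
# backend verbatim and the database errors when the query actually runs
geo_mean <- function(x) exp(mean(log(x)))
db |>
  group_by(grp) |>
  summarise(gm = geo_mean(x)) |>
  collect()
#> Error: no such function: geo_mean (exact message depends on the backend)
```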

BUT, this is not a topic I've spent much time researching! Perhaps you can correct me if my assumptions are incorrect?

@kenkomodo (Author)

@ddsjoberg Yes, dbplyr uses lazy evaluation (with the exception of do()) to build the necessary SQL and only runs it in the database when the results are requested: compute() executes the query and stores the result in a remote temporary table, collect() executes it and downloads the results into R, and collapse() simply forces the query to be built as a subquery without executing it.
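For example, with dbplyr's in-memory SQLite helper:

```r
library(dplyr)
library(dbplyr)

adsl_db <- memdb_frame(ARM = c("A", "A", "B"), AGE = c(34, 41, 29))

qry <- adsl_db |>
  group_by(ARM) |>
  summarise(n = n(), mean_age = mean(AGE, na.rm = TRUE))

show_query(qry)  # the SQL is only generated; nothing has executed yet
collect(qry)     # runs the query in the database and returns a local tibble
```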

Would it still require a large refactor, given that the database tbl objects have a different class? Curious whether defining a separate method (or methods) for the tbl/tbl_lazy class would still be such a large lift; deferring to your expertise on that one.

As for the functions that don't have SQL equivalents, I think that would simply be a documented limitation, and an informative error could be raised in the console if a user tried to use a function without an SQL translation.
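As a rough sketch of the idea, here is a hypothetical standalone helper (not a cards function) that pushes a fixed set of SQL-translatable statistics to the database and reshapes the collected result into a long, ARD-like tibble; anything without a translation would have to error informatively instead.

```r
library(dplyr)
library(dbplyr)

# Hypothetical helper: compute SQL-translatable summaries in the database,
# then collect and pivot to a long, ARD-like layout
ard_continuous_db <- function(data, variable, by) {
  stopifnot(inherits(data, "tbl_lazy"))
  data |>
    group_by({{ by }}) |>
    summarise(
      N    = n(),
      mean = mean({{ variable }}, na.rm = TRUE),
      min  = min({{ variable }}, na.rm = TRUE),
      max  = max({{ variable }}, na.rm = TRUE)
    ) |>
    collect() |>
    tidyr::pivot_longer(
      cols = c(N, mean, min, max),
      names_to = "stat_name",
      values_to = "stat"
    )
}

# Example with an in-memory SQLite table
adsl_db <- memdb_frame(ARM = c("A", "A", "B"), AGE = c(34, 41, 29))
ard_continuous_db(adsl_db, AGE, ARM)
```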

@ddsjoberg (Collaborator)

Ahh very interesting! I can certainly see how that would be useful for some key tabulations.

For now, I think that is going to fall outside the scope of cards. But if you choose to spin up a package to create DB ARDs, we're certainly here to help with the design and to explain our design choices (it would be great to keep them consistent)!
