Feature Request: Compatibility with databases via dbplyr #378

Closed

kenkomodo opened this issue Jan 17, 2025 · 3 comments

@kenkomodo
What's the feature?

It would be really great to have compatibility with dbplyr summary functions, so that users could easily create ARDs, or generate tables via gtsummary through the cards dependency, when the data to be summarized are larger than memory, whether on a local machine or in a cloud-based RDBMS.

Since dbplyr has many verbs analogous to dplyr's, and a good portion of functions can be translated by dbplyr, my guess is it wouldn't be a giant lift (although I could easily be wrong). I just haven't had time to get familiar enough with the cards source code to know exactly which pieces would need to change.
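To illustrate, here is a rough sketch of the call pattern I have in mind. This is purely hypothetical: cards does not currently accept lazy tables, and the DuckDB connection and "adsl" table name are just examples.

```r
library(dplyr)
library(cards)

# Hypothetical: connect to a database and reference a table lazily via dbplyr
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "trial.duckdb")  # example backend
adsl_db <- tbl(con, "adsl")  # lazy table; no data pulled into R yet

# Hypothetical: ard_continuous() recognizing the lazy table and pushing the
# summary calculations down to the database before returning an ARD
ard_continuous(adsl_db, variables = AGE, by = ARM)
```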

@ddsjoberg (Collaborator)

Thanks for the post @kenkomodo ! That is an interesting point you bring up.

In dbplyr, all the calculations occur in the database, correct? I'm not sure we'd be able to generalize cards to work well in that setting.

Our simple tabulations are processed through base::table(), which would require a rather large refactor to remove. And ard_continuous() is generalized to utilize any R function that executes on a vector, and those won't have SQL equivalents. The last thing that comes to mind is that the cardx package implements many complex statistical methods that would also not be available in a database.
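To make the translation concern concrete, here is a small illustration using dbplyr's in-memory SQLite helper; geo_mean() stands in for any arbitrary R summary function with no SQL counterpart.

```r
library(dplyr)
library(dbplyr)

db <- memdb_frame(grp = c("a", "a", "b"), x = c(1, 5, 9))

# Known aggregates have SQL translations, so this works on the lazy table
db |>
  group_by(grp) |>
  summarise(mean_x = mean(x, na.rm = TRUE)) |>
  show_query()

# An arbitrary R function has no translation; dbplyr passes the call to the
# backend verbatim and the database errors when the query actually runs
geo_mean <- function(x) exp(mean(log(x)))
db |>
  group_by(grp) |>
  summarise(gm = geo_mean(x)) |>
  collect()
#> Error: no such function: geo_mean (exact message depends on the backend)
```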

BUT, this is not a topic I've spent much time researching! Perhaps you can correct me if my assumptions are incorrect?

@kenkomodo (Author)

@ddsjoberg Yes, dbplyr uses lazy evaluation (with the exception of do()) to build the necessary SQL and only runs it in the database when the results are requested: compute() executes the query and stores the result in a remote temporary table, collect() executes it and downloads the results into R, and collapse() simply forces the query to be built as a subquery without executing it.
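For example, with dbplyr's in-memory SQLite helper:

```r
library(dplyr)
library(dbplyr)

adsl_db <- memdb_frame(ARM = c("A", "A", "B"), AGE = c(34, 41, 29))

qry <- adsl_db |>
  group_by(ARM) |>
  summarise(n = n(), mean_age = mean(AGE, na.rm = TRUE))

show_query(qry)  # the SQL is only generated; nothing has executed yet
collect(qry)     # runs the query in the database and returns a local tibble
```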

Would it still require a large refactor, given that the database tbl objects have a different class? Curious whether defining a separate method (or methods) for the tbl/tbl_lazy class would still be such a large lift; deferring to your expertise on that one.

As for the functions that don't have SQL equivalents, I think that would simply be a documented limitation, and an informative error could be raised in the console if a user tried to use a function without an SQL translation.
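As a rough sketch of the idea, here is a hypothetical standalone helper (not a cards function) that pushes a fixed set of SQL-translatable statistics to the database and reshapes the collected result into a long, ARD-like tibble; anything without a translation would have to error informatively instead.

```r
library(dplyr)
library(dbplyr)

# Hypothetical helper: compute SQL-translatable summaries in the database,
# then collect and pivot to a long, ARD-like layout
ard_continuous_db <- function(data, variable, by) {
  stopifnot(inherits(data, "tbl_lazy"))
  data |>
    group_by({{ by }}) |>
    summarise(
      N    = n(),
      mean = mean({{ variable }}, na.rm = TRUE),
      min  = min({{ variable }}, na.rm = TRUE),
      max  = max({{ variable }}, na.rm = TRUE)
    ) |>
    collect() |>
    tidyr::pivot_longer(
      cols = c(N, mean, min, max),
      names_to = "stat_name",
      values_to = "stat"
    )
}

# Example with an in-memory SQLite table
adsl_db <- memdb_frame(ARM = c("A", "A", "B"), AGE = c(34, 41, 29))
ard_continuous_db(adsl_db, AGE, ARM)
```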

@ddsjoberg (Collaborator)

Ahh very interesting! I can certainly see how that would be useful for some key tabulations.

For now, I think that is going to fall outside the scope of cards. But if you choose to spin up a package to create DB ARDs, we're certainly here to help with the design and to explain our design choices (it would be great to keep them consistent)!
