Add dbt-set-similarity #342
Conversation
Congrats on your 2nd dbt package @Matts52
Cool package of set similarity metrics 🤩
Note: Currently, the only GitHub tag for this repo is release. You'll need to add a tag with a semantic version, similar to the one you have in dbt-ml-inline-preprocessing. Your new package won't show up in the dbt Package Hub until about an hour or so after you add that tag.
@Matts52 Side note: I didn't check one way or the other, but if you implement a macro for the Tversky index, then you might be able to re-use it in the implementations for Jaccard and Dice since it is a generalization of each.
Nice idea for abstraction @dbeatty10! I think it may be computationally a bit more burdensome to call that Tversky macro from within the Jaccard/Dice macros than to use the single-piece denominator found in the explicit Jaccard and Dice formulas. I'll do a little testing with larger test data. Nonetheless, I will implement the Tversky index in any case and fix up that release issue before merging.
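For reference, the generalization discussed above can be sketched in a few lines of Python. This is only an illustration of the math, not the package's macro code: the function names `tversky`, `jaccard`, and `dice` are hypothetical, and returning 1.0 for two empty sets is an assumed convention. The Tversky index is |A∩B| / (|A∩B| + α|A−B| + β|B−A|); setting α = β = 1 gives Jaccard, and α = β = 0.5 gives Dice.

```python
def tversky(a, b, alpha, beta):
    """Tversky index of two collections, treated as distinct item sets."""
    a, b = set(a), set(b)
    inter = len(a & b)
    # Three-part denominator: shared items plus weighted one-sided differences.
    denom = inter + alpha * len(a - b) + beta * len(b - a)
    # Assumption: two empty sets are treated as identical (similarity 1.0).
    return inter / denom if denom else 1.0


def jaccard(a, b):
    """Explicit Jaccard formula: |A ∩ B| / |A ∪ B| (single-piece denominator)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def dice(a, b):
    """Explicit Dice formula: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


if __name__ == "__main__":
    x, y = [1, 2, 3], [2, 3, 4]
    print(jaccard(x, y), tversky(x, y, 1.0, 1.0))
    print(dice(x, y), tversky(x, y, 0.5, 0.5))
```

The explicit formulas need one set operation for the denominator, while the Tversky form needs two set differences, which is the extra cost raised in the reply above.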
Alright, those changes have been made; should be good to merge!
Merged, deployed, and available on the dbt Package Hub: https://hub.getdbt.com/Matts52/
Description
This package provides common methods for measuring the similarity of two unordered distinct item sets represented as array/variant type data.
Currently, the following similarity measures are supported:
Sequence/vector similarity is currently considered out of scope for this package but may be considered in future either in this package or as part of a separate package.
Link to your package's repository: https://github.com/Matts52/dbt-set-similarity
Checklist
This checklist is a cut down version of the best practices that we have identified as the package hub has grown. Although meeting these checklist items is not a prerequisite to being added to the Hub, we have found that packages which don't conform provide a worse user experience.
First run experience
Customisability
Packages for data transformation (delete if not relevant):
Dependencies
Dependencies on dbt Core
require-dbt-version range in dbt_project.yml. Example: A package which depends on functionality added in dbt Core 1.2 should set its require-dbt-version property to [">=1.2.0", "<2.0.0"].
Dependencies on other packages defined in packages.yml:
Interoperability
{{ dbt.except() }} and {{ dbt.type_string() }}.
users.
Versioning