Consider an absolute # characters changed threshold for textual changes #3

Open
Mr0grog opened this issue Mar 18, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@Mr0grog
Member

Mr0grog commented Mar 18, 2020

We should consider factoring the absolute number of changed characters or words into how textual changes contribute to priority. On extremely large pages, even a large change (one that is worth looking at) can look small as a percentage. For example, only 1.1% of the text here changed, but that’s still 1,785 characters!

https://monitoring.envirodatagov.org/page/6767f063-29f7-4c50-93d0-b851d0292c98/4da08f36-ab67-463d-8517-cf191857dc02..0eae6081-9fac-4f00-b914-f19c0218e7fe
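To make the arithmetic concrete, here is a small Python sketch showing how the same 1,785-character change shrinks as a percentage on bigger pages. The `percent_changed` helper is hypothetical (not the project's code) and returns a 0 to 1 fraction. The page sizes are illustrative; 160,000 characters is roughly what the 1.1% figure implies (1,785 / 0.011 ≈ 162,000).

```python
def percent_changed(changed_chars: int, total_chars: int) -> float:
    """Hypothetical helper: fraction (0 to 1) of a page's text that changed."""
    if total_chars <= 0:
        return 0.0
    return changed_chars / total_chars


# The same 1,785-character edit on pages of increasing (illustrative) size
for total in (5_000, 20_000, 160_000):
    print(f"{total:>7,} chars: {percent_changed(1_785, total):.1%}")
```

The change is identical in every row; only the denominator grows, which is why a percentage-only rule underrates edits on very long pages.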

Currently, we only look at the percentage changed:

```python
if text_analysis['diff_count'] > 0:
    # Base bump of 0.1 for any text change, plus 0.3 weighted by percent changed
    priority += 0.1 + 0.3 * priority_factor(text_analysis['percent_changed'])
```

@Mr0grog Mr0grog added the enhancement New feature or request label Mar 18, 2020
@Mr0grog
Member Author

Mr0grog commented Mar 18, 2020

Maybe the easiest way to do this is to put a ceiling on how many characters of a page we’ll consider, e.g. pretend a page can never be longer than 5,000 (?) characters. That way, the example change above would have counted as 35.7% changed (1,785 / 5,000) rather than 1.1%.
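A minimal sketch of the ceiling idea, assuming the change size and page length are both available as character counts. `capped_percent_changed` and `MAX_CONSIDERED_CHARS` are hypothetical names, and 5,000 is only the tentative figure from the comment above.

```python
# Hypothetical cap; "5,000 (?)" in the comment above was a tentative figure
MAX_CONSIDERED_CHARS = 5_000


def capped_percent_changed(changed_chars: int, total_chars: int,
                           max_chars: int = MAX_CONSIDERED_CHARS) -> float:
    """Fraction (0 to 1) changed, pretending a page is never longer than max_chars.

    Clamped to 1.0 because a change can exceed the capped length,
    e.g. 8,000 changed characters on a 100,000-character page.
    """
    considered = min(total_chars, max_chars)
    if considered <= 0:
        return 0.0
    return min(1.0, changed_chars / considered)


# Roughly the page size implied by the example (1,785 chars at 1.1%)
print(f"{capped_percent_changed(1_785, 162_000):.1%}")  # prints 35.7%
```

For pages at or above the cap, this reduces to `min(1, changed_chars / 5000)`, so it behaves like an absolute threshold: any change of 5,000 or more characters scores as fully changed, while shorter pages keep their ordinary percentage.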
