
Automatically batch texts when too long #6

Open
dennlinger opened this issue Dec 14, 2021 · 0 comments
dennlinger commented Dec 14, 2021

We currently have no strategy in place for samples that exceed the 512 subword token limit.
This is both unwanted and relatively easy to improve. There are a few considerations regarding the exact strategy, but a good starting point seems to be approximating sentence boundaries with a lightweight spaCy model and then chunking based on an approximate maximum length, as sketched below.
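A minimal sketch of what this could look like, assuming spaCy's `en_core_web_sm` pipeline for sentence splitting and a Hugging Face tokenizer as a stand-in for the actual subword tokenizer (both model names are placeholders, not decisions for this repository):

```python
from typing import List

import spacy
from transformers import AutoTokenizer

# Lightweight pipeline; only sentence boundaries are needed.
nlp = spacy.load("en_core_web_sm", disable=["ner", "lemmatizer"])
# Placeholder tokenizer used to approximate the subword token count.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


def chunk_text(text: str, max_tokens: int = 512) -> List[str]:
    """Split text into sentence-aligned chunks of at most max_tokens subwords."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for sent in nlp(text).sents:
        sent_len = len(tokenizer.tokenize(sent.text))
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and current_len + sent_len > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent.text)
        current_len += sent_len

    if current:
        chunks.append(" ".join(current))
    return chunks
```

One open question is how to handle a single sentence that already exceeds the limit on its own; the sketch above would emit it as an over-length chunk, so some fallback (e.g. hard truncation) would still be needed. The count from `tokenizer.tokenize` also excludes special tokens, so the effective budget should probably be slightly below 512.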

@dennlinger dennlinger self-assigned this Dec 14, 2021
@dennlinger dennlinger added the bug and enhancement labels Dec 14, 2021