[WIP] [MIN-85] [Experiment] Intelligent Chunking #65

Open · wants to merge 8 commits into base: develop


@d-lowl (Member) commented Aug 17, 2023

This PR implements:

  • Necessary steps for preparation pipeline
  • Some tests for those steps, to ensure that they perform as expected
  • Some adjustments to the cleaning step and the pipeline itself (potentially not necessary; needs investigating)

The intelligent chunking is done by splitting the documents into sections (the "regular" chunking is then performed when the sections are still too large)
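As a rough illustration of the idea, here is a minimal sketch. It is not the code in this branch: it assumes sections start at lines beginning with `#` and that "too large" means exceeding a character limit, and the names `split_into_sections`, `regular_chunks`, `intelligent_chunks` and `max_chars` are hypothetical.

```python
def split_into_sections(text):
    """Split a document into sections at heading lines.

    Assumption: a section starts at any line beginning with '#'.
    """
    sections, current = [], []
    for line in text.splitlines():
        # A heading closes the section collected so far (if any).
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def regular_chunks(text, max_chars):
    """Stand-in for the "regular" chunking: fixed-size character windows."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def intelligent_chunks(text, max_chars=1000):
    """Chunk by section, falling back to regular chunking for large sections."""
    chunks = []
    for section in split_into_sections(text):
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            chunks.extend(regular_chunks(section, max_chars))
    return chunks
```

Short sections are kept whole, so a chunk never mixes two sections; only oversized sections are cut further.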

The discussion and conclusions can be found in the lab notes; the main conclusion is that intelligent chunking does not improve results. Hence, the branch will probably not need to be merged. It is open here for visibility.
