I have a question about this: if an LLM could be used for segmentation, that would mean the whole text already fits in the LLM's context window, in which case there would be no need to segment it in the first place. So an LLM can't really be used for the segmentation task itself. There are already two nodes in the party for this: one called text_iterator and the other called Split text into JSON. The former safely divides the text into multiple paragraphs and returns only one of them on each execution, iterating over the output. The latter splits the text on a delimiter you define (newline by default), turning a large piece of text into a JSON dictionary. A token counter should be easy to implement, but it would change my core node. It's not that I can't write it; I'm just afraid that my update would invalidate existing workflows for all users. Once I confirm it is safe, I will make the change.
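For reference, here is a minimal Python sketch of the two behaviors described above. The class and function names are illustrative, not the party's actual node implementations, and splitting paragraphs on blank lines is an assumption:

```python
import json
from typing import Optional

class TextIteratorSketch:
    """Splits text into paragraphs up front, then returns one chunk per execution."""
    def __init__(self, text: str):
        # Assumption: paragraphs are separated by blank lines.
        self.chunks = [p for p in text.split("\n\n") if p.strip()]
        self.index = 0

    def next_chunk(self) -> Optional[str]:
        """Returns the next paragraph, or None once iteration is finished."""
        if self.index >= len(self.chunks):
            return None
        chunk = self.chunks[self.index]
        self.index += 1
        return chunk

def split_text_to_json(text: str, delimiter: str = "\n") -> str:
    """Splits text on a user-defined delimiter (newline by default) and
    returns the pieces as a JSON dictionary keyed by position."""
    parts = [p for p in text.split(delimiter) if p]
    return json.dumps({str(i): p for i, p in enumerate(parts)}, ensure_ascii=False)
```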
Hi,
While existing tools excel at combining files, a complementary tool for splitting text into token chunks would be incredibly valuable. Imagine a 100K-token text that needs to be summarized with an LLM: processing the whole text at once might be inefficient. Splitting it into manageable segments of, say, 10K tokens is ideal. Here's how this could be implemented:
Simple Splitting: A straightforward approach would divide the text based on a user-defined chunk size (e.g., 10K tokens), as in the sketch after this list. However, this risks disrupting semantic meaning due to arbitrary cuts.
LLM-Guided Splitting: Leveraging an LLM's comprehension, we could train it to split the text within the specified chunk limit, ensuring each segment retains coherence for effective summarization.
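As a concrete illustration of the simple approach, here is a minimal sketch assuming the tiktoken library and its cl100k_base encoding; the function name and default chunk size are only for illustration:

```python
import tiktoken

def split_by_tokens(text: str, chunk_size: int = 10_000,
                    encoding_name: str = "cl100k_base") -> list[str]:
    """Cuts the token stream into fixed-size windows of chunk_size tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    # Note: fixed windows can cut mid-sentence, which is exactly the
    # semantic risk mentioned above.
    return [enc.decode(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]
```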
Furthermore, a Token Counter node is crucial. It should track both input and output tokens, ideally integrated with the Show Text Node. This real-time token count would empower users to monitor token usage throughout the process.
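A token counter along these lines could be sketched as follows; the class name, the tiktoken dependency, and the idea of feeding the totals to a Show Text node are assumptions for illustration, not an existing API:

```python
import tiktoken

class TokenCounterSketch:
    """Keeps running totals of input and output tokens across calls."""
    def __init__(self, encoding_name: str = "cl100k_base"):
        self.enc = tiktoken.get_encoding(encoding_name)
        self.input_tokens = 0
        self.output_tokens = 0

    def count(self, prompt: str, completion: str) -> tuple[int, int]:
        # Accumulate totals so the running count can be displayed
        # alongside the text (e.g. via a Show Text node).
        self.input_tokens += len(self.enc.encode(prompt))
        self.output_tokens += len(self.enc.encode(completion))
        return self.input_tokens, self.output_tokens
```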