I have a question about this: if an LLM could be used for segmentation, that would mean the whole text already fits in the LLM's context window, in which case there would be no need to segment it in the first place. So an LLM can't really be used for the segmentation task itself. There are already two nodes in the party for this: one called text_iterator and the other called Split text into JSON. The former safely divides the text into multiple paragraphs and returns only one of them on each execution, iterating over the output. The latter splits the text on a delimiter you define (newline by default), turning a large piece of text into a JSON dictionary. A token counter should be easy to implement, but it would change my core node. It's not that I can't write it; I'm just afraid that my update would invalidate existing workflows for all users. Once I confirm it is safe, I will make the change.
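For reference, here is a minimal Python sketch of the two behaviors described above. The class and function names are illustrative, not the party's actual node implementations, and splitting paragraphs on blank lines is an assumption:

```python
import json
from typing import Optional

class TextIteratorSketch:
    """Splits text into paragraphs up front, then returns one chunk per execution."""
    def __init__(self, text: str):
        # Assumption: paragraphs are separated by blank lines.
        self.chunks = [p for p in text.split("\n\n") if p.strip()]
        self.index = 0

    def next_chunk(self) -> Optional[str]:
        """Returns the next paragraph, or None once iteration is finished."""
        if self.index >= len(self.chunks):
            return None
        chunk = self.chunks[self.index]
        self.index += 1
        return chunk

def split_text_to_json(text: str, delimiter: str = "\n") -> str:
    """Splits text on a user-defined delimiter (newline by default) and
    returns the pieces as a JSON dictionary keyed by position."""
    parts = [p for p in text.split(delimiter) if p]
    return json.dumps({str(i): p for i, p in enumerate(parts)}, ensure_ascii=False)
```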
Hi,
While existing tools excel at combining files, a complementary tool for splitting text into token chunks would be incredibly valuable. Imagine a 100K-token text that needs to be summarized with an LLM: processing the whole text at once might be inefficient. Splitting it into manageable segments of, say, 10K tokens is ideal. Here's how this could be implemented:
Simple Splitting: A straightforward approach would divide the text based on a user-defined chunk size (e.g., 10K tokens), as in the sketch after this list. However, this risks disrupting semantic meaning due to arbitrary cuts.
LLM-Guided Splitting: Leveraging an LLM's comprehension, we could train it to split the text within the specified chunk limit, ensuring each segment retains coherence for effective summarization.
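As a concrete illustration of the simple approach, here is a minimal sketch assuming the tiktoken library and its cl100k_base encoding; the function name and default chunk size are only for illustration:

```python
import tiktoken

def split_by_tokens(text: str, chunk_size: int = 10_000,
                    encoding_name: str = "cl100k_base") -> list[str]:
    """Cuts the token stream into fixed-size windows of chunk_size tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    # Note: fixed windows can cut mid-sentence, which is exactly the
    # semantic risk mentioned above.
    return [enc.decode(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]
```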
Furthermore, a Token Counter node is crucial. It should track both input and output tokens, ideally integrated with the Show Text Node. This real-time token count would empower users to monitor token usage throughout the process.
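A token counter along these lines could be sketched as follows; the class name, the tiktoken dependency, and the idea of feeding the totals to a Show Text node are assumptions for illustration, not an existing API:

```python
import tiktoken

class TokenCounterSketch:
    """Keeps running totals of input and output tokens across calls."""
    def __init__(self, encoding_name: str = "cl100k_base"):
        self.enc = tiktoken.get_encoding(encoding_name)
        self.input_tokens = 0
        self.output_tokens = 0

    def count(self, prompt: str, completion: str) -> tuple[int, int]:
        # Accumulate totals so the running count can be displayed
        # alongside the text (e.g. via a Show Text node).
        self.input_tokens += len(self.enc.encode(prompt))
        self.output_tokens += len(self.enc.encode(completion))
        return self.input_tokens, self.output_tokens
```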