-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support chat transcription forwarding #99
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: ed47c1c The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
} | ||
} | ||
|
||
protected async sendChatMessage() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one issue with this approach is that we'll eventually run into datapacket size limit restrictions with very long messages.
should we segment them in the sdk or use a tokenizer of sorts after a certain length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be fine to simply truncate at 15KB or up to 63KB and log a warning, for now. If we do decide to split into multiple messages things get messy, esp with edits. and tokenization-aware splitting may be tricky too in UTF-8 where characters have variable bytesizes...
…ts-js into lukas/transcription-as-chat
can you mark the changeset as minor and anchor this to v0.4.0 instead of main? |
} | ||
} | ||
|
||
protected async sendChatMessage() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be fine to simply truncate at 15KB or up to 63KB and log a warning, for now. If we do decide to split into multiple messages things get messy, esp with edits. and tokenization-aware splitting may be tricky too in UTF-8 where characters have variable bytesizes...
case TranscriptionType.TRANSCRIPTION: | ||
case TranscriptionType.CHAT_AND_TRANSCRIPTION: | ||
await this.publishTranscription(false); | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this switch looks wrong to me - won't chat only still fall through to transcription? a switch doesn't seem like the right conditional type to use for a-b-or-both behavior
} | ||
} | ||
|
||
export class ChatAndTranscriptionForwarder extends BasicTranscriptionForwarder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementing this as a subclass feels unnecessarily brittle and hard to follow - it's already slightly tricky code.
@@ -14,101 +15,108 @@ export interface TranscriptionForwarder { | |||
currentCharacterIndex: number; | |||
} | |||
|
|||
export enum TranscriptionType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForwardingType
would be a clearer name. I'd also consider making the options Transcription
and Chat
and the setting an array, rather than a third enum value. then you can simplify the forwarding checks in the loop to
if (forwarding_types.contains('transcription')) {
...
}
if (forwarding_types.contains('chat')) {
...
}
No description provided.