Using streaming conformer as transducer encoder #380
Conversation
assert (
    self.causal
), "Causal convolution is required for streaming conformer."
max_len = x.size(0)
chunk_size = torch.randint(1, max_len, (1,)).item()
The method for choosing chunk_size here is from wenet's paper; we may tune it to find a better one in the future.
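As a side note, here is a minimal sketch of how this WeNet-style dynamic chunk training fits together. The helper names are hypothetical; only the sampling line comes from the diff above.

```python
import torch


def sample_chunk_size(max_len: int) -> int:
    # Draw a random chunk size per batch (mirroring the diff above) so
    # that a single model can later be decoded at many chunk sizes.
    return int(torch.randint(1, max_len, (1,)).item())


def chunk_attention_mask(seq_len: int, chunk_size: int) -> torch.Tensor:
    # Boolean (seq_len, seq_len) mask: a query frame may attend to all
    # frames in its own chunk and in earlier chunks, i.e. full left
    # context plus a limited right context within the current chunk.
    chunk_idx = torch.arange(seq_len) // chunk_size
    return chunk_idx.unsqueeze(1) >= chunk_idx.unsqueeze(0)
```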
chunk_size: int = 16,
left_context: int = 64,
simulate_streaming: bool = False,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
The streaming forward interface is emformer-like: it handles decoding states (i.e., caches) outside, which will make it easier to implement async decoding.
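For illustration, a rough sketch of the caller-managed loop this interface enables; the exact state layout and the details of streaming_forward are assumptions based on the diff above, not the definitive API.

```python
from typing import Iterable, List, Tuple

import torch


def run_one_stream(
    encoder,  # a streaming encoder exposing an emformer-like interface
    chunks: Iterable[Tuple[torch.Tensor, torch.Tensor]],  # (feats, lens)
) -> List[torch.Tensor]:
    # The caches live here in the caller, not inside the encoder module,
    # so many independent streams can be batched and advanced
    # asynchronously by stacking/splitting their states between calls.
    states: List[torch.Tensor] = []  # empty caches for the first chunk
    outputs: List[torch.Tensor] = []
    for feats, feat_lens in chunks:
        out, out_lens, states = encoder.streaming_forward(
            feats,
            feat_lens,
            states=states,
            chunk_size=16,
            left_context=64,
        )
        outputs.append(out)
    return outputs
```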
Cool!
"4" has the changes from zengwei that supports model averaging that you proposed in #337 Otherwise, 4 is identical to 2. I would suggest to always use 4 as zengwei has showed it is helpful. From my results of deeper and narrower conformer, it is also helpful. |
@csukuangfj BTW, are we recommending the deeper and narrower conformer right now? How much better was it, again?
@pkufool it may be fairly easy to use.
I am updating the results and uploading the pre-trained models to huggingface. Please wait for a moment.
I think they share the same conformer.py, with only small changes to train.py & decode.py.
Thanks, I will try it.
states=[],
chunk_size=params.decode_chunk_size,
left_context=params.left_context,
simulate_streaming=True,
When using simulate_streaming=False, is there any difference from the else branch?
There are three kinds of decoding pipelines now: non-streaming, simulated streaming, and real streaming. The non-streaming and simulated-streaming code lives in decode.py, which uses the lhotse dataloader; if simulate_streaming=False it runs non-streaming decoding, otherwise simulated-streaming decoding. The real streaming decoding code lives in streaming_decode.py, which maintains a DecodeStream queue and batches sequences asynchronously.
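A sketch of that dispatch in decode.py, with parameter names taken from the diff above; the exact control flow here is an assumption.

```python
# Hypothetical dispatch sketch, not the exact icefall code: both
# branches consume whole utterances from the lhotse dataloader; only
# the encoder call differs.
if params.simulate_streaming:
    # The full utterance is available, but a chunk-wise attention mask
    # restricts the context so the output matches real chunk-by-chunk
    # streaming.
    encoder_out, encoder_out_lens, _ = model.encoder.streaming_forward(
        feature,
        feature_lens,
        states=[],
        chunk_size=params.decode_chunk_size,
        left_context=params.left_context,
        simulate_streaming=True,
    )
else:
    # Plain non-streaming forward with unrestricted self-attention.
    encoder_out, encoder_out_lens = model.encoder(feature, feature_lens)
```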
Merging now. Most of the added code is to support streaming decoding; it won't break any previous recipes. Will fix any issues later in other PRs.
I slightly refactored #242 and created this new pull request. I think there is no need to duplicate a new folder, as making the conformer streaming requires only very minor changes.
I also added the symbol delay penalty to this PR; see k2-fsa/k2#955 for how we penalize and measure symbol delay. Will post the experiment results here soon.
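For intuition only, here is one way such a delay penalty could be expressed. This is a hand-rolled sketch of the idea, not k2's implementation; the real mechanism lives inside k2's pruned RNN-T loss (see k2-fsa/k2#955) and may differ in form and sign conventions.

```python
import torch


def apply_delay_penalty(logits: torch.Tensor, penalty: float) -> torch.Tensor:
    # Sketch of the idea only: add a bonus to the non-blank log-scores
    # that shrinks with the frame index, so alignment paths that emit
    # symbols earlier score higher and late emissions are penalized.
    # logits: (N, T, U, V), blank assumed at index 0.
    N, T, U, V = logits.shape
    frame_bonus = penalty * torch.arange(
        T - 1, -1, -1, dtype=logits.dtype, device=logits.device
    )  # shape (T,): largest bonus at the first frame
    biased = logits.clone()
    biased[:, :, :, 1:] += frame_bonus.view(1, T, 1, 1)
    return biased
```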