
Using streaming conformer as transducer encoder #380

Merged
merged 37 commits into from
Jun 27, 2022

Conversation

pkufool
Collaborator

@pkufool pkufool commented May 22, 2022

I slightly refactored #242 and created this new pull request. I think there is no need to duplicate the folder, since making the conformer streaming only requires very minor changes.

I also added the symbol delay penalty to this PR; see k2-fsa/k2#955 for how we penalize and measure symbol delay.

Will post the experiment results here soon.

self.causal
), "Causal convolution is required for streaming conformer."
max_len = x.size(0)
chunk_size = torch.randint(1, max_len, (1,)).item()
Collaborator Author

The method for choosing chunk_size here is from WeNet's paper; we may tune it to find a better one in the future.
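As context for the snippet above: during training the chunk size is drawn uniformly at random per batch, so the model is exposed to many latencies and a single checkpoint can later be decoded with different chunk sizes. A minimal dependency-free sketch of that sampling (the helper name is hypothetical; the PR itself uses torch.randint):

```python
import random

def sample_chunk_size(max_len: int) -> int:
    """Sample a training chunk size uniformly from [1, max_len).

    Mirrors the dynamic-chunk idea from WeNet: varying the attention
    chunk size during training lets one model run at several
    latencies at inference time.
    """
    # random.randint is inclusive on both ends, so use max_len - 1
    # to match torch.randint(1, max_len)'s half-open range.
    return random.randint(1, max_len - 1)
```

At decode time a fixed chunk size (e.g. 16 frames) is used instead of sampling.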

chunk_size: int = 16,
left_context: int = 64,
simulate_streaming: bool = False,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
Collaborator Author

The streaming forward interface is Emformer-like: it handles decoding states (i.e., caches) outside the model, which makes it easier to implement asynchronous decoding.
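To illustrate why caller-owned state helps asynchronous decoding, here is a minimal sketch of the interface shape: the model function is pure with respect to the caches, taking the previous states and returning updated ones, so the caller can keep one state tuple per utterance and interleave them freely. The function body below is a toy stand-in, not the actual conformer computation:

```python
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical (attention cache, conv cache) pair

def streaming_forward(chunk: List[int], states: State) -> Tuple[List[int], State]:
    """Consume one chunk plus the caches from the previous call.

    Returns the chunk output and the updated caches; the model itself
    holds no per-utterance state, so many streams can share it.
    """
    attn_cache, conv_cache = states
    # Toy computation: each output depends on the running cache value.
    out = [x + attn_cache for x in chunk]
    new_states = (attn_cache + len(chunk), conv_cache)
    return out, new_states

# The caller owns the states, one tuple per stream.
states: State = (0, 0)
outputs: List[int] = []
for chunk in ([1, 2], [3, 4], [5]):
    out, states = streaming_forward(chunk, states)
    outputs.extend(out)
```

Because nothing is stored inside the model, a server can hold a dict mapping stream IDs to state tuples and batch chunks from different utterances into one forward call.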

@danpovey
Collaborator

Cool!
One thing we should perhaps discuss is whether this belongs as a change to pruned_transducer_stateless2, vs. pruned_transducer_stateless4 which I believe is the latest recipe? @csukuangfj what are the changes from 2 to 4?

@yaozengwei
Collaborator

Cool! One thing we should perhaps discuss is whether this belongs as a change to pruned_transducer_stateless2, vs. pruned_transducer_stateless4 which I believe is the latest recipe? @csukuangfj what are the changes from 2 to 4?

pruned_transducer_stateless4 (see #344) supports saving averaged model and the epoch number counts from 1.

@csukuangfj
Collaborator

@csukuangfj what are the changes from 2 to 4?

"4" has the changes from zengwei that supports model averaging that you proposed in #337

Otherwise, 4 is identical to 2. I would suggest always using 4, as zengwei has shown it is helpful.

From my results with the deeper and narrower conformer, it is also helpful.

@danpovey
Collaborator

@csukuangfj BTW, are we recommending the deeper and narrower conformer right now? How much better was it, again?

@danpovey
Collaborator

@pkufool it may be fairly easy to use git diff to move your changes from 2 to 4. E.g.
cd pruned_transducer_stateless2
git diff master..HEAD . > ~/diff
cd ../pruned_transducer_stateless4
patch -p4 < ~/diff
(the 4 in -p4 may need to be changed, e.g. to 3 or 5, if the patch fails to apply).

@csukuangfj
Collaborator

@csukuangfj BTW, are we recommending the deeper and narrower conformer right now? How much better was it, again?

I am updating the results and uploading the pre-trained models to huggingface. Please wait for a moment.

@pkufool
Collaborator Author

pkufool commented May 23, 2022

Cool! One thing we should perhaps discuss is whether this belongs as a change to pruned_transducer_stateless2, vs. pruned_transducer_stateless4 which I believe is the latest recipe? @csukuangfj what are the changes from 2 to 4?

I think they share the same conformer.py; there are only small changes to train.py & decode.py.

@pkufool it may be fairly easy to use git diff to move your changes from 2 to 4. E.g.
cd pruned_transducer_stateless2
git diff master..HEAD . > ~/diff
cd ../pruned_transducer_stateless4
patch -p4 < ~/diff
(the 4 in -p4 may need to be changed, e.g. to 3 or 5, if the patch fails to apply).

Thanks, I will try it.

@pkufool pkufool marked this pull request as draft June 2, 2022 06:19
@pkufool pkufool requested review from yaozengwei and csukuangfj June 21, 2022 01:28
states=[],
chunk_size=params.decode_chunk_size,
left_context=params.left_context,
simulate_streaming=True,
Collaborator

When using simulate_streaming=False, is there any difference from the else branch?

Collaborator Author

There are three kinds of decoding pipelines now: non-streaming, simulated streaming, and real streaming. The non-streaming and simulated-streaming code lives in decode.py, which uses the lhotse dataloader; with simulate_streaming=False it performs non-streaming decoding, otherwise simulated-streaming decoding. The real streaming decoding code lives in streaming_decode.py, which maintains a DecodeStream queue and batches sequences asynchronously.
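The difference between the pipelines comes down to what each chunk is allowed to see. A minimal sketch of the simulated-streaming attention windows (helper name and exact windowing are illustrative assumptions, not the PR's code): the whole utterance is in memory, but each chunk of chunk_size frames may only attend to itself plus left_context frames on its left, mimicking real streaming latency.

```python
from typing import List, Tuple

def simulate_streaming_windows(
    num_frames: int, chunk_size: int, left_context: int
) -> List[Tuple[int, int, int]]:
    """Return (context_start, chunk_start, chunk_end) per chunk.

    Frames in [context_start, chunk_start) are visible left context;
    frames in [chunk_start, chunk_end) are the chunk being decoded.
    No frame to the right of chunk_end is visible, as in streaming.
    """
    windows = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        ctx_start = max(0, start - left_context)
        windows.append((ctx_start, start, end))
    return windows
```

Non-streaming decoding is the degenerate case where a single window covers all frames; real streaming applies the same windowing but receives the chunks incrementally from a DecodeStream rather than slicing a full utterance.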

@pkufool pkufool added the ready label Jun 27, 2022
@pkufool pkufool added ready and removed ready labels Jun 27, 2022
@pkufool pkufool added run-decode and removed ready labels Jun 27, 2022
@pkufool
Collaborator Author

pkufool commented Jun 27, 2022

Merging now. Most of the added code supports streaming decoding and won't break any previous recipes. Will fix any remaining issues in later PRs.

@pkufool pkufool merged commit 6e609c6 into k2-fsa:master Jun 27, 2022