Using streaming conformer as transducer encoder #380
Conversation
assert (
    self.causal
), "Causal convolution is required for streaming conformer."
max_len = x.size(0)
chunk_size = torch.randint(1, max_len, (1,)).item()
The method for choosing chunk_size here is from wenet's paper; we may tune it to find a better one in the future.
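As a side note, here is a minimal sketch of how this WeNet-style dynamic chunk training fits together. The helper names are hypothetical; only the sampling line comes from the diff above.

```python
import torch


def sample_chunk_size(max_len: int) -> int:
    # Draw a random chunk size per batch (mirroring the diff above) so
    # that a single model can later be decoded at many chunk sizes.
    return int(torch.randint(1, max_len, (1,)).item())


def chunk_attention_mask(seq_len: int, chunk_size: int) -> torch.Tensor:
    # Boolean (seq_len, seq_len) mask: a query frame may attend to all
    # frames in its own chunk and in earlier chunks, i.e. full left
    # context plus a limited right context within the current chunk.
    chunk_idx = torch.arange(seq_len) // chunk_size
    return chunk_idx.unsqueeze(1) >= chunk_idx.unsqueeze(0)
```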
chunk_size: int = 16,
left_context: int = 64,
simulate_streaming: bool = False,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
The streaming forward interface is emformer-like: it handles decoding states (i.e., caches) outside, which will make it easier to implement async decoding.
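For illustration, a rough sketch of the caller-managed loop this interface enables; the exact state layout and the details of streaming_forward are assumptions based on the diff above, not the definitive API.

```python
from typing import Iterable, List, Tuple

import torch


def run_one_stream(
    encoder,  # a streaming encoder exposing an emformer-like interface
    chunks: Iterable[Tuple[torch.Tensor, torch.Tensor]],  # (feats, lens)
) -> List[torch.Tensor]:
    # The caches live here in the caller, not inside the encoder module,
    # so many independent streams can be batched and advanced
    # asynchronously by stacking/splitting their states between calls.
    states: List[torch.Tensor] = []  # empty caches for the first chunk
    outputs: List[torch.Tensor] = []
    for feats, feat_lens in chunks:
        out, out_lens, states = encoder.streaming_forward(
            feats,
            feat_lens,
            states=states,
            chunk_size=16,
            left_context=64,
        )
        outputs.append(out)
    return outputs
```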
Cool!
"4" has the changes from zengwei that supports model averaging that you proposed in #337 Otherwise, 4 is identical to 2. I would suggest to always use 4 as zengwei has showed it is helpful. From my results of deeper and narrower conformer, it is also helpful. |
@csukuangfj BTW, are we recommending the deeper and narrower conformer right now? How much better was it, again?
@pkufool it may be fairly easy to use.
I am updating the results and uploading the pre-trained models to huggingface. Please wait for a moment.
I think they share the same conformer.py, with only small changes to train.py & decode.py.
Thanks, I will try it.
states=[],
chunk_size=params.decode_chunk_size,
left_context=params.left_context,
simulate_streaming=True,
When using simulate_streaming=False, is there any difference from the else branch?
There are three kinds of decoding pipelines now: non-streaming, simulated streaming, and real streaming. The non-streaming and simulated-streaming code lives in decode.py, which uses the lhotse dataloader; if simulate_streaming=False it runs non-streaming decoding, otherwise simulated-streaming decoding. The real streaming decoding code lives in streaming_decode.py, which maintains a DecodeStream queue and batches sequences asynchronously.
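A sketch of that dispatch in decode.py, with parameter names taken from the diff above; the exact control flow here is an assumption.

```python
# Hypothetical dispatch sketch, not the exact icefall code: both
# branches consume whole utterances from the lhotse dataloader; only
# the encoder call differs.
if params.simulate_streaming:
    # The full utterance is available, but a chunk-wise attention mask
    # restricts the context so the output matches real chunk-by-chunk
    # streaming.
    encoder_out, encoder_out_lens, _ = model.encoder.streaming_forward(
        feature,
        feature_lens,
        states=[],
        chunk_size=params.decode_chunk_size,
        left_context=params.left_context,
        simulate_streaming=True,
    )
else:
    # Plain non-streaming forward with unrestricted self-attention.
    encoder_out, encoder_out_lens = model.encoder(feature, feature_lens)
```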
Merging now. Most of the added code is to support streaming decoding; it won't break any previous recipes. Will fix any issues later in other PRs.
I slightly refactored #242 and created this new pull request. I think there is no need to duplicate a new folder, as making the conformer streaming requires only very minor changes.
I also added the symbol delay penalty to this PR; see k2-fsa/k2#955 for how we penalize and measure symbol delay. Will post the experiment results here soon.
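For intuition only, here is one way such a delay penalty could be expressed. This is a hand-rolled sketch of the idea, not k2's implementation; the real mechanism lives inside k2's pruned RNN-T loss (see k2-fsa/k2#955) and may differ in form and sign conventions.

```python
import torch


def apply_delay_penalty(logits: torch.Tensor, penalty: float) -> torch.Tensor:
    # Sketch of the idea only: add a bonus to the non-blank log-scores
    # that shrinks with the frame index, so alignment paths that emit
    # symbols earlier score higher and late emissions are penalized.
    # logits: (N, T, U, V), blank assumed at index 0.
    N, T, U, V = logits.shape
    frame_bonus = penalty * torch.arange(
        T - 1, -1, -1, dtype=logits.dtype, device=logits.device
    )  # shape (T,): largest bonus at the first frame
    biased = logits.clone()
    biased[:, :, :, 1:] += frame_bonus.view(1, T, 1, 1)
    return biased
```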