-
Notifications
You must be signed in to change notification settings - Fork 35
/
README.t
115 lines (93 loc) · 3.76 KB
/
README.t
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
# Speech Transformer
## Introduction
This is a PyTorch re-implementation of Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition.
## Dataset
Aishell is an open-source Chinese Mandarin speech corpus published by Beijing Shell Shell Technology Co.,Ltd.
400 people from different accent areas in China are invited to participate in the recording, which is conducted in a quiet indoor environment using high fidelity microphone and downsampled to 16kHz. The manual transcription accuracy is above 95%, through professional speech annotation and strict quality inspection. The data is free for academic use. We hope to provide moderate amount of data for new researchers in the field of speech recognition.
```
@inproceedings{aishell_2017,
title={AIShell-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline},
author={Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng},
booktitle={Oriental COCOSDA 2017},
pages={Submitted},
year={2017}
}
```
In data folder, download speech data and transcripts:
```bash
$ wget http://www.openslr.org/resources/33/data_aishell.tgz
```
## Performance
Evaluate with 7176 audios in Aishell test set:
```bash
$ python test.py
```
## Results
|Model|CER|Download|
|---|---|---|
|Speech Transformer|11.5|[Link](https://github.com/foamliu/Speech-Transformer/releases/download/v1.0/BEST_checkpoint.tar)|
## Dependency
- Python 3.6.8
- PyTorch 1.3.0
## Usage
### Data Pre-processing
Extract data_aishell.tgz:
```bash
$ python extract.py
```
Extract wav files into train/dev/test folders:
```bash
$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;
```
Scan transcript data, generate features:
```bash
$ python pre_process.py
```
Now the folder structure under data folder is sth. like:
<pre>
data/
data_aishell.tgz
data_aishell/
transcript/
aishell_transcript_v0.8.txt
wav/
train/
dev/
test/
aishell.pickle
</pre>
### Train
```bash
$ python train.py
```
If you want to visualize during training, run in your terminal:
```bash
$ tensorboard --logdir runs
```
### Demo
Please download the [pretrained model](https://github.com/foamliu/Speech-Transformer/releases/download/v1.0/speech-transformer-cn.pt) then run:
```bash
$ python demo.py
```
It picks 10 random test examples and recognize them like these:
|Audio|Out|GT|
|---|---|---|
|[audio_0.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_0.wav)|$(out_list_0)|$(gt_0)|
|[audio_1.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_1.wav)|$(out_list_1)|$(gt_1)|
|[audio_2.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_2.wav)|$(out_list_2)|$(gt_2)|
|[audio_3.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_3.wav)|$(out_list_3)|$(gt_3)|
|[audio_4.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_4.wav)|$(out_list_4)|$(gt_4)|
|[audio_5.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_5.wav)|$(out_list_5)|$(gt_5)|
|[audio_6.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_6.wav)|$(out_list_6)|$(gt_6)|
|[audio_7.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_7.wav)|$(out_list_7)|$(gt_7)|
|[audio_8.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_8.wav)|$(out_list_8)|$(gt_8)|
|[audio_9.wav](https://github.com/foamliu/Speech-Transformer/raw/master/audios/audio_9.wav)|$(out_list_9)|$(gt_9)|
## 小小的赞助~
<p align="center">
<img src="https://github.com/foamliu/Speech-Transformer/blob/master/sponsor.jpg" alt="Sample" width="324" height="504">
<p align="center">
<em>若对您有帮助可给予小小的赞助~</em>
</p>
</p>
<br/><br/><br/>