Lens-specific notes:
The code you need to run the character detector is in `./lens/` and
`./trainDetector/train_detector.m`.
To train and test the character detector on pill images, you first
need to generate the train and test data. Here's how:
1. Use draw_imprints.py to draw the bounding boxes of each character.
2. Use crop_positive_chars.py and crop_negative_chars.py to generate
the 32x32px grayscale patches.
3. Copy the directories containing the positive and negative char
patches to `./lens/data`. Update the POSITIVE_CHAR_PATCHES_DIR
and NEGATIVE_CHAR_PATCHES_DIR flags wherever they exist in
`./lens`.
4. Run create_positive_char_mats.m and create_train_and_test_mats.m.
5. In `./trainDetector/train_detector.m`, set USE_EXISTING_CLUSTERS
to 0 and set USE_LENS_DATA to 1. Make sure the filenames defined
at the top of this file are correct. Then run this script. A
report of train and test accuracy will be printed.
To run the character detector over pill images, run
`./lens/detect_chars_in_dir.m`, and then visualize the predictions
with show_detected_char_boxes.py.
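As a rough guide, here is a minimal sketch of the Matlab portion of the
pipeline above (steps 4-5 plus detection), run from the top level of the
repository. It assumes the Python cropping steps have already populated
`./lens/data` and that the flag edits described in steps 3 and 5 have
been made by hand:

  % Sketch only; script names are taken from the notes above.
  cd lens
  create_positive_char_mats;      % build .mat files of positive char patches
  create_train_and_test_mats;     % assemble the train and test .mat files
  cd ../trainDetector
  train_detector;                 % prints train and test accuracy
  cd ../lens
  detect_chars_in_dir;            % run the detector over a directory of pill images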
---------------------------------------------------------------------
This is a demo package of the code we used for our paper, "End-to-End
Text Recognition with Convolutional Neural Networks", T. Wang, D. Wu,
A. Coates, A. Ng, in ICPR 2012.
Feel free to contact me if you have any questions.
My email address: [email protected]
My website: http://cs.stanford.edu/people/twangcat/
The package is provided to facilitate understanding of the algorithms
used in our paper. It contains demos of all the major components of our
end-to-end system. For speed, the CNN architectures as well as the
dataset sizes provided in this package are much smaller than the actual
ones used in the paper, although the algorithms themselves are
identical. We also provide our trained models together with code to
reproduce the major results reported in our paper.
Below are instructions for running the demo for each component, as
well as the end-to-end system. All code runs under Matlab.
=======================================================================
1. Unsupervised Feature Learning using K-Means
=======================================================================
Download the toy cropped character datasets from:
http://cs.stanford.edu/people/twangcat/ICPR2012_code/charDatasets.tar
and extract the 'dataset' folder under SceneTextCNN_demo/
>> cd kmeans/
>> kmeansDemo;
This will crop a set of random 8x8 patches and train a set of first-layer
filters using K-Means. Filters with small variances are removed. The
resulting filters can be visualized with display_network(), as shown in
the script.
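For reference, here is a minimal sketch of the patch-extraction and
clustering idea (this is not the exact kmeansDemo code; it assumes
`images` is an HxWxN array of grayscale character images, that the
Statistics Toolbox kmeans is available, and it reuses the package's
display_network):

  % Sketch only: learn first-layer filters by clustering random 8x8 patches.
  numPatches = 10000; patchSize = 8; numFilters = 64;
  patches = zeros(patchSize*patchSize, numPatches);
  for i = 1:numPatches
      n = randi(size(images, 3));
      r = randi(size(images, 1) - patchSize + 1);
      c = randi(size(images, 2) - patchSize + 1);
      p = images(r:r+patchSize-1, c:c+patchSize-1, n);
      patches(:, i) = p(:);                  % flatten the patch into a column
  end
  % Normalize each patch (zero mean, unit variance) before clustering.
  patches = bsxfun(@rdivide, bsxfun(@minus, patches, mean(patches)), ...
                   std(patches) + 1e-8);
  % The cluster centroids act as first-layer filters.
  [~, centroids] = kmeans(patches', numFilters);
  keep = var(centroids, 0, 2) > 1e-3;        % drop near-constant filters
  display_network(centroids(keep, :)');      % visualize the remaining filters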
=======================================================================
2. Training a toy character detector module
=======================================================================
Go to the top-level directory of SceneTextCNN_demo, then do:
>> cd trainDetector
>> train_detector;
This will train a character detector starting from raw data. The
pipeline is:
1. Learn first-layer filters from the raw data using K-Means.
2. Extract first-layer feature responses from the raw data.
3. Train the detector using backpropagation.
We get the following training and validation results on the toy dataset:
train accuracy = 0.974667, validation accuracy = 0.965867
Note that training a 2-layer CNN is not a convex problem, so you may get
slightly different numbers.
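If you want successive runs to be more comparable, one option (our
suggestion, not something train_detector does for you) is to fix Matlab's
random seed before training:

  % Sketch only; rng requires a reasonably recent Matlab release.
  rng(0);            % fix the random number generator seed
  cd trainDetector
  train_detector;    % prints train and validation accuracy when done
  cd ..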
=======================================================================
3. Training a toy character classifier module
=======================================================================
Go to the top-level directory of SceneTextCNN_demo, then do:
>> cd trainClassifier
>> train_classifier;
This will train a character classifier starting from raw data. The
pipeline is:
1. Learn first-layer filters from the raw data using K-Means.
2. Extract first-layer feature responses from the raw data.
3. Train the classifier using backpropagation.
We get the following training and validation results on the toy dataset:
train accuracy = 1.0000, validation accuracy = 0.7699
You may get slightly different numbers due to a different random
initialization.
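For clarity, accuracies like these are just the fraction of correctly
labelled examples; a hypothetical sketch, where predLabels and trueLabels
are placeholder vectors of predicted and ground-truth class indices:

  % Sketch only; the variable names are placeholders, not names used by the scripts.
  valAccuracy = mean(predLabels(:) == trueLabels(:));
  fprintf('validation accuracy = %.4f\n', valAccuracy);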
=======================================================================
4. Detecting and visualizing horizontal text lines on a sample image
=======================================================================
Go to the top-level directory of SceneTextCNN_demo, then do:
>> cd detectorDemo
>> runDetectorDemo;
This may take quite a while, mainly due to the multi-scale sliding-window
process. Eventually the image should be marked with line-level bounding
boxes and candidate space locations.
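To illustrate what a multi-scale sliding-window pass looks like, here is a
toy sketch that substitutes a simple linear filter for the trained CNN
detector (the image filename, filter, and scale range are assumptions, not
values used by runDetectorDemo; the Image Processing Toolbox is assumed):

  % Sketch only: dense window scores at several scales via 2-D convolution.
  img = double(imread('sampleImage.jpg')) / 255;   % hypothetical input image
  if size(img, 3) == 3, img = rgb2gray(img); end
  w = fspecial('gaussian', [32 32], 8);            % placeholder "detector" filter
  for s = 2 .^ (0:-0.5:-2)                         % assumed scale pyramid
      scaled = imresize(img, s);
      if min(size(scaled)) < 32, continue; end     % skip scales smaller than a window
      responses = conv2(scaled, w, 'valid');       % one score per 32x32 window
      fprintf('scale %.2f: max response %.3f\n', s, max(responses(:)));
  end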
=======================================================================
5. Producing and visualizing end-to-end results on a sample image
=======================================================================
Download the full ICDAR Test dataset from:
http://algoval.essex.ac.uk/icdar/datasets/TrialTest/scene.zip
and extract the SceneTrialTest folder under SceneTextCNN_demo/
Go to the top-level directory of SceneTextCNN_demo, then do:
>> e2e_visualize;
This may take quite a while, mainly due to the multi-scale sliding-window
process. Eventually the image should be marked with word bounding boxes,
and their labels and positions will be printed in Matlab.
=======================================================================
6. Reproducing cropped word recognition results
=======================================================================
a. For the ICDAR-WD-FULL benchmark:
Download the test set of ICDAR 2003 Robust Word Recognition Dataset
from: http://algoval.essex.ac.uk/icdar/datasets/TrialTest/word.zip
Extract the 'word' folder under the top level directory of
SceneTextCNN_demo, then do:
>> wordRecog;
b. For the ICDAR-WD-50 benchmark:
We have included the lexicons used by Kai Wang in his SVT dataset paper
under: icdarTestLex/
Simply change the 'lextype' variable in wordRecog.m to '50', then do:
>> wordRecog;
c. For the SVT word recognition benchmark:
Download the SVT Dataset from:
http://vision.ucsd.edu/~kai/svt/svt.zip and extract the folder svt1
under the top level directory of SceneTextCNN_demo. Change the
'benchMark' variable in wordRecog.m to 'svt', then do:
>> wordRecog;
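Putting the three benchmarks together, the runs look like this (the
'lextype' and 'benchMark' edits are made by hand inside wordRecog.m as
described above; we are assuming the shipped defaults correspond to
case a):

  % Sketch only; edit the variables inside wordRecog.m before each run.
  % a. ICDAR-WD-FULL: run with the shipped settings
  wordRecog;
  % b. ICDAR-WD-50:   set lextype = '50' in wordRecog.m, then
  wordRecog;
  % c. SVT:           set benchMark = 'svt' in wordRecog.m, then
  wordRecog;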
=======================================================================
7. Reproducing full end-to-end results
=======================================================================
Download and extract the ICDAR and SVT datasets as described in sections
5 and 6.
Download the precomputed line-level bounding boxes from:
http://cs.stanford.edu/people/twangcat/ICPR2012_code/lineBboxes.tar
and extract the precomputedLineBboxes folder under SceneTextCNN_demo/
It contains the line-level bounding box information extracted using
our best detector model. Computing them from scratch takes a long time,
so we have provided them for you. You can also use the
detectorDemo/runDetectorFull.m script to regenerate them on your own.
Just remember to create an empty precomputedLineBboxes/ directory.
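If you prefer to regenerate the boxes yourself, here is a minimal sketch
of that route (much slower than using the tarball; directory and script
names come from the paragraph above, run from the top level of
SceneTextCNN_demo):

  % Sketch only; regenerating the boxes from scratch takes a long time.
  mkdir('precomputedLineBboxes');   % the script expects this directory to exist
  cd detectorDemo
  runDetectorFull;                  % writes line-level bounding boxes per image
  cd ..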
Then run:
>> e2e_demo;
This generates the end-to-end ICDAR-FULL and SVT results reported in our
paper. To use different lexicon sizes for ICDAR, change the 'lextype'
variable in e2e_demo.m accordingly.