.TH FoLiA-alto 1 "2021 jan 19"
.SH NAME
FoLiA-alto - retrieve the .alto files specified in mpeg21.xml DIDL files and
extract articles or books from them
.SH SYNOPSIS
FoLiA-alto [options] FILE
FoLiA-alto [options] DIR
.SH DESCRIPTION
When a DIR is provided,
.B FoLiA-alto
will process all DIDL files in DIR and store its results in the working
directory, or in the directory specified with
.BR -O .
.PP
When a FILE is provided,
.B FoLiA-alto
will process that DIDL file and store its results in the directory where FILE is
found, or in the directory specified with
.BR -O .
.SH OPTIONS
.B --cache
path
.RS
Specifies the cache directory for .alto files. Cached files are kept for later use,
unless
.B --clear
is also present.
.RE
.B --clear
.RS
Clear the cache directory at the start of the program.
.RE
.B --type
kind
.RS
Type of document: 'krant' (newspaper) or 'boek' (book). Default: 'krant'.
.RE
.B --direct
.RS
Skip reading a DIDL file and use ALTO files directly.
.RE
.B --oldstrings
.RS
Fall back to the old approach of creating <str> nodes. The default is to create
<w> nodes.
.RE
.B --compress
type
.RS
Create compressed files. type=b for .bz2 and type=g for .gz files.
.RE
.B -t
or
.B --threads
number
.RS
Number of concurrent threads to be used by the program.
.RE
.B -V
or
.B --version
.RS
Show the version.
.RE
.B -O
outputdir
.RS
Place the output in 'outputdir'.
.RE
.B --setname
set
.RS
When creating <str> or <w> nodes,
.B FoLiA-alto
will place them in the set named 'set'. The default is
.BR FoLiA-alto-set .
.RE
.B --class
name
.RS
When creating <t> nodes,
.B FoLiA-alto
will use this as the textclass name. The default is
.BR OCR .
.RE
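.SH EXAMPLES
The invocations below are illustrative sketches, not tested recipes. The paths
.IR didl/ ,
.I didl/mpeg21.xml
and
.I out/
are hypothetical, and only options documented above are used.
.PP
Process every DIDL file in a directory and write the results to a separate
output directory:
.PP
.nf
.RS
FoLiA-alto -O out/ didl/
.RE
.fi
.PP
The same run, assuming four threads are available:
.PP
.nf
.RS
FoLiA-alto -t 4 -O out/ didl/
.RE
.fi
.PP
Process a single DIDL file. Without
.BR -O ,
the results are stored in the directory where the file is found:
.PP
.nf
.RS
FoLiA-alto didl/mpeg21.xml
.RE
.fi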
.SH BUGS
possible
.SH AUTHORS
Ko van der Sloot
.br
Martin Reynaert
.br
email: [email protected]