.TH FoLiA-idf 1 "2020 jan 19"
.SH NAME
FoLiA-idf - FoLiA IDF word counter
.SH SYNOPSIS
FoLiA-idf [options] DIR
.SH DESCRIPTION
.B FoLiA-idf
counts the intradocument frequency of the words in FoLiA documents.
It processes all FoLiA XML files in DIR and stores the results in the current directory in a file called DIR.idf.tsv.
.SH OPTIONS
.B --clip
number
.RS
Clipping factor. An IDF value lower than 'number' is not stored.
.RE
.B --lower
.RS
Lowercase all words.
.RE
.B --strings
.RS
Search for <str> nodes in the FoLiA documents. The default is to search for <w> nodes.
.RE
.B -t
or
.B --threads
number
.RS
Number of concurrent threads the program may use.
.RE
.B -V
or
.B --version
.RS
Show the program version.
.RE
.B -h
or
.B --help
.RS
Show a help message.
.RE
.B -e
expr
.RS
When searching for files,
.B FoLiA-idf
only considers files whose name matches the expression 'expr', which may contain wildcards. The expression is matched against the file name only, not against the path.
.RE
.B -R
.RS
When a DIR is provided,
.B FoLiA-idf
recurses through DIR and its subdirectories to find files.
.RE
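.SH EXAMPLES
The following invocation is an illustrative sketch that combines the options described above; it is not taken from the program's own documentation. It assumes a hypothetical directory named corpus that holds FoLiA files ending in .xml, and it assumes that 'expr' accepts shell-style wildcards:
.PP
.nf
FoLiA-idf --lower -R -e '*.xml' -t 4 corpus
.fi
.PP
Under those assumptions, all words are lowercased, corpus and its subdirectories are searched for matching files using at most 4 threads, and the results are stored in corpus.idf.tsv in the current directory.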
.SH BUGS
possibly
.SH AUTHORS
Ko van der Sloot
Martin Reynaert
email: [email protected]
.SH SEE ALSO
.B FoLiA-stats