-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathistex_metadata_en.xml
114 lines (93 loc) · 3.07 KB
/
istex_metadata_en.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
<tool id="istex-metadata" name="Generating a metadata file" version="0.5.2">
<description>from an ISTEX id file</description>
<requirements>
<container type="docker">visatm/istex-metadata</container>
</requirements>
<command><![CDATA[
IstexMetadata.pl -i "$input" -m $metadata -l "$language"
#if $teeft == 'yes'
-t "$docterm"
#end if
]]></command>
<inputs>
<param name="input" type="data" format="txt" label="ISTEX id file" />
<param name="teeft" type="select" display="radio" label="Extract TEEFT descriptors?">
<option value="no" selected="true">No</option>
<option value="yes">Yes</option>
</param>
<param name="language" type="select" display="radio" label="Metadata file language">
<option value="en" selected="true">English</option>
<option value="fr">French</option>
</param>
</inputs>
<outputs>
<data format="tabular" label="Metadata on ${on_string}" name="metadata" />
<data format="tabular" label="TEEFT doc × term on ${on_string}" name="docterm">
<filter>teeft == 'yes'</filter>
</data>
</outputs>
<tests>
<test>
<param name="input" value="istexCorpus.txt" />
<param name="teeft" value="no" />
<output name="metadata" file="istexMetadata.txt" />
</test>
</tests>
<help><![CDATA[
From a list of ISTEX id in the input file, this tool extracts the corresponding metadata
and produces a tab-separated file. The first line of that file is a header with the different
fields name.
It is also possible to extract the TEEFT keywords associated with the documents to produce
a “doc × term” file.
-----
**Options**
The programme has several arguments, one **mandatory**, the others *optional*.
+ *“ISTEX id”* **inputfile name**
+ *TEEFT “doc × term” filename*, if you wish to extract these keywords
+ *Metadata file language*, used for the list of fields and the languages of publication
**Input datafile**
The input file must contains a line with the mention **ISTEX** between square brackets
followed by a list of id, one per line preceded by the type of id. Any information present
before these lines will be ignored. Each id can be followed by a hash sign ('#') and a file
name.
Example:
::
[ISTEX]
ark ark:/67375/0T8-VC3W50FL-B # test001
ark ark:/67375/6H6-ZNZ982C4-N # test002
ark ark:/67375/6H6-4NH81FDN-R # test003
ark ark:/67375/6H6-1FSRLFB6-Q # test004
ark ark:/67375/WNG-RHR3302C-7 # test005
ark ark:/67375/WNG-R0P0D6BZ-L # test006
-----
**Metadata**
The metadata file is composed of tab-separated columns. The first line is reserved to the header
containing the different field names:
+ Filename
+ Title
+ Author
+ Affiliation
+ Source
+ ISSN
+ e-ISSN
+ ISBN
+ e-ISBN
+ Publisher
+ Document type
+ Content type
+ Date
+ Language
+ Abstract
+ Author’s keywords
+ WoS category
+ Science-Metrix category
+ Scopus category
+ INIST category
+ Quality score
+ PDF version
+ ISTEX Id
+ ARK
+ DOI
+ PMID
]]></help>
</tool>