WSE-research/XML2RDF-converter

XML2RDF converter using XML DTD files

If you want to translate XML data to RDF data, you have come to the right place. This web service provides a fast, automated way of converting XML data to RDF data. Alternative approaches require manual mappings between the XML and RDF data, which is time-consuming and error-prone. In contrast, this tool uses the corresponding Document Type Definition (DTD) to generate the RDF data automatically (i.e., the structure of the RDF properties is derived from the DTD file). However, please note that, due to the nature of the implemented approach, the automated process cannot change the structure of the generated RDF data. Instead, the generated RDF data is a direct representation of the XML data. In summary, this service can be used for straightforward conversion of XML to RDF if DTDs are available.

Please note the corresponding Web UI for using this web service interactively.



Demo

The web service is available at https://demos.swe.htwk-leipzig.de:40180/.

Run the service

To start the service, execute the following command:

docker-compose up

The service uses HTTPS, so make sure to provide a volume containing the two files server.crt and server.key.

Transforming XML data to the exact RDF representation using Document Type Definitions (DTD)

The conversion is done with a POST request to http://localhost:5000/xml2rdf. The endpoint requires the following JSON payload:

{
  "xml": "content of the XML file",
  "dtd": "content of the DTD file",
  "prefix": "prefix for the generated URIs (optional), default: https://example.org#",
  "lang": "language tag for string literals, default: en"
}

Example

Example Web service request using curl

curl --location 'http://localhost:5000/xml2rdf' \
--header 'Content-Type: application/json' \
--data '{
    "xml": "<dokumente builddate=\"20220210212507\" doknr=\"BJNR164400010\">  <norm builddate=\"20220210212507\" doknr=\"BJNR164400010\">    <metadaten>      <jurabk>GefStoffV 2010<\/jurabk>      <amtabk>GefStoffV<\/amtabk>      <ausfertigung-datum manuell=\"true\">2010-11-26<\/ausfertigung-datum>      <fundstelle typ=\"amtlich\">        <periodikum>BGBl I<\/periodikum>        <zitstelle>2010, 1643, 1644<\/zitstelle>      <\/fundstelle>      <kurzue>Gefahrstoffverordnung<\/kurzue>      <langue>Verordnung zum Schutz vor Gefahrstoffen<\/langue>      <standangabe checked=\"true\">        <standtyp>Stand<\/standtyp>        <standkommentar>Zuletzt ge&#228;ndert durch Art. 2 V v. 21.7.2021 I 3115<\/standkommentar>      <\/standangabe>    <\/metadaten>    <\/norm>    <\/dokumente>",
    "dtd": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!--\tDocument Type Definition\tDiese DTD definiert den Aufbau des XML-Formats zur Veroeffentlichung der aktuellen Bundesgesetze \tund Rechtsverordnungen ueber www.gesetze-im-internet.de\tErstellt von:\tjuris GmbH\tIm Auftrag des Bundesministeriums der Justiz\t\tVersion:\t\t1.01\tErzeugt am:\t25.06.2012 \tDatei:\t\t\tGiI-Norm.dtd--><!ELEMENT dokumente (norm*)><!ATTLIST dokumente\tbuilddate CDATA #IMPLIED\tdoknr CDATA #IMPLIED><!ELEMENT norm (metadaten, textdaten?)><!ATTLIST norm\tbuilddate CDATA #IMPLIED\tdoknr CDATA #IMPLIED><!ELEMENT metadaten (jurabk+, amtabk?, ausfertigung-datum?, fundstelle*, kurzue?, langue?, gliederungseinheit?, enbez?, titel?, standangabe*)><!ELEMENT textdaten (text?, fussnoten?)><!ENTITY % bgbltitlestruct \"#PCDATA | BR | B | I | U | F | SP | small | SUP | SUB | FnR | NB | noindex\"><!ENTITY % bgbltextstruct \"%bgbltitlestruct; | Citation | FnArea | table | DL |  Split | IMG | FILE | Revision | pre | kommentar | QuoteL | QuoteR | ABWFORMAT\"><!ENTITY % bgbltblstruct  \"%bgbltitlestruct; | Citation | FnArea | table | DL |  Split | IMG | FILE | Ident | Title | P | FNA | Accolade | QuoteL | QuoteR | kommentar | ABWFORMAT\"><!ENTITY % Text \"CDATA\"><!ENTITY % LanguageCode \"NMTOKEN\"><!ENTITY % i18n\t\"xml:lang\t%LanguageCode;\t#IMPLIED\"><!ENTITY % coreattrs\t\"ID\t\tID\t\t\t#IMPLIED\tClass\t\tCDATA\t\t#IMPLIED\"\t><!ENTITY % attrs \"%coreattrs; %i18n;\"><!ENTITY % yesorno \"CDATA\"><!NOTATION Satz-3B2 SYSTEM \"3B2\"><!ENTITY % commonatts \"Id\t\tCDATA\t\t#IMPLIED\t\tLang\t\tCDATA\t\t#IMPLIED\t\tRemap\t\tCDATA\t\t#IMPLIED\t\tRole\t\tCDATA\t\t#IMPLIED\t\tXRefLabel\tCDATA\t\t#IMPLIED\"><!ELEMENT BR EMPTY><!ELEMENT B (%bgbltextstruct;)*><!ELEMENT I (%bgbltextstruct;)*><!ELEMENT U (%bgbltextstruct;)*><!ELEMENT F (#PCDATA)><!ATTLIST F\tType CDATA #IMPLIED\tSize CDATA #IMPLIED\tValue CDATA #IMPLIED><!ELEMENT SP (%bgbltextstruct;)*><!ELEMENT small (%bgbltextstruct;)*><!ELEMENT SUP (#PCDATA)><!ATTLIST SUP\tclass ( Rec ) #IMPLIED><!ELEMENT SUB (#PCDATA)><!ELEMENT FNA (#PCDATA)><!ELEMENT FnR EMPTY><!ATTLIST FnR\tID IDREF #REQUIRED><!ELEMENT NB (#PCDATA)><!ELEMENT noindex ANY><!ELEMENT Citation (%bgbltextstruct;)*><!ELEMENT FnArea (FnR)+><!ATTLIST FnArea\tLine (0 | 1) \"1\"\tSize (normal | large | small) \"normal\"><!ELEMENT table (Title?, tgroup+)><!ATTLIST table\t%commonatts; \tcolsep %yesorno; #IMPLIED\tframe (top | bottom | topbot | all | sides | none) #IMPLIED\tlabel CDATA #IMPLIED\torient (port | land) #IMPLIED\tpgwide %yesorno; #IMPLIED\trowsep %yesorno; #IMPLIED\tshortentry %yesorno; #IMPLIED\ttabstyle NMTOKEN #IMPLIED\ttocentry %yesorno; \"%yes;\"\tMarginT CDATA #IMPLIED\tMarginB CDATA #IMPLIED\tMarginL CDATA #IMPLIED\tMarginR CDATA #IMPLIED\tvj CDATA #IMPLIED\tBreak (Column | Page) #IMPLIED><!ELEMENT tgroup (colspec*, spanspec*, thead?, tbody, tfoot?)><!ATTLIST tgroup\t%commonatts; \talign (left | right | center | justify | char)  \"left\"\tindent CDATA #IMPLIED\ttindent CDATA #IMPLIED\tbindent CDATA #IMPLIED\tchar CDATA  \"\"\tcharoff CDATA  \"50\"\tcols CDATA #REQUIRED\tcolsep %yesorno; #IMPLIED\trowsep %yesorno; #IMPLIED\ttgroupstyle NMTOKEN #IMPLIED><!ELEMENT colspec EMPTY><!ATTLIST colspec\t%commonatts; \talign (left | right | center | justify | char) #IMPLIED\tindent CDATA #IMPLIED\ttindent CDATA #IMPLIED\tbindent CDATA #IMPLIED\tchar CDATA #IMPLIED\tcharoff CDATA #IMPLIED\tcolname NMTOKEN #IMPLIED\tcolnum CDATA #IMPLIED\tcolsep %yesorno; #IMPLIED\tcolwidth CDATA #IMPLIED\trowsep %yesorno; #IMPLIED ><!ELEMENT spanspec EMPTY><!ATTLIST spanspec\t%commonatts; \talign (left | right | center | justify | char)  \"center\"\tindent CDATA #IMPLIED\ttindent CDATA #IMPLIED\tbindent CDATA #IMPLIED\tchar CDATA #IMPLIED\tcharoff CDATA #IMPLIED\tcolsep %yesorno; #IMPLIED\tnameend NMTOKEN #IMPLIED\tnamest NMTOKEN #IMPLIED\trowsep %yesorno; #IMPLIED\tspanname NMTOKEN #IMPLIED ><!ELEMENT thead (colspec*, row+)><!ATTLIST thead\t%commonatts; \tvalign (top | middle | bottom)  \"bottom\"\tClass CDATA #IMPLIED\tStyle CDATA #IMPLIED ><!ELEMENT tfoot (colspec*, row+)><!ATTLIST tfoot\t%commonatts; \tvalign (top | middle | bottom)  \"top\"><!ELEMENT tbody (row+)><!ATTLIST tbody\t%commonatts; \tvalign (top | middle | bottom)  \"top\"\tClass CDATA #IMPLIED\tStyle CDATA #IMPLIED ><!ELEMENT row (entry+)><!ATTLIST row\t%commonatts; \trowsep %yesorno; #IMPLIED\tvalign (top | middle | bottom) #IMPLIED\tBreak (Column | Page) #IMPLIED ><!ELEMENT entry (%bgbltblstruct;)*><!ATTLIST entry\t%commonatts; \talign (left | right | center | justify | char) #IMPLIED\tchar CDATA #IMPLIED\tcharoff CDATA #IMPLIED\tcolname NMTOKEN #IMPLIED\tcolsep %yesorno; #IMPLIED\tmorerows CDATA #IMPLIED\tnameend NMTOKEN #IMPLIED\tnamest NMTOKEN #IMPLIED\trotate %yesorno; #IMPLIED\trowsep %yesorno; #IMPLIED\tspanname NMTOKEN #IMPLIED\tvalign (top | middle | bottom)  #IMPLIED\tdiagonal (up | down | updown) #IMPLIED\tVJ %yesorno;  \"1\"><!ELEMENT DL (DT, DD)+><!ATTLIST DL\t%attrs;\tIndent CDATA #IMPLIED\tFont (normal | bold | italic | bold-italic | underlined) \"normal\"\tType (arabic | alpha | Alpha | a-alpha | a3-alpha | roman | Roman | Dash | Bullet | Symbol | None) \"arabic\"><!ELEMENT DT (%bgbltextstruct;)*><!ATTLIST DT\t%attrs;><!ELEMENT DD (LA|Revision)+><!ATTLIST DD\t%attrs;\tFont (normal | bold | italic | bold-italic | underlined) \"normal\"><!ELEMENT LA (%bgbltextstruct;)*><!ATTLIST LA\t%attrs;\tSize (normal | small | tiny) \"normal\"\tValue CDATA #IMPLIED><!ELEMENT Split EMPTY><!ATTLIST Split\tLeader %yesorno; \"0\"><!ELEMENT IMG EMPTY><!ATTLIST IMG\t%attrs;\tSRC CDATA #REQUIRED\torient (port | land) #IMPLIED\tPos (block | inline) \"block\"\tAlign (left | center | right) \"center\"\tSize CDATA #IMPLIED\tWidth CDATA #IMPLIED\tHeight CDATA #IMPLIED\tUnits CDATA #IMPLIED\tType CDATA #IMPLIED\talt\t%Text;\t#IMPLIED\ttitle\t%Text;\t#IMPLIED><!ELEMENT FILE EMPTY><!ATTLIST FILE\t%attrs;\tSRC CDATA #REQUIRED\tPREVIEW CDATA #IMPLIED\tType CDATA #IMPLIED\ttitle %Text; #IMPLIED><!ELEMENT Revision ((Ident? | Title? | Subtitle? | (TOC | Content)?)+ | (P | DL | table)+)><!ATTLIST Revision\t%attrs;\tPostfix CDATA #IMPLIED><!ELEMENT Ident (%bgbltitlestruct;)*><!ATTLIST Ident\t%attrs;><!ELEMENT Title (%bgbltitlestruct;)*><!ATTLIST Title\t%attrs;\tAlign (left | center | right | justify | auto) \"auto\"><!ELEMENT Subtitle (%bgbltitlestruct;)*><!ATTLIST Subtitle\t%attrs;\tAlign (left | center | right | justify | auto) \"auto\"><!ELEMENT TOC ((Ident | Title | P | table)*)><!ATTLIST TOC\t%attrs;><!ELEMENT Content (P | BR | table | AttArea | FnArea | TOC | Revision | Title | Subtitle | kommentar )*><!ATTLIST Content\t%attrs;><!ELEMENT Accolade EMPTY><!ATTLIST Accolade\tAlign (left | right) \"right\"\tSize CDATA #IMPLIED\tStep CDATA #IMPLIED><!ELEMENT AttR EMPTY><!ATTLIST AttR\tID IDREF #REQUIRED><!ELEMENT AttArea (AttR)><!ELEMENT P (%bgbltextstruct;)*><!ATTLIST P\t%attrs;><!ELEMENT pre (#PCDATA | BR | B | I | small | SP | SUP | SUB | ABWFORMAT | kommentar )*><!ATTLIST pre\txml:space (default|preserve) #FIXED \"preserve\"\tcalsid CDATA #IMPLIED\tignore (nein|ja) #IMPLIED><!ELEMENT kommentar (#PCDATA | BR)*><!ATTLIST kommentar\ttyp (Stand | Stand-Hinweis | Hinweis | Fundstelle | Verarbeitung) #REQUIRED><!ELEMENT QuoteL EMPTY><!ELEMENT QuoteR EMPTY><!ELEMENT ABWFORMAT EMPTY><!ATTLIST ABWFORMAT\ttyp (A|E|D) #REQUIRED><!ELEMENT Footnotes (Footnote)+><!ELEMENT Footnote (%bgbltextstruct;)*><!ATTLIST Footnote\tID\t\tID \t\t#REQUIRED\tPrefix \tCDATA \t#IMPLIED\tFnZ \t\tCDATA \t#IMPLIED\tPostfix \tCDATA \t#IMPLIED\tPos (exp | normal) \"exp\"\tGroup (manuell | column | page | table) \"column\"><!ELEMENT langue (%bgbltextstruct;)*><!ELEMENT kurzue (%bgbltextstruct;)*><!ELEMENT amtabk (#PCDATA)><!ELEMENT gliederungseinheit ((gliederungskennzahl), (gliederungsbez?), (gliederungstitel?))><!ELEMENT gliederungskennzahl (#PCDATA)><!ELEMENT gliederungsbez (#PCDATA)><!ELEMENT gliederungstitel (%bgbltextstruct;)*><!ELEMENT enbez (#PCDATA)><!ELEMENT titel (%bgbltextstruct;)*><!ATTLIST titel format NMTOKEN #IMPLIED><!ELEMENT jurabk (#PCDATA)><!ELEMENT ausfertigung-datum (#PCDATA)><!ATTLIST ausfertigung-datum\tmanuell (nein | ja)  #REQUIRED><!ELEMENT fundstelle (periodikum, zitstelle, anlageabgabe?)><!ATTLIST fundstelle\ttyp (amtlich | nichtamtlich) #IMPLIED><!ELEMENT periodikum (#PCDATA)><!ELEMENT zitstelle (#PCDATA)><!ELEMENT anlageabgabe (anlagedat?, dokst?, abgabedat?)><!ELEMENT anlagedat (#PCDATA)><!ELEMENT dokst (#PCDATA)><!ELEMENT abgabedat (#PCDATA)><!ELEMENT standangabe (standtyp, standkommentar)><!ATTLIST standangabe    checked (ja|nein) #IMPLIED><!ELEMENT standtyp (#PCDATA)><!ELEMENT standkommentar (%bgbltextstruct;)*><!ELEMENT text ( (TOC | Content)?, Footnotes? )><!ATTLIST text format NMTOKEN #IMPLIED><!ELEMENT fussnoten ( (TOC | Content)?, Footnotes? )><!ATTLIST fussnoten format NMTOKEN #IMPLIED>",
    "prefix": "https://dtd.org",
    "lang": "de"
}'
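
For scripted use, the same request can be assembled in Python. The following is a minimal sketch using only the standard library, not part of the service itself. The helper names `build_xml2rdf_request` and `convert` are hypothetical, and it assumes that the service is reachable at the local URL from the curl example and that fields with a documented default may be omitted. Letting `json.dumps` serialize the payload avoids escaping the XML and DTD content by hand.

```python
import json
import urllib.request

# Assumption: the service runs locally at the URL used in the curl example.
XML2RDF_URL = "http://localhost:5000/xml2rdf"


def build_xml2rdf_request(xml, dtd, prefix=None, lang=None, url=XML2RDF_URL):
    """Build (but do not send) the POST request for the xml2rdf endpoint."""
    payload = {"xml": xml, "dtd": dtd}
    # Assumption: fields with a documented default may be left out.
    if prefix is not None:
        payload["prefix"] = prefix
    if lang is not None:
        payload["lang"] = lang
    # json.dumps escapes quotes, backslashes and newlines in the XML/DTD text.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def convert(xml, dtd, prefix=None, lang=None):
    """Send the request and return the raw response body as text."""
    request = build_xml2rdf_request(xml, dtd, prefix, lang)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service running, `convert(xml_text, dtd_text, prefix="https://dtd.org", lang="de")` would send the same kind of payload as the curl call above, where `xml_text` and `dtd_text` hold the contents of your own files.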

Example data

In the folder examples you can find a sample DTD file and two sample XML files.

Generate XSLT stylesheet from Document Type Definitions (DTD)

To transform a DTD file into the XSLT stylesheet used for the RDF generation, use the endpoint http://localhost:5000/dtd2xslt. The request body must be the content of the DTD file. You can use the following URL arguments to adjust the output:

  • prefix: The base URL for the generated URIs, defaults to https://example.org#

  • lang: The language tag for generated string literals, defaults to en

Example

curl --location 'http://localhost:5000/dtd2xslt?prefix=http%3A%2F%2Fexample.org&lang=de' \
--header 'Content-Type: application/xml-dtd' \
--data-binary '@(LOCAL FILE PATH TO YOUR DTD FILE)'
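
A comparable call from Python might look like the sketch below. It is a standard-library illustration and not part of the service. The helper names `build_dtd2xslt_request` and `dtd_to_xslt` are hypothetical, and the local URL is assumed from the curl example. The request uses POST because that is what curl sends when a data payload is given.

```python
import urllib.parse
import urllib.request

# Assumption: the service runs locally at the URL used in the curl example.
DTD2XSLT_URL = "http://localhost:5000/dtd2xslt"


def build_dtd2xslt_request(dtd_bytes, prefix=None, lang=None, url=DTD2XSLT_URL):
    """Build (but do not send) the request for the dtd2xslt endpoint."""
    params = {}
    if prefix is not None:
        params["prefix"] = prefix
    if lang is not None:
        params["lang"] = lang
    # urlencode percent-encodes ':' and '/' and also '#', which would
    # otherwise be interpreted as the start of a URL fragment.
    query = urllib.parse.urlencode(params)
    full_url = f"{url}?{query}" if query else url
    return urllib.request.Request(
        full_url,
        data=dtd_bytes,
        headers={"Content-Type": "application/xml-dtd"},
        method="POST",
    )


def dtd_to_xslt(dtd_path, prefix=None, lang=None):
    """Read a DTD file, send it unchanged, and return the response as text."""
    with open(dtd_path, "rb") as handle:
        request = build_dtd2xslt_request(handle.read(), prefix, lang)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Encoding matters in particular for a prefix that ends in `#`, such as the documented default `https://example.org#`: passed unencoded in a URL, the `#` would be treated as a fragment separator and never reach the service.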

Contribute

We are happy to receive your contributions. Please create a pull request or an issue. As this tool is published under the MIT license, feel free to fork it and use it in your own projects.

Disclaimer

This tool does not store any data. All data is processed in memory and is not persisted. This tool is provided "as is" and without any warranty, express or implied.