Command-line tool to extract taxonomies from Wikidata.
wikidata-taxonomy requires at least Node.js version 4.
Install globally to make the command wdtaxonomy accessible from your shell $PATH:
$ npm install -g wikidata-taxonomy
This module provides the command wdtaxonomy. By default, usage help is printed:
$ wdtaxonomy
The first argument needs to be a Wikidata identifier to be used as root. For instance, to extract a taxonomy of planets (Q634):
$ wdtaxonomy Q634
The extracted taxonomy is based on statements using the property "subclass of" (P279) or "subproperty of" (P1647) and additional statistics. Option --sparql prints the SPARQL queries that are used.
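To give a rough idea of what a query over "subclass of" statements can look like, here is a minimal JavaScript sketch that builds one. It is not the query printed by --sparql. The function name buildSubclassQuery is hypothetical, and the query only assumes the standard wd: and wdt: prefixes and the label service of the Wikidata Query Service.

```javascript
// Hypothetical helper, not part of wikidata-taxonomy: builds a SPARQL query
// listing every item connected to `root` by a chain of `property` statements.
// The "*" property path also matches the root itself (zero steps).
function buildSubclassQuery (root, property = 'P279') {
  if (!/^[QP][1-9][0-9]*$/.test(root)) {
    throw new Error(`invalid identifier: ${root}`)
  }
  if (!/^P[1-9][0-9]*$/.test(property)) {
    throw new Error(`invalid property: ${property}`)
  }
  return [
    'SELECT ?item ?itemLabel WHERE {',
    `  ?item wdt:${property}* wd:${root} .`,
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }',
    '}'
  ].join('\n')
}

console.log(buildSubclassQuery('Q634'))
```

Validating the identifiers before interpolating them keeps arbitrary input out of the query string.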
Taxonomy extraction and output can be controlled by several options. For instance this command lists a biological taxonomy of mammals:
$ wdtaxonomy Q7377 --property P171 --brief
By default, the taxonomy is printed in "tree" format with colored Unicode characters:
$ wdtaxonomy Q17362350
planet of the Solar System (Q17362350) •2 ↑
├──outer planet (Q30014) •23 ×4 ↑
└──inner planets (Q3504248) •8 ×4 ↑
The output contains item labels, Wikidata identifiers, the number of Wikimedia sites connected to each item (indicated by the bullet character "•"), the number of instances (property P31, indicated by a multiplication sign "×"), and an upwards arrow ("↑") as indicator for additional superclasses not included in the tree.
Option "--instances" (or "-i") explicitly includes instances:
$ wdtaxonomy -i Q17362350
planet of the Solar System (Q17362350) •2 ↑
├──outer planet (Q30014) •23 ↑
| -Saturn (Q193)
| -Jupiter (Q319)
| -Uranus (Q324)
| -Neptune (Q332)
└──inner planets (Q3504248) •8 ↑
  -Earth (Q2)
  -Mars (Q111)
  -Mercury (Q308)
  -Venus (Q313)
Classes that occur at multiple places in the taxonomy (multihierarchy) are marked like in the following example:
$ wdtaxonomy Q634
planet (Q634) •196 ×7 ↑
├──extrasolar planet (Q44559) •81 ×833 ↑
| ├──circumbinary planet (Q205901) •14 ×10
| ├──super-Earth (Q327757) •32 ×46
├──terrestrial planet (Q128207) •67 ×7
| ╞══super-Earth (Q327757) •32 ×46 …
The CSV format ("--format csv") is optimized for comparing differences over time. Each output row consists of six fields:
- level in the hierarchy, indicated by zero or more "-" (default) or "=" (multihierarchy) characters
- id of the item. Items on the same level are sorted by their id.
- label of the item. The language can be selected with a command-line option. The character "," in labels is replaced by a whitespace.
- sites: number of connected sites (Wikipedia and related project editions). Larger numbers may indicate more established concepts.
- instances: number of instances (property P31)
- parents outside of the hierarchy, indicated by zero or more "^" characters
For instance the CSV output for Q634 would be like this:
$ wdtaxonomy -f csv Q634
-,Q44559,extrasolar planet,81,833,^
--,Q205901,circumbinary planet,14,10,
-,Q128207,terrestrial planet,67,7,
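Rows like the ones above can be split into named fields with a few lines of JavaScript. This is only a sketch: parseRow is a hypothetical helper, it assumes the six comma-separated columns visible in the example rows, and it relies on labels containing no commas.

```javascript
// Hypothetical helper: splits one row of the CSV output into named fields.
// Assumed column order: level, id, label, sites, instances, parents.
function parseRow (line) {
  const [level, id, label, sites, instances, parents] = line.split(',')
  return {
    depth: level.length, // number of "-" or "=" characters
    multihierarchy: level.includes('='), // item also occurs elsewhere
    id,
    label,
    sites: Number(sites),
    instances: Number(instances),
    parents: (parents || '').length // number of "^" characters
  }
}

console.log(parseRow('-,Q44559,extrasolar planet,81,833,^'))
console.log(parseRow('--,Q205901,circumbinary planet,14,10,'))
```

Comparing two runs then comes down to comparing the parsed rows by id, for example to spot changed site or instance counts.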
For the root item "planet" (Q634) there are 196 Wikipedia editions or other sites with an article about planets, and seven Wikidata items are direct instances of a planet. At the end of its row, "^" indicates that "planet" has one superclass. In the next rows, "extrasolar planet" (Q44559) is a subclass of planet with another superclass indicated by "^". Both "circumbinary planet" and "super-Earth" are subclasses of "extrasolar planet". The latter also occurs as a subclass of "terrestrial planet", where it is marked by "==" instead of "--".
Option --format json serializes the taxonomy as a JSON object with the following fields:
- root: Wikidata identifier of the root item/property
- items: object with Wikidata items/properties, indexed by their identifier
- narrower
- broader
- instances (if option --instances is enabled)
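The following JavaScript sketch shows how such an object might be walked. The exact shape of the fields is an assumption here: items is taken to map identifiers to objects with a label, and narrower to map an identifier to an array of child identifiers. The sample data is written by hand from the planet example and is not real tool output, and outline is a hypothetical helper.

```javascript
// Hand-written sample, NOT real tool output. The shape of `items` and
// `narrower` is assumed for illustration.
const taxonomy = {
  root: 'Q634',
  items: {
    Q634: { label: 'planet' },
    Q44559: { label: 'extrasolar planet' },
    Q128207: { label: 'terrestrial planet' },
    Q327757: { label: 'super-Earth' }
  },
  narrower: {
    Q634: ['Q44559', 'Q128207'],
    Q44559: ['Q327757'],
    Q128207: ['Q327757']
  }
}

// Depth-first walk that returns one indented line per visited item.
function outline (tax, id = tax.root, depth = 0, path = new Set()) {
  const label = (tax.items[id] || {}).label || ''
  const lines = [`${'  '.repeat(depth)}${label} (${id})`]
  if (path.has(id)) return lines // guard against cycles
  const next = new Set(path).add(id)
  for (const child of (tax.narrower || {})[id] || []) {
    lines.push(...outline(tax, child, depth + 1, next))
  }
  return lines
}

console.log(outline(taxonomy).join('\n'))
```

Because super-Earth has two parents in the sample, it is visited once under each of them, which mirrors the multihierarchy case of the tree format.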
The hierarchy properties P279 ("subclass of") and P31 ("instance of") to build taxonomies from can be changed with option --property (-P).
Members of (P463) the European Union (Q458):
$ wdtaxonomy Q458 -P P463
Members of (P463) the European Union (Q458) and number of its citizens in Wikidata (P27):
$ wdtaxonomy Q458 -P 463/27
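The option values used in these examples ("P463", "463/27") suggest a simple convention: one or two property identifiers separated by a slash, with the leading "P" optional. A minimal JavaScript sketch of that convention follows. parseProperties is a hypothetical helper, not the tool's own parser, and the names hierarchy and instances are chosen here for illustration.

```javascript
// Hypothetical helper: normalizes a property option value such as
// "P463", "361" or "463/27" into full property identifiers.
function parseProperties (value) {
  const ids = value.split('/').map(part => {
    const match = /^P?([1-9][0-9]*)$/i.exec(part.trim())
    if (!match) throw new Error(`invalid property: ${part}`)
    return `P${match[1]}`
  })
  if (ids.length > 2) throw new Error('expected at most two properties')
  // First identifier replaces P279, the optional second one replaces P31.
  const [hierarchy, instances] = ids
  return { hierarchy, instances }
}

console.log(parseProperties('463/27'))
console.log(parseProperties('P463'))
```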
As Wikidata is not a strict ontology, subproperties are not factored in. For instance, the following query does not include members of the European Union, although P463 is a subproperty of P361.
Parts of (P361) the European Union (Q458):
$ wdtaxonomy Q458 -P P361
A taxonomy of subproperties can be queried like taxonomies of items. The hierarchy property is set to P1647 ("subproperty of") by default:
$ wdtaxonomy P361
$ wdtaxonomy P361 -P P1647 # equivalent
Subproperties of "part of" (P361) and which of them have an inverse property (P1696):
$ wdtaxonomy P361 -P P1647/P1696
Inverse properties are not factored in either, so queries like these do not necessarily return the same results:
What hand (Q33767) is part of (P361):
$ wdtaxonomy Q33767 -P 361 -r
What parts the hand (Q33767) has (P527):
$ wdtaxonomy Q33767 -P 527
Release notes are listed in a file in the source code repository.
Related tools
- wikidata-sdk is used by this module
- wikidata-cli provides more generic command line tools for Wikidata
- taxonomy browser is a web application based on Wikidata dumps