Talismane is a natural language processing framework with a sentence detector, tokeniser, POS tagger and dependency syntax parser. Currently available language packs include French (standard and Universal Dependencies) and English.
Sample input:
Les amoureux qui se bécotent sur les bancs publics ont des petites gueules bien sympathiques.
Sample output: a syntax tree, shown below in CoNLL-X format, also available as a Java object for manipulation in code.
1   Les           les          DET     DET     n=p|           2   det      2   det
2   amoureux      amoureux     NC      NC      g=m|           10  suj      10  suj
3   qui           qui          PROREL  PROREL  n=s|           5   suj      5   suj
4   se            se           CLR     CLR     n=p|p=3|       5   aff      5   aff
5   bécotent      bécoter      V       V       n=p|t=PS|p=3|  2   mod_rel  2   mod_rel
6   sur           sur          P       P                      5   mod      5   mod
7   les           les          DET     DET     n=p|           8   det      8   det
8   bancs         banc         NC      NC      n=p|g=m|       6   prep     6   prep
9   publics       public       ADJ     ADJ     n=p|g=m|       8   mod      8   mod
10  ont           avoir        V       V       n=p|t=P|p=3|   0   root     0   root
11  des           des          DET     DET     n=p|           13  det      13  det
12  petites       petit        ADJ     ADJ     n=p|g=f|       13  mod      13  mod
13  gueules       gueule       NC      NC      n=p|           10  obj      10  obj
14  bien          bien         ADV     ADV                    15  mod      15  mod
15  sympathiques  sympathique  ADJ     ADJ     n=p|           13  mod      13  mod
16  .             .            PONCT   PONCT                  15  ponct    15  ponct
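If you only have the CoNLL-X text output rather than the Java object, each token line can be read back with a few lines of code. The sketch below assumes the ten standard CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) separated by tabs, and treats an empty or `_` FEATS column as "no features". `ConllToken` is a hypothetical helper written for this illustration; it is not part of the Talismane API.

```java
import java.util.ArrayList;
import java.util.List;

/** One token row of CoNLL-X output (hypothetical helper, not a Talismane class). */
final class ConllToken {
    final int index;        // column 1: ID, 1-based position in the sentence
    final String form;      // column 2: FORM, the word as written
    final String lemma;     // column 3: LEMMA
    final String posTag;    // column 5: POSTAG
    final String features;  // column 6: FEATS, "" when absent
    final int head;         // column 7: HEAD, 0 for the root
    final String label;     // column 8: DEPREL

    private ConllToken(int index, String form, String lemma, String posTag,
                       String features, int head, String label) {
        this.index = index;
        this.form = form;
        this.lemma = lemma;
        this.posTag = posTag;
        this.features = features;
        this.head = head;
        this.label = label;
    }

    /** Parses one tab-separated token line (assumption: tabs, as in CoNLL-X). */
    static ConllToken parse(String line) {
        // A limit of -1 keeps empty columns, such as a missing FEATS value.
        String[] cols = line.split("\t", -1);
        if (cols.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        String feats = "_".equals(cols[5]) ? "" : cols[5];
        return new ConllToken(Integer.parseInt(cols[0]), cols[1], cols[2], cols[4],
                feats, Integer.parseInt(cols[6]), cols[7]);
    }

    /** Returns the tokens of a sentence whose governor is the given index. */
    static List<ConllToken> dependentsOf(List<ConllToken> sentence, int headIndex) {
        List<ConllToken> dependents = new ArrayList<>();
        for (ConllToken token : sentence) {
            if (token.head == headIndex) {
                dependents.add(token);
            }
        }
        return dependents;
    }
}
```

Calling `ConllToken.dependentsOf(sentence, 0)` on a parsed sentence returns its root token or tokens, which is a convenient starting point for walking the tree.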
Downloads: The latest release and language packs can be downloaded from the releases pages.
Wiki: Simple instructions for use can be found on the Talismane wiki.
Command-line usage: follow the setup instructions, and then run a command similar to the following:
java -Xmx1G -Dconfig.file=talismane-fr-X.X.X.conf -jar talismane-core-X.X.X.jar --analyse --sessionId=fr --encoding=UTF8 --inFile=data/frTest.txt --outFile=data/frTest.tal
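When the analysis has to be launched from another Java program without going through the API, the same command can be assembled and run as a child process. This is only a sketch under stated assumptions: `TalismaneCommand` is a hypothetical helper, the flags are copied verbatim from the command above, and the jar and configuration file are assumed to sit in the working directory under the names shown, with `version` standing in for `X.X.X`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Builds and runs the command line shown above (hypothetical helper). */
final class TalismaneCommand {

    /** Assembles the argument list; JVM options must come before -jar. */
    static List<String> build(String version, String inFile, String outFile) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Xmx1G");
        command.add("-Dconfig.file=talismane-fr-" + version + ".conf");
        command.add("-jar");
        command.add("talismane-core-" + version + ".jar");
        command.add("--analyse");
        command.add("--sessionId=fr");
        command.add("--encoding=UTF8");
        command.add("--inFile=" + inFile);
        command.add("--outFile=" + outFile);
        return command;
    }

    /** Runs the command, forwarding its console output, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list rather than one string avoids quoting problems when file paths contain spaces.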
Calling from Java: For syntax analysis within Java code via the API, see this Java code example.
JavaDoc API: You may also consult the full JavaDoc API online.
User's manual: An out-of-date user's manual can be found on the GitHub Talismane project page. For up-to-date documentation, consult the wiki or the JavaDoc API instead.
Additional information on the project can be found on the CLLE-ERSS laboratory Talismane project home page.
Language pack usage
- The French language pack can be used for research purposes provided that you have a license for the French Treebank. The included model is not optimised: it is a Maximum Entropy model (which requires only about 1 GB of RAM) rather than a Linear SVM model (which requires about 24 GB of RAM). If you would like the more optimised Linear SVM model, please contact Assaf Urieli.
- The English language pack can be used for research purposes provided that you have a license for the Penn Treebank. WARNING: the English model is only an initial version, with no attempts at optimisation.