Teprolin is a Python platform for text pre-processing that has been developed in the Teprolin project. It is described in the following paper (click to read it from the conference proceedings):

Ion, Radu. (2018). TEPROLIN: An Extensible, Online Text Preprocessing Platform for Romanian. In Proceedings of the International Conference on Linguistic Resources and Tools for Processing Romanian Language (ConsILR 2018), November 22-23, 2018, Iași, România.
Teprolin only works with Python 3 and has been tested with versions 3.6, 3.7 and 3.8, on both Windows 10 and Linux Ubuntu 20.04. Teprolin includes the TTL text pre-processor, which runs in Perl. On Windows we used Strawberry Perl; on Ubuntu, the default `perl` installation.

To make sure TTL works, issue the following commands in a Perl-enabled command prompt (`perl` has to be in `PATH`):
cpan install Unicode::String
cpan install Algorithm::Diff
cpan install BerkeleyDB
cpan install File::Which
cpan install File::HomeDir
Check that the script named `TeproTTL.pl` compiles OK by executing `perl -c TeproTTL.pl`.
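If you prefer to script this check, here is a minimal sketch (not part of Teprolin) that verifies `perl` is on `PATH` and that the five modules above can be loaded; `perl -M<Module> -e 1` simply loads a module and exits.

```python
import shutil
import subprocess

# Perl modules required by the TTL pre-processor (see the cpan commands above).
REQUIRED_MODULES = [
    "Unicode::String",
    "Algorithm::Diff",
    "BerkeleyDB",
    "File::Which",
    "File::HomeDir",
]


def check_ttl_prerequisites() -> bool:
    """Return True if perl is on PATH and every required module loads."""
    if shutil.which("perl") is None:
        print("perl was not found in PATH")
        return False

    all_ok = True
    for module in REQUIRED_MODULES:
        # 'perl -MFoo::Bar -e 1' exits with a non-zero code if the module
        # cannot be loaded, which is what we want to detect here.
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        status = "OK" if result.returncode == 0 else "MISSING"
        print(f"{module}: {status}")
        all_ok = all_ok and result.returncode == 0

    return all_ok


if __name__ == "__main__":
    check_ttl_prerequisites()
```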
NLP-Cube and UD-Pipe 1 have their own repositories on GitHub.
SSLA is a Text-To-Speech library developed by Tiberiu Boroș et al. Read about it on arXiv. The source code can be found on GitHub at SSLA. MLPLA is the text preprocessing front-end for SSLA and it is used in TEPROLIN for:
- word hyphenation
- word stress identification
- phonetic transcription
Additionally, we ported some code from our ROBIN Dialog Manager project to do numeral rewriting, also for the benefit of TTS tools.
To run MLPLA, you need the Java Runtime Environment 15 installed and available in `PATH`.
If you want to build the MLPLAServer yourself, install the MLPLA text preprocessing library in your local Maven repository by running this command:
mvn install:install-file -Dfile=ttsops/MLPLAServer/lib/MLPLA.jar -DgroupId=ro.racai -DartifactId=mlpla -Dversion=1.0.0 -Dpackaging=jar -DgeneratePom=true
Then run the following `mvn` command to generate the jar with all dependencies:
mvn clean compile test assembly:single antrun:run@copy-uber-jar
The resource files are models, lexicons, mapping files, etc. that are loaded by all of Teprolin's NLP apps. They sit in the `.teprolin` folder, under your home folder: on Windows 10 this is `%USERPROFILE%`, and on Linux, `~`. These files are now automatically installed by TEPROLIN.
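For reference, here is a small sketch (not part of Teprolin) showing where the `.teprolin` folder resolves on either operating system:

```python
from pathlib import Path

# Path.home() returns %USERPROFILE% on Windows and ~ on Linux,
# so the resource folder resolves the same way on both systems.
teprolin_resources = Path.home() / ".teprolin"
print(teprolin_resources)

# List the installed resource files, if the folder already exists.
if teprolin_resources.is_dir():
    for resource in sorted(teprolin_resources.iterdir()):
        print(resource.name)
```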
To install all the required Python 3 packages inside a virtual environment, do this:
python3 -m venv /path/to/new/virtual/environment
then activate the new environment by executing `source /path/to/new/virtual/environment/bin/activate`. Finally, run:
pip3 install -r requirements.txt
For a quick test session with small texts (say, up to 1000 characters), head to RELATE's test page. If you want to test different algorithms (e.g. UD-Pipe vs. NLP-Cube), you can access this link.
If you want to test the installation, issue `pytest -v tests` from the root of this repository. Please be patient; it will take a while.
To quickly test the REST service, logging to the console, run the following command from the root of this repository:

python3 TeproREST.py

This starts the server in the foreground, as a single process, in development mode.
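Once the server is up, you can also send it a request from Python. The sketch below is only an illustration: the port (5000), the `/process` endpoint and the `text` form field are assumptions, so check `TeproREST.py` for the actual host, port and request format.

```python
import requests

# Assumed endpoint; verify it against TeproREST.py before relying on it.
TEPROLIN_URL = "http://localhost:5000/process"


def process_text(text: str) -> dict:
    """Send a text to the Teprolin REST service and return its JSON reply."""
    response = requests.post(TEPROLIN_URL, data={"text": text})
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(process_text("Aceasta este o propoziție de test."))
```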
Only on Linux: to start/stop the server in production mode using `uwsgi` for the RELATE platform, do this:
pip3 install uwsgi
start-ws.sh
stop-ws.sh
To start the server on three different ports for faster, multi-threaded processing, do this:
start-ws-mt.sh
stop-ws-mt.sh
The easiest way to use the Teprolin text processing platform is to get the already-built Docker container from Docker Hub:

docker pull raduion/teprolin:1.1
If you want to build the image yourself, just issue:
docker build --pull --rm -f "Dockerfile" -t teprolin:1.1 "."
or use the Visual Studio Code Docker extension along with Docker Desktop.