SILKNOW Text Classification

Downloading the Trained Models

Download the models from Zenodo: Models

Requirements

The requirements are the following (see requirements.txt for the Python dependencies):

* A CUDA-capable GPU
* Python and the libraries listed in requirements.txt
* Aligned Embeddings (see below)

Command Line Tool

Usage

./sntxtclassify.py -h
usage: sntxtclassify <command> [<args>]

The available commands are:
   train        Train a text classification model
   evaluate     Evaluate a text classification model
   classify     Classify text samples

SilkNOW Text Classifier

positional arguments:
  {train,evaluate,classify}

optional arguments:
  -h, --help            show this help message and exit

Training a Model

./sntxtclassify.py train -h
       [-h] --data-train DATA_TRAIN --target TARGET
       [--pretrained-embeddings PRETRAINED_EMBEDDINGS] [--all-embeddings]
       --model-save MODEL_SAVE

optional arguments:
  -h, --help            show this help message and exit
  --data-train DATA_TRAIN
                        train CSV file
  --target TARGET       CSV column target
  --pretrained-embeddings PRETRAINED_EMBEDDINGS
                        pretrained embeddings path
  --all-embeddings      Keep all word vectors
  --model-save MODEL_SAVE
                        save model path

The training CSV file, DATA_TRAIN, has the following requirements:

* Tab-separated values;
* A header row;
* A `txt` field containing the text;
* A `lang` field containing the language of the text;
* A label field with the same name as specified by the `TARGET` argument.
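
As an illustration of the format described above, the following sketch writes a small tab-separated training file with Python's standard csv module. The helper name `write_training_file`, the sample rows, and the `technique` label values are hypothetical; only the `txt` and `lang` columns, the header, and the tab delimiter come from the requirements listed above. How the tool handles quoting is not documented here, so relying on the csv module's default quoting is an assumption.

```python
import csv
import os
import tempfile


def write_training_file(path, rows, target):
    """Write rows (dicts with txt, lang and the target label) as a TSV with a header."""
    fieldnames = ["txt", "lang", target]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical sample rows; the label values are placeholders, not real classes.
sample_rows = [
    {"txt": "Tapestry, woven in wool and silk.", "lang": "en", "technique": "tapestry"},
    {"txt": "Tejido de seda con hilos metálicos.", "lang": "es", "technique": "brocaded"},
]

# Written to the system temp directory so the sketch leaves the project tree untouched.
example_path = os.path.join(tempfile.gettempdir(), "example.trn.csv")
write_training_file(example_path, sample_rows, target="technique")
```

The resulting file could then be passed as `--data-train` with `--target technique`.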

The pretrained embeddings directory is expected to contain the fastText multilingual aligned embeddings available here.

Example:

./sntxtclassify.py train \
    --data-train /data/euprojects/silknow/tasks/technique/final.trn.csv \
    --target technique \
    --pretrained-embeddings /data/vectors/ftaligned \
    --model-save /data/euprojects/silknow/models/technique

Evaluating

./sntxtclassify.py evaluate -h
usage: sntxtclassify <command> [<args>]
       [-h] --model-load MODEL_LOAD --data-test DATA_TEST --target TARGET

optional arguments:
  -h, --help            show this help message and exit
  --model-load MODEL_LOAD
                        model path
  --data-test DATA_TEST
                        test CSV path
  --target TARGET       CSV column target

The test CSV file, DATA_TEST, has the same requirements as DATA_TRAIN.

Example:

./sntxtclassify.py evaluate \
    --data-test /data/euprojects/silknow/tasks/material/final.tst.csv \
    --target material \
    --model-load /data/euprojects/silknow/models/material

Classifying Text

./sntxtclassify.py classify -h
usage: sntxtclassify <command> [<args>]
       [-h] [--model-load MODEL_LOAD] --data-input DATA_INPUT --data-output
       DATA_OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --model-load MODEL_LOAD
                        model path
  --data-input DATA_INPUT
                        input CSV path
  --data-output DATA_OUTPUT
                        output CSV path

The DATA_INPUT CSV only requires a `txt` and a `lang` field.
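
Before running the classifier on a large file, it can help to confirm that the header contains the two required fields. The sketch below is one possible check. The helper name `missing_fields` is hypothetical, and it assumes the input file is tab-separated like the training file, which this section does not state explicitly.

```python
import csv

REQUIRED_FIELDS = ("txt", "lang")


def missing_fields(path, required=REQUIRED_FIELDS):
    """Return the required column names that are absent from the file's header row."""
    with open(path, newline="", encoding="utf-8") as f:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(f, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

An empty list means both fields are present; any other result names the columns that still need to be added.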

Example:

./sntxtclassify.py classify \
    --data-output /tmp/ex.tsv \
    --data-input /data/euprojects/silknow/tasks/missing_material.csv \
    --model-load /data/euprojects/silknow/models/material

Library

Example:

import sdnn

text = '''Tapestry, woven in wool and silk, after a painting by Raphael depicting the Holy Family. Tapestry, after a painting of the Holy Family by Raphael; principal weaver Adolphe Margarita; made at the Gobelins in Paris, 23/05/1852-06/12/1856'''
lang = 'en'

model_path = '/data/silknow/models/technique'
model = sdnn.load_model(model_path)
model.classify(text, lang)

Alternatively, for bulk classification of multiple texts, use:

model.classify_text(text_lang_pairs)

Where text_lang_pairs is a list of (text, lang) tuples.
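
One way to build such a list is to read it from a file with `txt` and `lang` columns. The sketch below uses only the standard library. The helper name `read_text_lang_pairs` and the file name in the usage comment are hypothetical, and the file is assumed to be tab-separated. The return format of `classify_text` is not described here, so the sketch does not rely on it.

```python
import csv


def read_text_lang_pairs(path):
    """Read (text, lang) tuples from a tab-separated file with txt and lang columns."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [(row["txt"], row["lang"]) for row in reader]


# Usage with the library calls shown above (requires sdnn and a trained model):
# model = sdnn.load_model(model_path)
# predictions = model.classify_text(read_text_lang_pairs("input.tsv"))
```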

REST API Server

To start a local API server, use the server.py executable. Example:

./server.py --model-load /data/euprojects/silknow/models/material --port 8000

The server exposes a single endpoint (the root). Example request:

import requests

text = '''Tapestry, woven in wool and silk, after a painting by Raphael depicting the Holy Family. Tapestry, after a painting of the Holy Family by Raphael; principal weaver Adolphe Margarita; made at the Gobelins in Paris, 23/05/1852-06/12/1856'''
lang = 'en'

js = [{'txt': text, 'lang': lang}]

r = requests.post('http://localhost:8000', json=js)
print(r.json())
# Output: [{'label': 'silk_wool', 'score': 1.0}]
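
Because the request body is a list, a small wrapper can build it from several (text, lang) tuples at once. The sketch below is a hedged example: `build_payload` and `classify_remote` are hypothetical helper names, and it assumes the server accepts more than one object per request, which the single-item example above suggests but does not confirm.

```python
def build_payload(text_lang_pairs):
    """Turn (text, lang) tuples into the list of JSON objects used in the request example."""
    return [{"txt": text, "lang": lang} for text, lang in text_lang_pairs]


def classify_remote(text_lang_pairs, url="http://localhost:8000", timeout=30):
    """POST the payload to a running server and return the decoded JSON response."""
    import requests  # imported lazily so build_payload works without requests installed

    response = requests.post(url, json=build_payload(text_lang_pairs), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of decoding an error body
    return response.json()
```

With a server running locally, `classify_remote([(text, lang)])` would send the same request as the example above.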
