Merge pull request #18 from CogComp/dev
Version upgrade
Slash0BZ authored Oct 28, 2018
2 parents be81a7f + a9f4a02 commit 1d823bf
Showing 6 changed files with 133 additions and 137 deletions.
100 changes: 22 additions & 78 deletions README.md
@@ -1,4 +1,4 @@
# ZOE (Zero-shot Open Typing)
# ZOE (Zero-shot Open Entity Typing)
A state of the art system for zero-shot entity fine typing with minimum supervision

## Introduction
@@ -7,113 +7,57 @@ This is a demo system for our paper "Zero-Shot Open Entity Typing as Type-Compat
which at the time of publication represents the state-of-the-art of zero-shot entity typing.

The original experiments that produced all the results in the paper
are done with a package written in Java. This is a re-written package
that contains the same core, without experimental code. It's solely for
are done with a package written in Java. This is a re-written package solely for
the purpose of demoing the algorithm and validating key results.

The results may slightly differ from published numbers, due to the randomness in Java's
HashSet and Python set's iteration order. The difference should be within 0.5%.
The results may differ slightly from published numbers, due to the randomness in Java's
HashSet and Python set's iteration order. The difference should be negligible.

A major flaw of this system is the speed of running new sentences, due to ELMo processing.
We have cached ELMo results for the provided experiments to make running experiments possible.
This system may take a long time if run on a large number of new sentences, due to ELMo processing.
We have cached ELMo results for the provided experiments.

To this end, we are working on an online demo, and we plan to release it before EMNLP 2018.
The package also contains an online demo, please refer to [Publication Page](http://cogcomp.org/page/publication_view/845)
for more details.

## Usage

### Install the system

#### Prerequisites

* Minimum 16G available disk space and 16G memory. (Lower specs will not work)
* Minimum 20G available disk space and 16G memory. (strict requirement)
* Python 3.X (Mostly tested on 3.5)
* A POSIX OS (Windows not tested)
* `virtualenv` if you are installing with script (check if `virtualenv` command works)
* A POSIX OS (Windows not supported)
* Java JDK and Maven
* `virtualenv` if you are installing with script
* `wget` if you are installing with script (Use brew to install it on OSX)
* `unzip` if you are installing with script

#### Install using a shell script
#### Install using a one-line command

To make everyone's life easier, we have provided a simple way for install, simply run `sh install.sh`.
To make life easier, we provide a simple way to install with `sh install.sh`.

This script does everything mentioned in the next section, plus creating a virtualenv. Use `source venv/bin/activate` to activate.

#### Install manually

Generally it's recommended to create a Python3 virtualenv and work under it.

You need to first install AllenAI's bilm-tf package by running `python3 setup.py install` in ./bilm-tf directory

Then install requirements by `pip3 install -r requirements.txt` in project root

Then you need to download all the data/model files. There are two steps in this:
* in bilm-tf/, download [model.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip), and uncompress
* project root, download [data.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip), and uncompress

Then check if all files are here by `python3 scripts.py CHECKFILES` or `python3 scripts.py CHECKFILES figer`
in order to check figer caches etc.
See wiki [manual-installation](https://github.com/CogComp/zoe/wiki/Manual-Installation)

### Run the system

Currently you can do the following:
Currently you can do the following without changes to the code:
* Run experiment on FIGER test set (randomly sampled as the paper): `python3 main.py figer`
* Run experiment on BBN test set: `python3 main.py bbn`
* Run experiment on the first 1000 Ontonotes_fine test set instances (due to size issue): `python3 main.py ontonotes`

It's generally an expensive operation to run on new sentences, but you can still do it.
Please refer to `main.py` to see how you can test on your own data.

## Engineering details

### Structure

The package is composed with

* A slightly modified ELMo source code, see [bilm-tf](https://github.com/allenai/bilm-tf)
* A main library `zoe_utils.py`
* A executor `main.py`
* A script helper `script.py`

### zoe_utils.py

This is the main library file which contains the core logic.

It has 4 main component Classes:

#### `EsaProcessor`

Supports all operations related to ESA and its data files.

A main entrance is `EsaProcessor.get_candidates` which given a sentence, returns
the top `EsaProcessor.RETURN_NUM` candidate Wikipedia concepts

#### `ElmoProcessor`

Supports all operations related to ElMo and its data files.

A main entrance is `ElmoProcessor.rank_candidates`, which given a sentence and a list
of candidates (generated from ESA), rank them by ELMo representation cosine similarities. (see paper)

It will return the top `ElmoProcessor.RANKED_RETURN_NUM` candidates.

#### `InferenceProcessor`

This is the core engine that does inference given outputs from the previous processors.

The logic behind it is as described in the paper and is rather complicated.

One main entrance is `InferenceProcessor.inference` which receives a sentence, outputs from
previously mentioned processors, and set inference results.

#### `Evaluator`

This evaluates performances and print them, after given a list of sentences processed by
`InferenceProcessor`
Additionally, you can run server mode that initializes the online demo with `python3 server.py`
However, this requires some additional files that are not yet provided for download.
Please directly contact the authors.

#### `DataReader`
It's generally an expensive operation to run on a large number of new sentences, but you are welcome to do it.
Please refer to `main.py` and [Engineering Details](https://github.com/CogComp/zoe/wiki/Engineering-Details)
to see how you can test on your own data.

Initialize this with a data file path. It reads standard json formats (see examples)
and transform the data into a list of `Sentence`

## Citation
See the following paper:
38 changes: 32 additions & 6 deletions frontend/index.html
@@ -218,7 +218,23 @@
alert("You must enter a sentence to proceed.");
return;
}
var tokens = sentence.trim().split(" ");
let xhr = new XMLHttpRequest();
xhr.open("POST", SERVER_API + "annotate_token", true);
xhr.setRequestHeader("Content-Type", "application/json");
xhr.onreadystatechange = function () {
if (xhr.readyState === XMLHttpRequest.DONE && xhr.status === 200) {
var json = JSON.parse(xhr.responseText);
continueGenerateTokens(json);
}
};
var data = JSON.stringify({
sentence: sentence,
});
xhr.send(data);
}

function continueGenerateTokens(result) {
var tokens = result["tokens"];
document.getElementById("total-token-num").innerText = String(tokens.length);
for (var i = 0; i < tokens.length; i++) {
var curToken = tokens[i];
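
The hunk above moves tokenization out of the browser: instead of calling `split(" ")`, the page POSTs the sentence to `annotate_token` and reads `tokens` from the JSON reply. A minimal sketch of that round trip follows, split into pure helpers so it can be exercised without a browser. The helper names `buildAnnotateRequest` and `parseAnnotateResponse` are hypothetical and not part of the commit, and the reply shape is assumed only as far as the `tokens` field that the diff itself reads.

```javascript
// Hypothetical helper: assembles the POST request shown in the diff.
// `serverApi` stands in for the page's SERVER_API constant.
function buildAnnotateRequest(serverApi, sentence) {
  return {
    url: serverApi + "annotate_token",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sentence: sentence }),
  };
}

// Hypothetical helper: extracts the token list from the reply text.
// Assumption: the server answers with a JSON object holding a "tokens"
// array, which is the only field the diff reads (result["tokens"]).
function parseAnnotateResponse(responseText) {
  const json = JSON.parse(responseText);
  if (!Array.isArray(json.tokens)) {
    throw new Error("response has no 'tokens' array");
  }
  return json.tokens;
}
```

Unlike the committed handler, which silently ignores any reply that is not a 200, this sketch throws when the reply lacks a token list; whether to surface such failures to the user is a design choice the diff leaves open.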
@@ -262,6 +278,16 @@
document.getElementById("using-preset-example").innerText = String(-1);
}

function getTokens() {
var parent_div = document.getElementById("token-display");
var i;
var tokens = [];
for (i = 0; i < parent_div.children.length; i++) {
tokens.push(parent_div.children[i].innerHTML);
}
return tokens;
}

function generatePresetMentions() {
var sentence = document.getElementById("sentence-input").value;
var xhr = new XMLHttpRequest();
@@ -274,7 +300,7 @@
}
};
var data = JSON.stringify({
tokens: sentence.trim().split(" "),
tokens: getTokens(),
});
xhr.send(data);
}
@@ -504,7 +530,7 @@
};
var data_vec = JSON.stringify({
index: i,
tokens: sentence.trim().split(" "),
tokens: getTokens(),
mention_starts: [mention_starts[i]],
mention_ends: [mention_ends[i]],
});
@@ -521,7 +547,7 @@
};
var data_simple = JSON.stringify({
index: i,
tokens: sentence.trim().split(" "),
tokens: getTokens(),
mention_starts: [mention_starts[i]],
mention_ends: [mention_ends[i]],
});
@@ -538,7 +564,7 @@
};
var data = JSON.stringify({
index: i,
tokens: sentence.trim().split(" "),
tokens: getTokens(),
mention_starts: [mention_starts[i]],
mention_ends: [mention_ends[i]],
mode: getInferenceMode(),
@@ -634,7 +660,7 @@

function getExampleSentenceMention(id) {
if (id == 1) {
return [[0, 2], [10, 12], [15, 17]];
return [[0, 2], [11, 13], [16, 18]];
}
if (id == 2) {
return [[0, 1], [5, 7], [9, 11], [20, 21]];
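
The shifted mention offsets for the first preset example (10 to 11, 15 to 16, and so on) are consistent with a tokenizer that splits punctuation into separate tokens, which produces more tokens than `split(" ")` and pushes later indices to the right. The sketch below illustrates the effect with a toy regex tokenizer. The server's actual tokenizer does not appear in this diff; `toyTokenize` and the sample sentence are made up for illustration only.

```javascript
// Whitespace tokenization, as the frontend did before this commit.
function whitespaceTokens(sentence) {
  return sentence.trim().split(" ");
}

// Toy punctuation-aware tokenizer (an assumption, NOT the server's tokenizer):
// each token is a run of word characters or a single punctuation character.
function toyTokenize(sentence) {
  return sentence.match(/\w+|[^\w\s]/g) || [];
}

// Made-up sample sentence with punctuation attached to words.
const sample = "Obama, born in Hawaii, visited Chicago.";

// Index of the same word under both schemes; the second is larger because
// each comma before it has become a token of its own.
const indexBySpaces = whitespaceTokens(sample).indexOf("visited");
const indexByToyTokenizer = toyTokenize(sample).indexOf("visited");
console.log(indexBySpaces, indexByToyTokenizer);
```

This is also why the later hunks send `getTokens()` rather than re-splitting the raw sentence: mention offsets stay aligned only if every request uses the token list the server returned.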
27 changes: 27 additions & 0 deletions install.sh
@@ -1,12 +1,39 @@
#!/bin/bash

if ! [ -x "$(command -v java)" ]; then
echo 'Error: Java is not installed.'
exit 1
fi
if ! [ -x "$(command -v mvn)" ]; then
echo 'Error: maven is not installed.'
exit 1
fi
if ! [ -x "$(command -v python3)" ]; then
echo 'Error: python 3.x is not installed.'
exit 1
fi
if ! [ -x "$(command -v virtualenv)" ]; then
echo 'Error: virtualenv is not installed.'
exit 1
fi
if ! [ -x "$(command -v wget)" ]; then
echo 'Error: wget is not found. Either install or find replacement and modify this script.'
exit 1
fi
if ! [ -x "$(command -v unzip)" ]; then
echo 'Error: unzip is not found. Either install or find replacement and modify this script.'
exit 1
fi
echo 'All dependencies satisfied. Moving on...'

virtualenv -p python3 venv
cd ./bilm-tf
../venv/bin/python3 setup.py install
wget http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip
unzip model.zip
rm model.zip
cd ../
venv/bin/pip3 install Cython
venv/bin/pip3 install -r requirements.txt
wget http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip
unzip -n data.zip
1 change: 0 additions & 1 deletion requirements.txt
@@ -5,6 +5,5 @@ scipy
regex
Flask
flask-cors
cython
ccg_nlpy
gensim