shorten README
Slash0BZ committed Oct 28, 2018
1 parent 7387004 commit d9c5654
Showing 1 changed file with 20 additions and 81 deletions.
101 changes: 20 additions & 81 deletions README.md
@@ -1,4 +1,4 @@
# ZOE (Zero-shot Open Typing)
# ZOE (Zero-shot Open Entity Typing)
A state-of-the-art system for zero-shot fine-grained entity typing with minimal supervision

## Introduction
@@ -7,118 +7,57 @@ This is a demo system for our paper "Zero-Shot Open Entity Typing as Type-Compat
which at the time of publication represents the state-of-the-art of zero-shot entity typing.

The original experiments that produced all the results in the paper
are done with a package written in Java. This is a re-written package
that contains the same core, without experimental code. It's solely for
are done with a package written in Java. This is a re-written package solely for
the purpose of demoing the algorithm and validating key results.

The results may slightly differ from published numbers, due to the randomness in Java's
HashSet and Python set's iteration order. The difference should be within 0.5%.
The results may differ slightly from the published numbers, due to the randomness in Java's
HashSet and Python set's iteration order. The difference should be negligible.
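
One way such differences can arise is when equally scored items are picked from a set, since the iteration order of a Python `set` of strings can change between runs. The sketch below only illustrates that assumption (the README does not say which code path is affected). `pick_top` and the sample scores are hypothetical and not part of this package. Sorting with an explicit tie-breaker makes the selection independent of iteration order:

```python
def pick_top(candidates, scores, k):
    """Return the k best candidates, breaking score ties by name.

    `candidates` is a set of strings and `scores` maps each string to a float.
    Both are hypothetical inputs used only for illustration.
    """
    # Sort by descending score, then alphabetically, so the result does not
    # depend on the (unspecified) iteration order of the set.
    return sorted(candidates, key=lambda c: (-scores[c], c))[:k]


candidates = {"b", "a", "c"}
scores = {"a": 1.0, "b": 1.0, "c": 0.5}
print(pick_top(candidates, scores, 2))  # ['a', 'b']
```

Without the tie-breaker, taking the first `k` elements of the set directly could return `a` or `b` first depending on the run.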

A major flaw of this system is the speed of running new sentences, due to ELMo processing.
We have cached ELMo results for the provided experiments to make running experiments possible.
This system may take a long time if run on a large number of new sentences, due to ELMo processing.
We have cached ELMo results for the provided experiments.

To this end, we are working on an online demo, and we plan to release it before EMNLP 2018.
The package also contains an online demo; please refer to the [Publication Page](http://cogcomp.org/page/publication_view/845)
for more details.

## Usage

### Install the system

#### Prerequisites

* Minimum 16G available disk space and 16G memory. (Lower specs will not work)
* Minimum 20G available disk space and 16G memory. (strict requirement)
* Python 3.X (Mostly tested on 3.5)
* A POSIX OS (Windows not tested)
* A POSIX OS (Windows not supported)
* Java JDK and Maven
* `virtualenv` if you are installing with script (check if `virtualenv` command works)
* `virtualenv` if you are installing with script
* `wget` if you are installing with script (Use brew to install it on OSX)
* `unzip` if you are installing with script
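
As a rough pre-install check, the sketch below looks for the command-line tools listed above and for free disk space. It is not part of the package. The executable names `java` and `mvn` are assumptions about how the JDK and Maven appear on your `PATH`, and the helper names are hypothetical.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_free_disk(path, min_gb):
    """Return True if the filesystem holding `path` has at least `min_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_gb * 1024 ** 3


if __name__ == "__main__":
    # Tool names follow the prerequisites list; `java` and `mvn` are assumed names.
    required = ["java", "mvn", "virtualenv", "wget", "unzip"]
    print("Missing tools:", missing_tools(required))
    print("Enough disk space (20G):", has_free_disk(".", 20))
```

This check does not cover the memory requirement, which the standard library cannot query portably.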

#### Install using a shell script
#### Install using a one-line command

To make everyone's life easier, we have provided a simple way to install: just run `sh install.sh`.
To make life easier, we provide a simple way to install with `sh install.sh`.

This script does everything mentioned in the next section, plus creating a virtualenv. Use `source venv/bin/activate` to activate.

#### Install manually

Generally it's recommended to create a Python3 virtualenv and work under it.

You first need to install AllenAI's bilm-tf package by running `python3 setup.py install` in the `./bilm-tf` directory.

Then install the requirements with `pip3 install -r requirements.txt` in the project root.

Then you need to download all the data/model files. There are two steps in this:
* in bilm-tf/, download [model.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip), and uncompress
* project root, download [data.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip), and uncompress

Then check if all files are here by `python3 scripts.py CHECKFILES` or `python3 scripts.py CHECKFILES figer`
in order to check figer caches etc.
See wiki [manual-installation](https://github.com/CogComp/zoe/wiki/Manual-Installation)

### Run the system

Currently you can do the following:
Currently you can do the following without changes to the code:
* Run experiment on FIGER test set (randomly sampled as in the paper): `python3 main.py figer`
* Run experiment on BBN test set: `python3 main.py bbn`
* Run experiment on the first 1000 Ontonotes_fine test set instances (due to size issue): `python3 main.py ontonotes`

Additionally, you can run server mode that initializes an online demo with `python3 server.py`
However, this requires some additional file that's not provided for download yet.
Additionally, you can run server mode that initializes the online demo with `python3 server.py`
However, this requires some additional files that are not provided for download yet.
Please directly contact the authors.

It's generally an expensive operation to run on new sentences, but you can still do it.
Please refer to `main.py` to see how you can test on your own data.

## Engineering details

### Structure

The package is composed of:

* A slightly modified ELMo source code, see [bilm-tf](https://github.com/allenai/bilm-tf)
* A main library `zoe_utils.py`
* An executor `main.py`
* A script helper `scripts.py`

### zoe_utils.py

This is the main library file which contains the core logic.

It has the following main component classes:

#### `EsaProcessor`

Supports all operations related to ESA and its data files.

A main entry point is `EsaProcessor.get_candidates`, which, given a sentence, returns
the top `EsaProcessor.RETURN_NUM` candidate Wikipedia concepts.

#### `ElmoProcessor`

Supports all operations related to ELMo and its data files.

A main entry point is `ElmoProcessor.rank_candidates`, which, given a sentence and a list
of candidates (generated from ESA), ranks them by the cosine similarity of their ELMo representations. (see paper)

It will return the top `ElmoProcessor.RANKED_RETURN_NUM` candidates.
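
The ranking step described above can be sketched as follows. This is a minimal illustration of ranking by cosine similarity, not the package's actual implementation: `rank_by_cosine`, the toy vectors, and the candidate names are hypothetical, and real ELMo representations are much higher-dimensional.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(mention_vec, candidate_vecs, top_k):
    """Rank candidates by similarity to the mention vector and keep the top_k.

    `candidate_vecs` maps a candidate name to its vector. Ties are broken by
    name so the output does not depend on dictionary or set ordering.
    """
    scored = [(name, cosine(mention_vec, vec)) for name, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy two-dimensional vectors standing in for ELMo representations.
mention = [1.0, 0.0]
candidates = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
for name, score in rank_by_cosine(mention, candidates, top_k=2):
    print(name, round(score, 3))
```

Here `top_k` plays the role that `ElmoProcessor.RANKED_RETURN_NUM` plays in the description above.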

#### `InferenceProcessor`

This is the core engine that does inference given outputs from the previous processors.

The logic behind it is as described in the paper and is rather complicated.

One main entry point is `InferenceProcessor.inference`, which receives a sentence and the outputs from
the previously mentioned processors, and sets the inference results.

#### `Evaluator`

This evaluates performance and prints the results, given a list of sentences processed by
`InferenceProcessor`.

#### `DataReader`
It's generally an expensive operation to run on a large number of new sentences, but you are welcome to do it.
Please refer to `main.py` and [Engineering Details](https://github.com/CogComp/zoe/wiki/Engineering-Details)
to see how you can test on your own data.

Initialize this with a data file path. It reads standard JSON formats (see examples)
and transforms the data into a list of `Sentence` objects.

## Citation
See the following paper: