From d9c5654c29dc4bc10aa6f0d387fd1cc00222040e Mon Sep 17 00:00:00 2001
From: Xuanyu Zhou
Date: Sun, 28 Oct 2018 11:54:28 -0500
Subject: [PATCH] shorten README

---
 README.md | 101 +++++++++++------------------------------------------
 1 file changed, 20 insertions(+), 81 deletions(-)

diff --git a/README.md b/README.md
index a111478..2f8e3c8 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# ZOE (Zero-shot Open Typing)
+# ZOE (Zero-shot Open Entity Typing)
 A state of the art system for zero-shot entity fine typing with minimum supervision
 
 ## Introduction
@@ -7,17 +7,17 @@ This is a demo system for our paper "Zero-Shot Open Entity Typing as Type-Compat
 which at the time of publication represents the state-of-the-art of zero-shot entity typing.
 
 The original experiments that produced all the results in the paper
-are done with a package written in Java. This is a re-written package
-that contains the same core, without experimental code. It's solely for
+are done with a package written in Java. This is a re-written package solely for
 the purpose of demoing the algorithm and validating key results.
 
-The results may slightly differ from published numbers, due to the randomness in Java's
-HashSet and Python set's iteration order. The difference should be within 0.5%.
+The results may be slightly different with published numbers, due to the randomness in Java's
+HashSet and Python set's iteration order. The difference should be negligible.
 
-A major flaw of this system is the speed of running new sentences, due to ELMo processing.
-We have cached ELMo results for the provided experiments to make running experiments possible.
+This system may take a long time if ran on a large number of new sentences, due to ELMo processing.
+We have cached ELMo results for the provided experiments.
 
-To this end, we are working on an online demo, and we plan to release it before EMNLP 2018.
+The package also contains an online demo, please refer to [Publication Page](http://cogcomp.org/page/publication_view/845)
+for more details.
 
 ## Usage
 
@@ -25,100 +25,39 @@ To this end, we are working on an online demo, and we plan to release it before
 
 #### Prerequisites
 
-* Minimum 16G available disk space and 16G memory. (Lower specs will not work)
+* Minimum 20G available disk space and 16G memory. (strict requirement)
 * Python 3.X (Mostly tested on 3.5)
-* A POSIX OS (Windows not tested)
+* A POSIX OS (Windows not supported)
 * Java JDK and Maven
-* `virtualenv` if you are installing with script (check if `virtualenv` command works)
+* `virtualenv` if you are installing with script
 * `wget` if you are installing with script (Use brew to install it on OSX)
 * `unzip` if you are installing with script
 
-#### Install using a shell script
+#### Install using a one-line command
 
-To make everyone's life easier, we have provided a simple way for install, simply run `sh install.sh`.
+To make life easier, we provide a simple way to install with `sh install.sh`.
 
 This script does everything mentioned in the next section, plus creating a virtualenv. Use `source venv/bin/activate` to activate.
 
 #### Install manually
 
-Generally it's recommended to create a Python3 virtualenv and work under it.
-
-You need to first install AllenAI's bilm-tf package by running `python3 setup.py install` in ./bilm-tf directory
-
-Then install requirements by `pip3 install -r requirements.txt` in project root
-
-Then you need to download all the data/model files. There are two steps in this:
-* in bilm-tf/, download [model.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip), and uncompress
-* project root, download [data.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip), and uncompress
-
-Then check if all files are here by `python3 scripts.py CHECKFILES` or `python3 scripts.py CHECKFILES figer`
-in order to check figer caches etc.
+See wiki [manual-installation](https://github.com/CogComp/zoe/wiki/Manual-Installation)
 
 ### Run the system
 
-Currently you can do the following:
+Currently you can do the following without changes to the code:
 
 * Run experiment on FIGER test set (randomly sampled as the paper): `python3 main.py figer`
 * Run experiment on BBN test set: `python3 main.py bbn`
 * Run experiment on the first 1000 Ontonotes_fine test set instances (due to size issue): `python3 main.py ontonotes`
 
-Additionally, you can run server mode that initializes an online demo with `python3 server.py`
-However, this requires some additional file that's not provided for download yet.
+Additionally, you can run server mode that initializes the online demo with `python3 server.py`
+However, this requires some additional files that's not provided for download yet.
 Please directly contact the authors.
 
-It's generally an expensive operation to run on new sentences, but you can still do it.
-Please refer to `main.py` to see how you can test on your own data.
-
-## Engineering details
-
-### Structure
-
-The package is composed with
-
-* A slightly modified ELMo source code, see [bilm-tf](https://github.com/allenai/bilm-tf)
-* A main library `zoe_utils.py`
-* A executor `main.py`
-* A script helper `script.py`
-
-### zoe_utils.py
-
-This is the main library file which contains the core logic.
-
-It has 4 main component Classes:
-
-#### `EsaProcessor`
-
-Supports all operations related to ESA and its data files.
-
-A main entrance is `EsaProcessor.get_candidates` which given a sentence, returns
-the top `EsaProcessor.RETURN_NUM` candidate Wikipedia concepts
-
-#### `ElmoProcessor`
-
-Supports all operations related to ElMo and its data files.
-
-A main entrance is `ElmoProcessor.rank_candidates`, which given a sentence and a list
-of candidates (generated from ESA), rank them by ELMo representation cosine similarities. (see paper)
-
-It will return the top `ElmoProcessor.RANKED_RETURN_NUM` candidates.
-
-#### `InferenceProcessor`
-
-This is the core engine that does inference given outputs from the previous processors.
-
-The logic behind it is as described in the paper and is rather complicated.
-
-One main entrance is `InferenceProcessor.inference` which receives a sentence, outputs from
-previously mentioned processors, and set inference results.
-
-#### `Evaluator`
-
-This evaluates performances and print them, after given a list of sentences processed by
-`InferenceProcessor`
-
-#### `DataReader`
+It's generally an expensive operation to run on large numerb of new sentences, but you are welcome to do it.
+Please refer to `main.py` and [Engineering Details](https://github.com/CogComp/zoe/wiki/Engineering-Details)
+to see how you can test on your own data.
 
-Initialize this with a data file path. It reads standard json formats (see examples)
-and transform the data into a list of `Sentence`
 ## Citation
 See the following paper: