Merge pull request #18 from CogComp/dev

Version upgrade
CogComp · Oct 28, 2018 · 1d823bf
2 parents be81a7f + a9f4a02
commit 1d823bf
Showing 6 changed files with 133 additions and 137 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# ZOE (Zero-shot Open Typing)
+# ZOE (Zero-shot Open Entity Typing)
 A state of the art system for zero-shot entity fine typing with minimum supervision
 
 ## Introduction
@@ -7,113 +7,57 @@ This is a demo system for our paper "Zero-Shot Open Entity Typing as Type-Compat
 which at the time of publication represents the state-of-the-art of zero-shot entity typing.
 
 The original experiments that produced all the results in the paper
-are done with a package written in Java. This is a re-written package 
-that contains the same core, without experimental code. It's solely for
+are done with a package written in Java. This is a re-written package solely for
 the purpose of demoing the algorithm and validating key results. 
 
-The results may slightly differ from published numbers, due to the randomness in Java's 
-HashSet and Python set's iteration order. The difference should be within 0.5%.
+The results may differ slightly from the published numbers, because the iteration order of Java's
+HashSet and Python's set is not guaranteed. The difference should be negligible.
 
-A major flaw of this system is the speed of running new sentences, due to ELMo processing.
-We have cached ELMo results for the provided experiments to make running experiments possible.
+This system may take a long time if run on a large number of new sentences, due to ELMo processing.
+We have cached ELMo results for the provided experiments.
 
-To this end, we are working on an online demo, and we plan to release it before EMNLP 2018.
+The package also contains an online demo; please refer to the [Publication Page](http://cogcomp.org/page/publication_view/845)
+for more details.
 
 ## Usage
 
 ### Install the system
 
 #### Prerequisites
 
-* Minimum 16G available disk space and 16G memory. (Lower specs will not work)
+* Minimum 20G available disk space and 16G memory. (strict requirement)
 * Python 3.X (Mostly tested on 3.5)
-* A POSIX OS (Windows not tested)
-* `virtualenv` if you are installing with script (check if `virtualenv` command works)
+* A POSIX OS (Windows not supported)
+* Java JDK and Maven
+* `virtualenv` if you are installing with script
 * `wget` if you are installing with script (Use brew to install it on OSX)
 * `unzip` if you are installing with script
 
-#### Install using a shell script
+#### Install using a one-line command
 
-To make everyone's life easier, we have provided a simple way for install, simply run `sh install.sh`.
+To make life easier, we provide a simple way to install with `sh install.sh`.
 
 This script does everything mentioned in the next section, plus creating a virtualenv. Use `source venv/bin/activate` to activate.
 
 #### Install manually
 
-Generally it's recommended to create a Python3 virtualenv and work under it.
-
-You need to first install AllenAI's bilm-tf package by running `python3 setup.py install` in ./bilm-tf directory
-
-Then install requirements by `pip3 install -r requirements.txt` in project root
-
-Then you need to download all the data/model files. There are two steps in this:
-* in bilm-tf/, download [model.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip), and uncompress
-* project root, download [data.zip](http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip), and uncompress
-
-Then check if all files are here by `python3 scripts.py CHECKFILES` or `python3 scripts.py CHECKFILES figer`
-in order to check figer caches etc.
+See the wiki page [Manual Installation](https://github.com/CogComp/zoe/wiki/Manual-Installation).
 
 ### Run the system
 
-Currently you can do the following:
+Currently you can do the following without changes to the code:
 * Run experiment on FIGER test set (randomly sampled as the paper): `python3 main.py figer`
 * Run experiment on BBN test set: `python3 main.py bbn`
 * Run experiment on the first 1000 Ontonotes_fine test set instances (due to size issue): `python3 main.py ontonotes`
 
-It's generally an expensive operation to run on new sentences, but you can still do it.
-Please refer to `main.py` to see how you can test on your own data. 
-
-## Engineering details
-
-### Structure
-
-The package is composed with 
-
-* A slightly modified ELMo source code, see [bilm-tf](https://github.com/allenai/bilm-tf)
-* A main library `zoe_utils.py`
-* A executor `main.py`
-* A script helper `script.py` 
-
-### zoe_utils.py
-
-This is the main library file which contains the core logic.
-
-It has 4 main component Classes:
-
-#### `EsaProcessor`
-
-Supports all operations related to ESA and its data files. 
-
-A main entrance is `EsaProcessor.get_candidates` which given a sentence, returns 
-the top `EsaProcessor.RETURN_NUM` candidate Wikipedia concepts
-
-#### `ElmoProcessor`
-
-Supports all operations related to ElMo and its data files.
-
-A main entrance is `ElmoProcessor.rank_candidates`, which given a sentence and a list 
-of candidates (generated from ESA), rank them by ELMo representation cosine similarities. (see paper)
-
-It will return the top `ElmoProcessor.RANKED_RETURN_NUM` candidates.
-
-#### `InferenceProcessor`
-
-This is the core engine that does inference given outputs from the previous processors.
-
-The logic behind it is as described in the paper and is rather complicated. 
-
-One main entrance is `InferenceProcessor.inference` which receives a sentence, outputs from 
-previously mentioned processors, and set inference results.
-
-#### `Evaluator`
-
-This evaluates performances and print them, after given a list of sentences processed by
-`InferenceProcessor`
+Additionally, you can run server mode, which initializes the online demo, with `python3 server.py`.
+However, this requires some additional files that are not provided for download yet.
+Please contact the authors directly.
 
-#### `DataReader`
+It's generally an expensive operation to run on a large number of new sentences, but you are welcome to do it.
+Please refer to `main.py` and [Engineering Details](https://github.com/CogComp/zoe/wiki/Engineering-Details) 
+to see how you can test on your own data. 
 
-Initialize this with a data file path. It reads standard json formats (see examples)
-and transform the data into a list of `Sentence`
 
 ## Citation
 See the following paper: 

diff --git a/frontend/index.html b/frontend/index.html
@@ -218,7 +218,23 @@
             alert("You must enter a sentence to proceed.");
             return;
         }
-        var tokens = sentence.trim().split(" ");
+        let xhr = new XMLHttpRequest();
+        xhr.open("POST", SERVER_API + "annotate_token", true);
+        xhr.setRequestHeader("Content-Type", "application/json");
+        xhr.onreadystatechange = function () {
+            if (xhr.readyState === XMLHttpRequest.DONE && xhr.status === 200) {
+                var json = JSON.parse(xhr.responseText);
+                continueGenerateTokens(json);
+            }
+        };
+        var data = JSON.stringify({
+            sentence: sentence,
+        });
+        xhr.send(data);
+    }
+
+    function continueGenerateTokens(result) {
+        var tokens = result["tokens"];
         document.getElementById("total-token-num").innerText = String(tokens.length);
         for (var i = 0; i < tokens.length; i++) {
             var curToken = tokens[i];
@@ -262,6 +278,16 @@
         document.getElementById("using-preset-example").innerText = String(-1);
     }
 
+    function getTokens() {
+        var parent_div = document.getElementById("token-display");
+        var i;
+        var tokens = [];
+        for (i = 0; i < parent_div.children.length; i++) {
+            tokens.push(parent_div.children[i].innerHTML);
+        }
+        return tokens;
+    }
+
     function generatePresetMentions() {
         var sentence = document.getElementById("sentence-input").value;
         var xhr = new XMLHttpRequest();
@@ -274,7 +300,7 @@
             }
         };
         var data = JSON.stringify({
-            tokens: sentence.trim().split(" "),
+            tokens: getTokens(),
         });
         xhr.send(data);
     }
@@ -504,7 +530,7 @@
             };
             var data_vec = JSON.stringify({
                 index: i,
-                tokens: sentence.trim().split(" "),
+                tokens: getTokens(),
                 mention_starts: [mention_starts[i]],
                 mention_ends: [mention_ends[i]],
             });
@@ -521,7 +547,7 @@
             };
             var data_simple = JSON.stringify({
                 index: i,
-                tokens: sentence.trim().split(" "),
+                tokens: getTokens(),
                 mention_starts: [mention_starts[i]],
                 mention_ends: [mention_ends[i]],
             });
@@ -538,7 +564,7 @@
             };
             var data = JSON.stringify({
                 index: i,
-                tokens: sentence.trim().split(" "),
+                tokens: getTokens(),
                 mention_starts: [mention_starts[i]],
                 mention_ends: [mention_ends[i]],
                 mode: getInferenceMode(),
@@ -634,7 +660,7 @@
 
     function getExampleSentenceMention(id) {
         if (id == 1) {
-            return [[0, 2], [10, 12], [15, 17]];
+            return [[0, 2], [11, 13], [16, 18]];
         }
         if (id == 2) {
             return [[0, 1], [5, 7], [9, 11], [20, 21]];
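
Note on the shifted preset indices above: the frontend no longer splits on spaces and instead takes tokens from the server's `annotate_token` endpoint, so mention offsets computed against the old whitespace tokens can move. The sketch below illustrates that effect only. The server's actual tokenizer is not shown in this diff, so `punctuationAwareTokenize` is a hypothetical stand-in that assumes punctuation becomes its own token, and the sentence is made up rather than taken from the demo's presets.

```javascript
// The behavior this commit removes from the frontend: split on single spaces.
function whitespaceTokenize(sentence) {
    return sentence.trim().split(" ");
}

// Hypothetical stand-in for the server-side tokenizer (an assumption, not
// the project's code): words and punctuation marks become separate tokens.
function punctuationAwareTokenize(sentence) {
    return sentence.trim().match(/\w+|[^\w\s]/g) || [];
}

// Made-up sentence, not one of the demo's preset examples.
var sentence = "Barack Obama, the former president, visited New York.";

var oldTokens = whitespaceTokenize(sentence);
var newTokens = punctuationAwareTokenize(sentence);

// Start index of the mention "New York" under each tokenization.
var oldStart = oldTokens.indexOf("New");
var newStart = newTokens.indexOf("New");

console.log(oldTokens.length, oldStart); // 8 6
console.log(newTokens.length, newStart); // 11 8
```

Each punctuation mark that precedes a mention adds one to its start index under this assumption, which is consistent with the second and third preset mentions moving by one position while the first stays at `[0, 2]`.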

diff --git a/install.sh b/install.sh
@@ -1,12 +1,39 @@
 #!/bin/bash
 
+if ! [ -x "$(command -v java)" ]; then
+    echo 'Error: Java is not installed.'
+    exit 1
+fi
+if ! [ -x "$(command -v mvn)" ]; then
+    echo 'Error: maven is not installed.'
+    exit 1
+fi
+if ! [ -x "$(command -v python3)" ]; then
+    echo 'Error: python 3.x is not installed.'
+    exit 1
+fi
+if ! [ -x "$(command -v virtualenv)" ]; then
+    echo 'Error: virtualenv is not installed.'
+    exit 1
+fi
+if ! [ -x "$(command -v wget)" ]; then
+    echo 'Error: wget is not found. Either install or find replacement and modify this script.'
+    exit 1
+fi
+if ! [ -x "$(command -v unzip)" ]; then
+    echo 'Error: unzip is not found. Either install or find replacement and modify this script.'
+    exit 1
+fi
+echo 'All dependencies satisfied. Moving on...'
+
 virtualenv -p python3 venv
 cd ./bilm-tf
 ../venv/bin/python3 setup.py install
 wget http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/model.zip
 unzip model.zip
 rm model.zip
 cd ../
+venv/bin/pip3 install Cython
 venv/bin/pip3 install -r requirements.txt
 wget http://cogcomp.org/Data/ccgPapersData/xzhou45/zoe/data.zip
 unzip -n data.zip

diff --git a/requirements.txt b/requirements.txt
@@ -5,6 +5,5 @@ scipy
 regex
 Flask
 flask-cors
-cython
 ccg_nlpy
 gensim