Tdl from grammar #13

Open: wants to merge 43 commits into master
Commits
6f9bfed
parse tdl from grammar file, closes #2; allow both lkb and ace gramma…
fcbond Jun 7, 2020
7b52357
made the cgi scripts run (only) on python3; closes #11
fcbond Jun 8, 2020
23b49e7
added a requirements.txt; closes #6
fcbond Jun 8, 2020
ff3dd4f
properly munge docstring from pydelphin; closes #10
fcbond Jun 9, 2020
39c98ef
trouble shooting advice
fcbond Jun 10, 2020
1395921
Merge branch 'tdl-from-grammar' of https://github.com/fcbond/ltdb int…
fcbond Jun 10, 2020
749f6be
Fixed bad print substitution
fcbond Jun 12, 2020
47b1faa
Merge branch 'tdl-from-grammar' of https://github.com/fcbond/ltdb int…
fcbond Jun 12, 2020
ed8dfa8
added options for no gold profiles and extra lisp
fcbond Jun 12, 2020
90f931a
added a chance to add extra lisp before loading script, better loggin…
fcbond Jun 13, 2020
6789f5e
Added description of the new command line options
fcbond Jun 13, 2020
2cfd13f
added more on how it was made
fcbond Jun 13, 2020
f6d31ff
updated for python3
fcbond Jun 22, 2020
c5281ac
several small fixes for robustness
fcbond Jun 23, 2020
bc4de8b
added some quotes to make make-ltdb.bash more robust
fcbond Jun 28, 2020
2223789
fixed lemma search to show examples
fcbond Jul 15, 2020
4b68ac0
read from ace config file
fcbond Jul 15, 2020
caa5341
added pkzip dependency
fcbond Jul 15, 2020
14ecfb5
Added dependency on sqlite3
fcbond Jul 15, 2020
5f8da7f
Update README.rst
fcbond Jul 15, 2020
ef70bd0
Added troubleshooting notes for Ubuntu 18.04 python2/python3 issues
fcbond Jul 15, 2020
2696aa7
added brief explanation of the params file
fcbond Jul 17, 2020
d2258d9
no need for orderedDict in python > 3.7, so changed to just DICT
fcbond Jul 9, 2022
a3a4ea3
make a list, then join it
fcbond Jul 9, 2022
1aefdea
use dict instead of defaultdict
fcbond Jul 9, 2022
e5b415f
quoted many things I should have
fcbond Jul 9, 2022
50985d4
properly encode URLS, closes #20
fcbond Jul 13, 2022
a733fe3
describe assumptions about directory structure, use of virtual enviro…
fcbond Jul 13, 2022
9450dae
maked gold2db more robust for entries with unconvertible MRS
fcbond Jul 14, 2022
abc0583
remove extra space
fcbond Jul 14, 2022
8ea5a14
quote things a little better
fcbond Jul 14, 2022
2b986ce
get the unique constraint right
fcbond Jul 14, 2022
b695d43
pass paramters as needed
fcbond Jul 14, 2022
08e5462
add a rudimentary search for predicates
fcbond Jul 14, 2022
eeeb65d
changed DELPH-IN wiki links to point to github, closes #15
fcbond Jul 14, 2022
69c80b9
even robuster parsing of gold profiles
fcbond Jul 14, 2022
6c9ba70
linked grammar homepage to new wiki; really closes #15
fcbond Jul 19, 2022
b17dfa3
make display of rules robust to missinf kara/made
fcbond Jul 19, 2022
136efff
make Dan happy
fcbond Jul 26, 2022
1336380
better quotes in bash file
fcbond Jul 26, 2022
ab5057c
make sure you list the extra lisp
fcbond Jul 26, 2022
47ce4fc
log warnings in GOLD conversion
fcbond Jul 27, 2022
cea2fe3
if orth-path is not defined, use STEM, document this
fcbond Jul 27, 2022
100 changes: 82 additions & 18 deletions README.rst
@@ -12,16 +12,50 @@ documentation in the grammar, a kind of literate programming.
There is `more documentation <http://moin.delph-in.net/LkbLtdb>`__ at
the DELPH-IN Wiki.


LTDB assumes that the grammar follows the usual DELPH-IN conventions,
in particular that the grammar directory has ``ace`` and ``lkb``
subdirectories holding the config files:

::

    grammar/ace/config.tdl
    grammar/lkb/script
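
If you want to check a grammar for this layout before running ``make-ltdb.bash``, a small shell function like the one below is enough. It is a sketch, not part of LTDB: the name ``check_grammar_layout`` is hypothetical, and it only tests that the two files exist.

.. code:: bash

    # Hypothetical helper, not shipped with LTDB: report which of the
    # expected config files exist under a grammar directory.
    check_grammar_layout() {
        local grmdir="$1" missing=0 f
        for f in ace/config.tdl lkb/script; do
            if [[ -f "$grmdir/$f" ]]; then
                echo "found:   $grmdir/$f"
            else
                echo "missing: $grmdir/$f"
                missing=1
            fi
        done
        return "$missing"
    }

For example, ``check_grammar_layout ~/logon/dfki/jacy``. Since ``make-ltdb.bash`` is given either ``--script`` or ``--acecfg``, a single missing file may not be fatal.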

If your ``orth-path`` is not ``STEM``, you must define it in the
**top-level** ACE config file; we do not follow includes in config files (yet).
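
A quick way to see what the top-level ACE config declares is ``grep``. This sketch assumes the setting is written on one line in the form ``orth-path := ... .`` (check your grammar); like LTDB, it does not follow included files, and it falls back to ``STEM`` when nothing is declared. ``orth_path_of`` is a hypothetical name.

.. code:: bash

    # Hypothetical helper: print the orth-path declared in a top-level ACE
    # config, or STEM if none is declared there. Included files are NOT
    # followed, mirroring the limitation described above.
    orth_path_of() {
        local cfg="$1" value
        [[ -r "$cfg" ]] || { echo "cannot read: $cfg" >&2; return 1; }
        # First "orth-path := ..." line; strip the key and the final '.'
        value=$(grep -m1 -E '^[[:space:]]*orth-path[[:space:]]*:=' "$cfg" \
            | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*\.[[:space:]]*$//')
        echo "${value:-STEM}"
    }

For example, ``orth_path_of ~/logon/dfki/jacy/ace/config.tdl``.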

--------------

Usage
-----

0. Prepare the local environment::

       python3 -m venv .venv
       source .venv/bin/activate
       python3 -m pip install --upgrade pip
       pip install -r requirements.txt

1. Run ``./make-ltdb.bash --script /path/to/grammar/lkb/script``

2. Alternatively (somewhat experimental, but it picks up more docstrings),
   run ``./make-ltdb.bash --acecfg /path/to/ace/config.tdl``

3. To evaluate extra Lisp before the script is loaded, use ``--lisp``:
   ``./make-ltdb.bash --lisp '(push :mal *features*)' --script /path/to/grammar/lkb/script``

4. To read only the grammar and skip the gold profiles (mainly useful for
   debugging), use ``--nogold``:
   ``./make-ltdb.bash --acecfg /path/to/ace/config.tdl --nogold``

You can load both the Lisp (LKB) and ACE versions of the grammar; the script will try to merge information from both.

.. code:: bash

    ./make-ltdb.bash --script ~/logon/dfki/jacy/lkb/script
    ./make-ltdb.bash --acecfg ~/logon/dfki/jacy/ace/config.tdl

Everything is installed to ``~/public_html/``.

@@ -33,28 +67,27 @@ Requirements

::

    * python 3, pydelphin, docutils, lxml
    * Perl
    * SQLite3
    * Apache
    * LKB/Lisp for db dump
    * xmlstarlet for validating lisp
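
One way to check that the command-line tools in this list are installed is a loop over ``command -v``, sketched below. The helper name ``missing_commands`` is hypothetical, and the command names you pass it are assumptions about how the packages are invoked on your system.

.. code:: bash

    # Hypothetical helper: print each named command that is not on PATH and
    # return non-zero if any are missing.
    missing_commands() {
        local cmd status=0
        for cmd in "$@"; do
            if ! command -v "$cmd" >/dev/null 2>&1; then
                echo "missing: $cmd"
                status=1
            fi
        done
        return "$status"
    }

For example, ``missing_commands python3 perl sqlite3 xmlstarlet``. The Python packages can be checked with ``python3 -c 'import delphin, docutils, lxml'`` (pydelphin is imported as ``delphin``).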

We store items as (profile, item-id) pairs, so Sentence IDs do not
need to be unique.
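
To see why the pair matters: two gold profiles may both contain an item with ID 10, and keying on the item ID alone would let the second overwrite the first. The toy shell sketch below uses made-up profile names and sentences; it illustrates the keying idea only and is not how LTDB's database code is written.

.. code:: bash

    # Toy illustration with hypothetical data: key items by
    # "profile/item-id" rather than by item-id alone, so that equal IDs
    # in different profiles are stored separately.
    declare -A sentence_by_key
    store_item() {  # store_item PROFILE ITEM-ID SENTENCE
        sentence_by_key["$1/$2"]="$3"
    }
    store_item profile-a 10 "The dog barks."
    store_item profile-b 10 "Kim slept."

With the composite key, ``${#sentence_by_key[@]}`` is 2; keyed on ``10`` alone, only one of the two sentences would survive.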

Only the new LKB-FOS (http://moin.delph-in.net/LkbFos) supports the new docstring comments. We assume it is installed in
``LKBFOS=~/delphin/lkb_fos/lkb.linux_x86_64``.
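
A sketch for checking that the binary is where this README expects it. ``lkbfos_ok`` is a hypothetical name; if your LKB-FOS lives elsewhere, you will presumably need to change wherever ``LKBFOS`` is set in your copy of the LTDB scripts.

.. code:: bash

    # Hypothetical check, not part of LTDB: is there an executable LKB-FOS
    # at the given path (default: the location assumed in this README)?
    lkbfos_ok() {
        local lkb="${1:-$HOME/delphin/lkb_fos/lkb.linux_x86_64}"
        if [[ -x "$lkb" ]]; then
            echo "LKB-FOS found: $lkb"
        else
            echo "LKB-FOS not found or not executable: $lkb" >&2
            return 1
        fi
    }

Run ``lkbfos_ok`` with no argument to test the default location, or pass another path.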

Install dependencies (on Ubuntu):

.. code:: bash

    sudo apt-get install apache2 xmlstarlet p7zip sqlite3
    sudo apt-get install python3-docutils python3-lxml

    pip install pydelphin --upgrade

Enable local directories in Apache2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -66,7 +99,7 @@ This may be different on different operating systems
    sudo a2enmod userdir
    sudo a2enmod cgi

Put this at the end of ``/etc/apache2/sites-available/000-default.conf``:

.. code:: xml

@@ -99,11 +132,44 @@ If the LKB complains
it probably means you have a docstring in an instance file, or an old
version of the LKB. Make sure you only document types for now.

If you are having trouble with Apache encodings, set the following in ``/etc/apache2/apache2.conf``:

::

    SetEnv PYTHONIOENCODING utf8
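
If you script your server setup, adding the line only when it is not already present keeps the edit idempotent. A minimal sketch; ``append_once`` is a hypothetical helper, and it must run with permission to write the target file.

.. code:: bash

    # Hypothetical helper: append LINE to FILE unless FILE already contains
    # exactly that line, so running it twice does not duplicate the setting.
    append_once() {  # append_once FILE LINE
        local file="$1" line="$2"
        grep -qxF -- "$line" "$file" 2>/dev/null \
            || printf '%s\n' "$line" >> "$file"
    }

For example, from a root shell: ``append_once /etc/apache2/apache2.conf 'SetEnv PYTHONIOENCODING utf8'``, then restart Apache (``sudo systemctl restart apache2`` on systemd-based Ubuntu).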

To make debugging

On Ubuntu 18.04, if you have upgraded from an earlier release (so that ``python`` still defaults to Python 2.7), do the following to get Python 3 mod_wsgi working:

.. code:: bash

    sudo apt-get install libapache2-mod-wsgi-py3
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2
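
To confirm the switch took effect, check which major version ``python`` now runs; ``update-alternatives --display python`` also lists the registered alternatives. A small sketch with a hypothetical helper name; Python 2 prints ``--version`` to stderr, hence the ``2>&1``.

.. code:: bash

    # Hypothetical helper: print the major version of the interpreter run by
    # the given command (default: python).
    python_major() {
        "${1:-python}" --version 2>&1 | sed -nE 's/^Python ([0-9]+)\..*/\1/p'
    }

After the commands above, ``python_major`` should print ``3``, since the Python 3.6 alternative has the higher priority.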

Links go to the wrong place
---------------------------

LTDB assumes that it is served over ``http`` from your ``public_html``
on a machine whose name is the output of ``hostname -f``. If that is
not the case (e.g., you want to change the host or port, or use
``https``), change the appropriate entries in the ``params`` file.
A ``params`` file looks like this:

.. code:: bash

    charset=utf-8
    dbroot=/home/bond/public_html/cgi-bin/ERG_mal_mo
    db=/home/bond/public_html/cgi-bin/ERG_mal_mo/lt.db
    cssdir=http://mori/~bond/ltdb/ERG_mal_mo
    cgidir=http://mori/~bond/cgi-bin/ERG_mal_mo
    ver=ERG_mal_mo
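
In the example above, the host-dependent entries are ``cssdir`` and ``cgidir``. The sketch below regenerates them for a different scheme, host or port. It is not part of LTDB: ``params_urls`` is a hypothetical name, and the URL layout (``~user/ltdb/VER`` and ``~user/cgi-bin/VER``) is inferred from the example rather than from LTDB's code.

.. code:: bash

    # Hypothetical sketch: print cssdir/cgidir lines for a params file.
    # Layout inferred from the example above; defaults mirror LTDB's
    # assumptions (http, this machine's `hostname -f`, current user).
    params_urls() {  # params_urls VER [SCHEME] [HOST[:PORT]] [USER]
        local ver="$1"
        local scheme="${2:-http}"
        local host="${3:-$(hostname -f)}"
        local user="${4:-$USER}"
        printf 'cssdir=%s://%s/~%s/ltdb/%s\n' "$scheme" "$host" "$user" "$ver"
        printf 'cgidir=%s://%s/~%s/cgi-bin/%s\n' "$scheme" "$host" "$user" "$ver"
    }

For example, ``params_urls ERG_mal_mo https example.org:8443`` prints two lines to paste over the existing entries in ``params``; ``params_urls ERG_mal_mo http mori bond`` reproduces the example above.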



Todo
----

- check I am getting lrule/irule right

--------------

Types and instances are stored in the same table, distinguished by status.
@@ -114,15 +180,15 @@ Types, instances in the same table, distinguished by status.
+==========+====================================+===================+======+
|type      |normal type                         |                   |      |
+----------+------------------------------------+-------------------+------+
|lex-type  |lexical type                        |type + in lexicon  | _lt  |
+----------+------------------------------------+-------------------+------+
|lex-entry |lexical entry                       |                   | _le  |
+----------+------------------------------------+-------------------+------+
|rule      |syntactic construction/grammar rule | LKB:\*RULES       | _c   |
+----------+------------------------------------+-------------------+------+
|lex-rule  |lexical rule                        | LKB:\*LRULES      | lr   |
+----------+------------------------------------+-------------------+------+
|inf-rule  |inflectional rule                   | LKB:\*LRULES +    | ilr  |
+----------+------------------------------------+-------------------+------+
|          |            (inflectional-rule-pid )|                   |      |
+----------+------------------------------------+-------------------+------+
@@ -153,5 +219,3 @@ Types, instances in the same table, distinguished by status.
+--------+--------------------------------------+
| ◬      | Binary, Non-Headed                   |
+--------+--------------------------------------+

6 changes: 6 additions & 0 deletions ToDo
@@ -1,3 +1,9 @@
* look at lisp with John
* prettier lisp
* hyperlinked types
* types without glb


Collaborator review comment:

Personal todo lists don't serve much purpose in the repository. Actionable things can be made into issues, and personal notes can be left in unversioned files.

* Better linking to surface form

