Huge list of notes and TODOs (complete, partially complete, and incomplete)
Bugs and Features
Conventions / Legend:
status: "-" means not done, "~" means in progress, "+" means done, "?" means not sure, "x" means canceled, "<" means future
type: "bug" means broken, "feature" means would be cool
summarize:
- things to do
grep "[-~] \(bug\|feature\):" TODO | less -iSEX
- api:
- gui:
- next targets:
grep ⊚ TODO | tail -n+2 | less -iSEX
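The "things to do" `grep` above selects open bug/feature items by their status marker. Here is a minimal Java sketch of the same filter. It assumes each item sits on one line in the `<status> <type>: <text>` shape from the legend. `TodoFilter`, `openItems`, and `typeOf` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper mirroring: grep "[-~] \(bug\|feature\):" TODO */
class TodoFilter {
  // status '-' (not done) or '~' (in progress), a space, type "bug" or "feature", then ':'
  private static final Pattern OPEN_ITEM = Pattern.compile("[-~] (bug|feature):");

  /** Returns the lines grep would print, i.e., open bugs and features. */
  static List<String> openItems(List<String> lines) {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      if (OPEN_ITEM.matcher(line).find()) { // find(): match anywhere in the line, like grep
        result.add(line);
      }
    }
    return result;
  }

  /** Returns "bug" or "feature" for an open item, or null if the line is not one. */
  static String typeOf(String line) {
    Matcher m = OPEN_ITEM.matcher(line);
    return m.find() ? m.group(1) : null;
  }
}
```

Items marked "+", "x", "?", or "<" are skipped, just as the grep skips them.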
Releasing a version
# mvn release:perform -DreleaseProfiles=src,javadoc -Darguments=-Dgpg.passphrase=PASSPHRASE
mvn -DpushChanges=false release:prepare
* key to getting this working was bumping maven-release-plugin to 2.3.2 (was getting 2.0 as could be verified with mvn help:effective-pom)
- updates poms from snapshot to a solid version
- commits
- updates poms to next snapshot version
- commits
https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide
Yawni Subprojects
+ api
+ data
- should there be 1 per WN version? (..., 2.1, 3.0, ..., etc.)
+ browser
- depends on core
- build standalone (aka dist jar)
mvn -Dmaven.test.skip=true -PuseYawniData,useLog4j,makeShadedJar clean package
mvn -Dmaven.test.skip=true -PuseYawniData,useLog4j,makeSignedJar,makeShadedJar clean package (9.6 MB target/signed/yawni-wordnet-browser-2.0.0-SNAPSHOT.jar)
Apps of WordNet
- WordNet supersenses (WNSS) (Ciaramita and Altun, 2006)
Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger
https://sourceforge.net/projects/supersensetag/
- Java port "underway" by Jordi Atserias | Yahoo! Research (sourceforge user: batalla)
- https://supersensetag.svn.sourceforge.net/svnroot/supersensetag/branches/SSTjava/
- Apache Jackrabbit includes Sandbox SynonymProvider using org.apache.lucene.index.memory.SynonymMap
jackrabbit/sandbox/wordnet-synonyms/src/main/java/org/apache/jackrabbit/core/query/wordnet/WordNetSynonyms.java
- Using the 'Alignment API' with Wordnet
https://alignapi.gforge.inria.fr/alignwn.html
Bob Carpenter's Stemming Examples (https://lingpipe-blog.com/2009/02/25/stemming-morphology-corpus-coding-standards/)
* odious: odious ? probably means "odium" or could mean it has no further stem ?
* euphoria: euphoria (CELEX-2 stems “euphoric” to “euphoria”)
* OK mathematical: mathematics
* epidemiology: epidemic (CELEX: missing) (WordNet: epidemiologic, epidemiological, epidemiologist - no idea how to get to epidemic)
* hypocrisy: hypocrisy (CELEX: hypocrite) (WordNet: hypocritical)
* OK maximize: maximum
Affixes
* unmodernised: modernised (CELEX: missing) (? antonym)
* OK prearrangement: prearrange
* OK unlocking: unlock (CELEX: missing)
* OK incorrigible: corrigible (? antonym)
* OK disability: disable
* OK inconvenience: inconvenient (CELEX: convenient)
* OK resentencing: sentencing (CELEX: missing) (* tricky - "re-sentencing")
* overcollateralization: collateralization (CELEX: missing)
Compounds
* OK cockfighting: cockfight (CELEX: cock fighting) (gloss)
* SPACE headquarters: head quarters
* ultraleftist: ultra leftist (CELEX: missing)
* OK omnipotence: omnipotent
* steelmaking: steel making (CELEX: missing) (WordNet: steelmaker has gloss "...making steel")
* SPACE stockholder: stock holder
* signalmen: signalman (-)
* OK DASH weatherbeaten: weather beaten (CELEX: weather-beaten) (tricky: dash)
* SPACE supercomputer: super computer (CELEX: missing)
* newsgathering: news gathering (CELEX: missing)
? why does this web UI to WN 3.0 https://poets.notredame.ac.jp/cgi-bin/wn
show "microorganism" and "infectious agent" as Pertainyms of "viral" ?
- these are both immediate hypernyms of the derivationally related source noun "virus"
+ are derivationally related forms of adjectives "Pertainyms" ? no
+ is this just for "sense context" ? -- appears so, just missing level of indent to show this
- WordNet data omissions
- derivational link from "employ" → "employable"
- derivational link from "recursion" → "recursive"
- singular form "french fry" (has "french-fry" and "french fries")
- exceptional spelling for "blogging" (double-'g' form missing)
Release TODOs
- wild idea:
- do per-RelationType customized behaviors in getRelations() e.g., VERB_GROUP,
and maybe MERONYM, HOLONYM, DOMAIN
! fail: at least for Word, VERB_GROUP would need List<List<RelationArgument>> since it clumps
- wilder idea: make a MetaSynset/VerbGroup object
- synset groups ; simple pair (Relation) usually suffices
verb VERB_GROUP
adj SIMILAR_TO
- connects satellite adjectives to their head and vice-versa
- what clever things could I do with JavaScript integration ?
- probably want some little higher-level accessors ?
- add a debug terminal
? how would errors be reported ?
- maybe want some default includes
- if fast boot is desirable, make the FileManager.getFileStream() impl configurable ?
- for some reason, Apple JVM doesn't include Rhino?
- can be fetched from: https://repo1.maven.org/maven2/rhino/js/
<dependency>
<groupId>rhino</groupId>
<artifactId>js</artifactId>
<version>1.7R2</version>
</dependency>
// - do a Lucene Tokenizer / Analyzer integration - would make a good performance test
// - use tikluc demo
// - indexed reuters docs: https://192.168.1.66:8880/skwish/meta?id=10500
//
// org.apache.lucene.analysis.TeeSinkTokenFilter.SinkFilter
// - seems to be modern way to "hook into" Token stream
// - related Lucene contrib examples
// - org.apache.lucene.analysis.shingle.ShingleFilter
// - constructs shingles (token n-grams) from a token stream, i.e., creates combinations of tokens as a single token.
// - org.apache.lucene.analysis.sinks.DateRecognizerSinkFilter
// - org.apache.lucene.analysis.compound.
// - decomposes 1 token into more than 1 (Germanic languages)
//
// - use Lucene StandardTokenizer (JFlex: fast!)
// - use shingle generator to create alternatives
// - what pattern does it generate the various shingle lengths in ? (5's, 4's, 3's, 2's, 1's would be ideal :))
// - don't ultimately, only emit 1 token per position
// - hyperlinking the glosses in the browser would be a good way to exercise this
// - basically, factors out tokenization and exercises Lucene plumbing
// - learn new Lucene Token/Tokenizer/TokenStream/TokenFilter
// - AttributeSource
// - TokenStream
// - Tokenizer (input isa Reader)
// - TokenFilter (input isa TokenStream) (Uwe says decorator pattern: TokenFilter adds functionality to a Tokenizer)
// - TeeSinkTokenFilter.SinkTokenStream
// - NumericTokenStream
// - "Attributes instead of fields of Tokens"
// - AttributeImpl implements Attribute
// - Token (implements Attribute, FlagsAttribute, OffsetAttribute, PayloadAttribute, PositionIncrementAttribute, TermAttribute, TypeAttribute)
// - AttributeSource container of AttributeImpls
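The "ideal" shingle ordering noted above is 5's, 4's, 3's, 2's, then 1's at each position. It can be sketched in plain Java over an already-tokenized list. This is a hypothetical helper, not Lucene's `ShingleFilter`, whose output order and options may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shingle generator; not Lucene's ShingleFilter. */
class Shingles {
  /**
   * For each start position, returns shingles (token n-grams) from maxSize down to 1,
   * so a caller can try the longest candidate (e.g., a WordNet collocation) first.
   */
  static List<List<String>> longestFirst(List<String> tokens, int maxSize) {
    List<List<String>> perPosition = new ArrayList<>();
    for (int start = 0; start < tokens.size(); start++) {
      List<String> candidates = new ArrayList<>();
      int longest = Math.min(maxSize, tokens.size() - start); // don't run past the end
      for (int n = longest; n >= 1; n--) {
        candidates.add(String.join(" ", tokens.subList(start, start + n)));
      }
      perPosition.add(candidates);
    }
    return perPosition;
  }
}
```

To emit only 1 token per position, a consumer would take the first candidate found in the lexicon. It would then skip ahead by that candidate's length.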
- can DFS Iterator be easily created with AbstractIterator with no explicit stack ?
- i.e., with a constant amount of extra space
- https://lingpipe-blog.com/2009/01/27/quiz-answer-continuation-passing-style-for-converting-visitors-to-iterators/
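The linked post converts visitors to iterators with continuation-passing style. A different, classic way to get constant extra space stores parent and next-sibling links in the nodes themselves, so the iterator only keeps a pointer to the next node.

Below is a minimal sketch under that assumption. `TreeNode` and `PreorderIterator` are hypothetical. The technique only applies to trees: WordNet's hypernym graph can give a synset more than one hypernym. The `successor` logic could equally live in Guava's `AbstractIterator.computeNext()`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical tree node with parent / first-child / next-sibling links. */
class TreeNode {
  final String label;
  TreeNode parent, firstChild, nextSibling;

  TreeNode(String label) { this.label = label; }

  /** Appends child as the last child of this node; returns child for chaining. */
  TreeNode add(TreeNode child) {
    child.parent = this;
    if (firstChild == null) {
      firstChild = child;
    } else {
      TreeNode last = firstChild;
      while (last.nextSibling != null) last = last.nextSibling;
      last.nextSibling = child;
    }
    return child;
  }
}

/** Preorder DFS with O(1) extra space: the node links replace the explicit stack. */
class PreorderIterator implements Iterator<TreeNode> {
  private final TreeNode root;
  private TreeNode next;

  PreorderIterator(TreeNode root) {
    this.root = root;
    this.next = root;
  }

  @Override public boolean hasNext() { return next != null; }

  @Override public TreeNode next() {
    if (next == null) throw new NoSuchElementException();
    TreeNode current = next;
    next = successor(current);
    return current;
  }

  private TreeNode successor(TreeNode n) {
    if (n.firstChild != null) return n.firstChild;   // descend first
    while (n != root) {                              // climb until an unvisited sibling exists
      if (n.nextSibling != null) return n.nextSibling;
      n = n.parent;
    }
    return null;                                     // climbed back to root: traversal done
  }
}
```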
- Ted Pedersen's WordNet stop list; "normally used as function words"
I, a, an, as, at, by, he, his, me, or, thou, us, who
- missing 'it'; 'his' not in WordNet 3.0
- Paul R. Dunn's stop list; Christiane Fellbaum adds that they are often ADV
(author of Visuwords: dunnbypaul.net
- ActionScript / Flash
- PHP
)
the
this
that
those
these
a
and
or
nor
because
whenever
whereas
unless
if
than
has
am
are
is
do
does
doth
bring
had
was
did
got
took
brought
were
been
seen
? taken - irregular
cannot
to
for
with
without
of
from
against
into
upon
toward
since
until
I
me
you
he
him
she
her
it
we
us
they
them
everybody
everyone
anyone
oneself
myself
yourself
himself
herself
itself
ourselves
yourselves
themselves
my
mine
your
yours
his
her
hers
its
our
ours
their
theirs
which
what
shall
could
would
should
ought
everything
? children - irregular
- https://en.wikipedia.org/wiki/File:Flag_of_the_United_States.svg
- interesting JavaDoc paragraph from JComboBox docs (references package-summary.html)
* <p>
* <strong>Warning:</strong> Swing is not thread safe. For more
* information see <a href="package-summary.html#threading">Swing's Threading Policy</a>.
- Lucene docs should use this since they have awesome package-level docs
https://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/analysis/package-summary.html
org.apache.lucene.analysis
- JIDE has some awesome (feel) open source components; auto-complete, IntelliHints, Searchable (aka FindAsYouType), and SearchableBar (like Firefox), JideSplitButton: PopdownButton with menu; license looks tricky
- WordNet online for inspiration
- https://www.yawni.org/wiki/Main/GettingStarted
- code inspiration from https://aboisvert.github.com/stopwatch/
+ bug: api: issues with checking out and building yawni
+ data sub-project currently needs my jar signature
+ api depends on data
- couldn't figure out how to make this dependency <optional>; maybe no project-internal concept exists
+ uncommented WNHOME system property
+ got burned by WordNet version (2.1 vs. 3.0)
- then got burned by WNSEARCHDIR still referencing old version
~ feature: online: web interface written in Scala + Lift, running on Google App Engine (GAE)
- re-write in JAX-RS: simpler, more features
Scalate Jog: JAX-RS + Scala + Guice
/Users/nezda/Library/Application\ Support/TextMate/Bundles/Scalate.tmbundle
/Users/nezda/Library/Application\ Support/TextMate/Bundles/scala.tmtheme
/Users/nezda/code/scala/scalate-1.2/examples/scalate-sample
* problems with mvn scala:cc
- https://github.com/davidB/maven-scala-plugin/issues/issue/36
℞ prescription take
ℹ information source
Å angstrom sign
℉ degrees Fahrenheit
℃ degrees Celsius
℅ care of
ℇ euler constant
℮ estimated symbol
ℯ script small e
ℰ script capital e
ⅇ double-struck italic small e
№ numero sign
℀ account of
ℊ script small g
℗ sound recording copyright
⅄ turned sans-serif capital y
๏•⦾⦿๏⸰⦁‣
+ appid (AppID) candidates ("yawni" not valid, "wordnet" not available)
+ yawni-online
x yawnionline
x web-yawni
x yawniweb
x webyawni
x looking for trouble: rename it wordnetnik :)
+ mvn archetype:generate -U \
-DarchetypeGroupId=net.liftweb \
-DarchetypeArtifactId=lift-archetype-blank \
-DarchetypeVersion=2.0-M1 \
-DremoteRepositories=https://scala-tools.org/repo-releases \
-DgroupId=org.yawni \
-DartifactId=yawni-wordnet-online \
-Dversion=2.0.0-SNAPSHOT
- gae details
- update online version:
./appengine-java-sdk/bin/appcfg.sh --enable_jar_splitting update online/target/yawni-wordnet-online-2.0.0-SNAPSHOT
can tweak GAE "version" in WEB-INF/appengine-web.xml ; previous versions remain available and web accessible! e.g., https://5.latest.yawni-online.appspot.com/rest.html
./appengine-java-sdk/bin/appcfg.sh --enable_jar_splitting update rest/target/yawni-wordnet-rest-scala-2.0.0-SNAPSHOT
- online version features
- autocomplete
- show 'hint' text : conflicts with "default focus" behavior :/
- fade-in/fade-out of suggestions would be slick (like https://loopj.com/tokeninput/demo.html)
- support tab / shift-tab navigation like TagDragon (and Firefox search widget)
- decrease lag from 400ms to less
- should indicate no known matches; could also try 1 additional search using morphological processing
- suggesting near-hits would be sweet
+ gae error: "sun.misc.SoftCache is a restricted class"
+ need to use alternate softcache impl
+ Google Collections: ConcurrentMap<K, V> map = new MapMaker().softValues().makeMap();
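Without the Google Collections dependency, a soft-value cache can be built on `java.lang.ref.SoftReference` directly. This sketch assumes `java.lang.ref` is permitted on GAE. The Google Collections fix suggests it is, since MapMaker's soft values rely on soft references rather than `sun.misc.SoftCache`. `SoftValueCache` is a hypothetical name.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical soft-value cache using only standard java.lang.ref classes.
 * Values may be reclaimed by the GC under memory pressure and are then recomputed.
 */
class SoftValueCache<K, V> {
  private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> loader; // must not return null

  SoftValueCache(Function<K, V> loader) { this.loader = loader; }

  V get(K key) {
    SoftReference<V> ref = map.get(key);
    V value = (ref == null) ? null : ref.get();
    if (value == null) {                       // never loaded, or already collected
      value = loader.apply(key);               // may run twice under a race; harmless here
      map.put(key, new SoftReference<>(value));
    }
    return value;
  }

  int size() { return map.size(); }
}
```

Cleared references stay in the map until their key is requested again. A `ReferenceQueue` could purge them eagerly; that is omitted here for brevity.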
- feature: online: update appengine-web.xml <version> automatically with Maven; then Google App Engine will version our app !
- yawni.org color scheme:
+ orange variant #F79910
- CSS
+ font-size: 16px
+ line-height: 24px
- other typography enhancements
- color - get inspiration from wordnetweb
- smart double quotes
- smart single quotes for single quoted items: `foo' → ‘foo’; 's → ’s
- bullet between examples ; • (using semicolon)
x "space" bullet (\u00B7 MIDDLE DOT) between words of collocations ; wild-goose·chase ; attack·dog ; hard·cash
- only occasionally see this used, and often only for phonetic separation
+ italicize examples; gray text (use BluePrint CSS class alt)
- em dash where apropos
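The quote rules listed above can be approximated with a few regular expressions. This minimal sketch assumes `foo' marks a single-quoted item and that straight double quotes come in pairs. `SmartQuotes` is a hypothetical helper: a heuristic, not a complete typographic quote engine.

```java
import java.util.regex.Pattern;

/** Hypothetical typography helper for glosses; heuristic only. */
class SmartQuotes {
  // `foo' → ‘foo’
  private static final Pattern BACKTICK_QUOTED = Pattern.compile("`([^`']*)'");
  // a straight double quote at the start, or after whitespace / an opening bracket, opens a quotation
  private static final Pattern OPENING_DOUBLE = Pattern.compile("(^|[\\s(\\[])\"");

  static String apply(String text) {
    String s = BACKTICK_QUOTED.matcher(text).replaceAll("\u2018$1\u2019");
    s = s.replace('\'', '\u2019');                        // remaining apostrophes: 's → ’s
    s = OPENING_DOUBLE.matcher(s).replaceAll("$1\u201C"); // opening “
    return s.replace('"', '\u201D');                      // every other " closes: ”
  }
}
```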
- rounded corners:
- this site uses rounded corners a lot: https://scala-tools.org/mvnsites/maven-scala-plugin/
-moz-border-radius: 14px 14px 14px 14px !important
- Tapestry has neat REST service for this: https://tapestry.apache.org/tapestry4.1/developmentguide/hivemind/roundedcorners.html
- use div class showgrid to see the BluePrint grid
<span style="padding: 14px; -moz-border-radius: 14px 14px 14px 14px ! important; background: none repeat scroll 0% 0% white;">
<input>
</span>
<p/>
<div id="resultz" style="-moz-border-radius: 14px 14px 0px 0px ! important; background: none repeat scroll 0% 0% white; padding: 14px;">
</div>
- design
+ orange background #F79910 (used ColorPicker to get this perfect)
+ white capsule around search box
+ downward facing rounded corners on results
+ issues
- sense-horizontal-separating bullets are too big (Christine)
+ AJAX active spinner in unusual place
- maybe closer to search box
+ AJAX active spinner has white background which looks weird on non-white background
x make it transparent - better to match background colors
+ https://www.ajaxload.info/
+ rounded corners should work in Safari too
+ add search button, if only for aesthetics (searchField activates onBlur anyway)
+ add footer (copyright, etc.)
~ links to/from yawni.org
+ meta keywords: WordNet
- add recent searches on right hand side
- don't want REST API searches to be shown here
- REST API can be totally stateless (see Lift example StatelessHtml.scala)
- David Pollack's advice
- serialize the forms to JSON and then make an AJAX call on a stable URL
- new Req.json features will allow easier extraction of JSON from a REST call
- this combined with disabling the Lift heartbeat should give you stateless support for JSON
- support different versions of WordNet :)
- should we vary yawni-wordnet-data project version or artifactName ? (or both?)
- bugs
+ artifact name is still gae-2.0.0-SNAPSHOT.war ; change to yawni-online-2.0.0-SNAPSHOT.jar
+ timed out issues
- empty responses returned from timed out sessions
- entry to "waiting" page is ignored; returns empty result !?
! likely only occurs on GAE and is caused by 1 request being serviced by multiple JVMs; LiftSession is not fully stored in servlet session, so this clustered servicing fails to behave correctly
- rounded corners won't work on IE (5,6,7,8)
https://www.w3schools.com/browsers/browsers_stats.asp
? what about Chrome & Opera ?
+ SLF4J impl collision
+ AJAX page seems to reset when page is left and then revisited
x searchField focus issue
x main evidence is from appspot version of app
x could local and appspot versions running in different tabs be interacting (via cache or otherwise ?)
+ intentional effect of Lift (Session) Garbage collection (heartbeat driven): can be disabled, but not a good idea
LiftRules.enableLiftGC = false
+ AJAX active spinner not showing
x something missing in Boot ?
+ src/main/webapp/images/ajax-loader.gif was missing
+ still doesn't work: had to add default hidden div to default.html
+ Blackbird widget not working (yuicompressor-maven-plugin ?)
+ various problems with xhtml and this widget
+ browser scroll bar causes jarring left/right shift
html { overflow-y: scroll; }
https://www.w3.org/TR/CSS21/visufx.html#overflow
- custom 404 page
https://wiki.liftweb.net/index.php/Setting_up_a_custom_404_page
- tests
- Lift example uses several test libraries: 'specs', 'mockito', jwebunit-htmlunit
- Selenium tests
- load tests : JMeter
- https://code.google.com/p/perfbench/source/browse/trunk/perfbench
- Tapestry on Google App Engine ?
- feature: api: pair each all caps WordNet term with an alternate form that has a period after each letter, so "CEO" matches input "C.E.O."
- this article has such an example of "C.E.O.": https://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?_r=2&pagewanted=all
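One way to get this without storing both forms is to normalize dotted input before lookup, and to generate the dotted form when needed. This is a minimal sketch; `Acronyms`, `undot`, and `dot` are hypothetical names, not Yawni API.

```java
import java.util.regex.Pattern;

/** Hypothetical helpers for matching "CEO" against input like "C.E.O."; not Yawni API. */
class Acronyms {
  // one or more single uppercase letters, each followed by a period: "C.E.O."
  private static final Pattern DOTTED = Pattern.compile("(?:\\p{Lu}\\.)+");
  // an all caps term of at least 2 letters: "CEO"
  private static final Pattern ALL_CAPS = Pattern.compile("\\p{Lu}{2,}");

  /** "C.E.O." → "CEO"; returns the input unchanged if it is not a dotted acronym. */
  static String undot(String input) {
    return DOTTED.matcher(input).matches() ? input.replace(".", "") : input;
  }

  /** "CEO" → "C.E.O."; returns the term unchanged if it is not all caps. */
  static String dot(String term) {
    if (!ALL_CAPS.matcher(term).matches()) return term;
    StringBuilder sb = new StringBuilder();
    for (char c : term.toCharArray()) sb.append(c).append('.');
    return sb.toString();
  }
}
```

Some dotted forms such as "U.S.A." already exist in WordNet. A lookup would therefore try the input as-is first and fall back to `undot(input)` only on a miss.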
- feature: api: expose WordNet data version (e.g., 3.0)
- feature: api: expose About dialog diagnostics as Map (code currently inlined in Browser.java; some of this info in browser.Application)
- maybe should have Application or Module or something in every subproj
+ feature: api: before initial public release, get names right; rename package org.yawni.wn → org.yawni.wordnet ; rename groupId org.yawni.wn → org.yawni; rename artifactId yawni-core → yawni-wordnet-api; rename artifactId browser → yawni-wordnet-browser
+ feature: api: before initial public release, get names right; rename RelationTarget → RelationArgument (less confusing)
~ feature: api: ⊚ incorporate other related lexico-semantic resources
- new Princeton "Standoff Files"
+ The Morphosemantic Database
- adds 1 of 14 semantic types to the existing DERIVATIONALLY_RELATED WordNet relation
example line:
<arg1 (verb) sensekey $1> <arg1 offset $2> <relationType $3> <arg2 sensekey $4> <arg2 offset $5> <arg1 gloss (abbreviated)> <arg2 gloss (abbreviated)>
cannibalize%2:34:00:: 201162291 agent cannibal%1:18:00:: 109891079 eat human flesh a person who eats human flesh
- 17,739 rows; 16,995 sort-uniq'd by sense keys
* note "offset" has pos ordinal prepended; e.g., in "201162291", the leading digit ("2") means VERB and the rest ("01162291") is the synset offset (of "cannibalize")
- note: synset offset gets search to right synset, lemma from senseKey required to get to right WordSense
* distill down to <arg1 offset> <relation> <arg2 offset>; rest is for annotator convenience
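Both identifier kinds in the example line can be split mechanically. The sense key layout `lemma%ss_type:lex_filenum:lex_id:head_word:head_id` comes from WordNet's sense index documentation. The POS prefix on the offset is assumed to use the same 1 = noun, 2 = verb numbering as `ss_type`, which matches the two examples in these notes. `MorphosemanticIds` is a hypothetical helper, not Yawni API.

```java
/** Hypothetical parsers for Morphosemantic identifiers; not part of the Yawni API. */
class MorphosemanticIds {
  /** Parsed sense key: lemma%ss_type:lex_filenum:lex_id:head_word:head_id */
  static final class SenseKey {
    final String lemma;
    final int ssType;      // 1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite
    final int lexFilenum;
    final int lexId;

    SenseKey(String lemma, int ssType, int lexFilenum, int lexId) {
      this.lemma = lemma;
      this.ssType = ssType;
      this.lexFilenum = lexFilenum;
      this.lexId = lexId;
    }
  }

  /** POS-prefixed offset: 1 POS digit followed by an 8-digit synset offset. */
  static final class PosOffset {
    final int posDigit;    // assumed: same numbering as ssType (1 noun, 2 verb)
    final long offset;

    PosOffset(int posDigit, long offset) {
      this.posDigit = posDigit;
      this.offset = offset;
    }
  }

  static SenseKey parseSenseKey(String key) {
    int pct = key.indexOf('%');
    if (pct < 0) throw new IllegalArgumentException("no '%' in sense key: " + key);
    // limit -1 keeps the trailing empty head_word and head_id fields
    String[] f = key.substring(pct + 1).split(":", -1);
    if (f.length != 5) throw new IllegalArgumentException("expected 5 lex_sense fields: " + key);
    return new SenseKey(key.substring(0, pct),
        Integer.parseInt(f[0]), Integer.parseInt(f[1]), Integer.parseInt(f[2]));
  }

  static PosOffset parsePosOffset(String s) {
    if (s.length() != 9 || !s.chars().allMatch(Character::isDigit)) {
      throw new IllegalArgumentException("expected 1 POS digit + 8 offset digits: " + s);
    }
    return new PosOffset(s.charAt(0) - '0', Long.parseLong(s.substring(1)));
  }
}
```

The POS and offset select the synset, and the sense key's lemma then selects the WordSense within it.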
# sense keys and offsets: verb → noun; 1,077,470 B / 218,758 B gzip'd
cat morphosemantic-links.xls.tsv | awk -F'\t' '{print $1 " " $2 " " $3 " " $4 " " $5}' | sort | uniq | gzip - | wc -c
# sense keys and offsets: verb → noun AND noun → verb; 2,154,940 B / 324,103 B gzip'd
cat morphosemantic-links.xls.tsv | awk -F'\t' '{print $1 " " $2 " " $3 " " $4 " " $5 "\n" $4 " " $5 " " $3 " " $1 " " $2}' | sort | uniq | gzip - | wc -c
# offsets and sense keys: verb → noun AND noun → verb; 2,154,940 B / 324,103 B gzip'd
cat morphosemantic-links.xls.tsv | awk -F'\t' '{print $2 " " $1 " " $3 " " $5 " " $4 "\n" $5 " " $4 " " $3 " " $2 " " $1}' | sort | uniq | gzip - | wc -c
# only sensekeys: verb → noun; 737,570 B / 113,004 B gzip'd
cat morphosemantic-links.xls.tsv | awk -F'\t' '{print $1 " " $3 " " $4}' | sort | uniq | gzip - | wc -c
# both offsets, both sense keys: verb → noun AND noun → verb; 2,154,940 B / 324,103 B gzip'd
cat morphosemantic-links.xls.tsv | awk -F'\t' '{print $2 " " $5 " " $1 " " $4 " " $3 "\n" $5 " " $2 " " $4 " " $1 " " $3}' | sort | uniq | gzip - | wc -c
# both offsets, only source senseKey : verb → noun AND noun → verb; 2,154,940 B / 324,103 B gzip'd
cat morphosemantic-links.xls.tsv | awk -F'\t' '{print $2 " " $5 " " $1 " " $3 "\n" $5 " " $2 " " $4 " " $3}' | sort | uniq | gzip - | wc -c
translate pos + offset + senseKey → pos + offset + synsetWordIdx; 1,055,246 B / 256,753 B
100003553 unit%1:03:00:: result 201462005 unify%2:35:00::
30658 unique offset pairs
* 3332 lost
* 33990 unique offset+senseKey pairs
33606 unique offset pairs + source w/ senseKey
* 384 lost
# simplest, most efficient solution is to translate sensekeys to format used in WordNet's data files
# - see Relation parsing in Relation.makeRelation(final Synset synset, final int index, final CharSequenceTokenizer tokenizer)
# - components:
# - synset (offset + POS)
# - 'index' within Synset's list of relations
# - relationTypeOrdinal (parsed)
# - target POS
15562+5902+2710+2407+1686+1588+1440+1012+628+532+226+172+86+34 = 33985 ; 33990 - 33985 = 5 -- no idea where these 5 went
- alternate distillation is <arg1 sensekey> <relation> <arg2 sensekey>
- more readable/verifiable
- how to expose MorphosemanticSemanticRelation ?
- enums make subclassing impossible
- closest existing equivalents are pure-virtual types: MERONYM <- { PART_MERONYM, MEMBER_MERONYM, SUBSTANCE_MERONYM } ...
- merging them into RelationType and representing them as Relations is easiest for users
- database only contains explicit verb → noun relations
? is inverse implied ?
? some verb → noun instances don't seem to be present ?
src: breathe%2:29:00::
→ breather ()
→ breathing (syn respiration#3 is represented)
src: respire%2:29:02::
→ respiration#1
→ respiration#3
* could automate XLS "parsing"/interpretation with Apache POI
- for easy binary search, duplicate each relation forward and reversed (and overall sorted)
* note these will have to be divided by POS
- 14 different morphosemantic relations
8158 event
3043 agent
1439 result
1273 by-means-of
878 undergoer
813 instrument
740 uses
528 state
318 property
288 location
114 material
87 vehicle
43 body-part
17 destination
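The "duplicate each relation forward and reversed" idea noted above can be sketched in memory. Store every verb → noun link plus a reversed copy, sort once by source, and binary search for a source offset.

Keying on the 9-digit POS-prefixed offsets is one way to keep the POSes apart without separate arrays, since the leading digit differs. The reversed copy is flagged `inverse` because it is still open whether the inverse is implied. `MorphosemanticIndex` and `Link` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory index of Morphosemantic links; not the Yawni API. */
class MorphosemanticIndex {
  /** One directed link; inverse is true for the duplicated noun → verb copy. */
  static final class Link {
    final long source, target;   // POS-prefixed offsets, e.g., 201162291
    final String type;           // e.g., "agent"
    final boolean inverse;

    Link(long source, String type, long target, boolean inverse) {
      this.source = source;
      this.type = type;
      this.target = target;
      this.inverse = inverse;
    }
  }

  private final Link[] links;

  /** Stores each verb → noun link forward and reversed, sorted by source. */
  MorphosemanticIndex(List<Link> verbToNoun) {
    List<Link> all = new ArrayList<>();
    for (Link l : verbToNoun) {
      all.add(l);
      all.add(new Link(l.target, l.type, l.source, true));
    }
    links = all.toArray(new Link[0]);
    Arrays.sort(links, Comparator.comparingLong((Link l) -> l.source));
  }

  /** All links whose source is the given POS-prefixed offset, in either direction. */
  List<Link> linksFrom(long source) {
    int lo = 0, hi = links.length;            // lower-bound binary search
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (links[mid].source < source) lo = mid + 1; else hi = mid;
    }
    List<Link> result = new ArrayList<>();
    for (int i = lo; i < links.length && links[i].source == source; i++) result.add(links[i]);
    return result;
  }
}
```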
- The Teleological Database
? presume these are for WordNet 3 ?
<arg1 sensekey> <arg1 offset> <relation> <arg2 sensekey> <arg2 offset>
bomb%1:06:00:: 102866578 action destroy%2:36:00:: 201619929
- 12 different relations
448 action
214 theme
109 result
77 agent
57 location
47 undergoer
25 instrument
23 beneficiary
18 destination
17 cause
10 experiencer
8 source
+ "Core" WordNet
core-wordnet.txt contains the top (i.e., most frequently used) 4,997 senses
example line:
v [clear%2:32:00::] [acquit] clear, pronounce not guilty
cleanup commands
cat core.* | tr "_" " " | tr "[:upper:]" "[:lower:]" | egrep '([^%]+).*\[\1\]' -v | wc -l
cat core-wordnet.txt | colrm 1 3 | sed "s/]//" | tr "_" " " | tr "[:upper:]" "[:lower:]" | egrep '([^%]+).*\[\1\]' -v | wc -l
# add rank as explicit final field
cat 5K.clean.txt | sed -E "s/(\[|\])//g" | awk 'BEGIN{i=1} {print $2 " " i; i++; }' | sort -k1,1 > core-wordnet.ranked
cat 5K.clean.txt | colrm 1 3 | sed "s/]//" | awk 'BEGIN{i=1} {print $0 " " i; i++; }' | sort -k1,1 > core-wordnet.ranked
5K has 3 duplicates
+ a [available%3:00:00::] [available] obtainable
+ a [whole%3:00:00::] [whole] including all components
+ n [shoe%1:06:00::] [shoe] footwear
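The example line format (`<pos> [<sense key>] [<word>] <short gloss>`) can be parsed with one regular expression. The rank is taken as the 1-based line number, as in the awk commands above. `CoreWordNetEntry` is a hypothetical class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for one line of core-wordnet.txt; not part of the Yawni API. */
class CoreWordNetEntry {
  // <pos letter> [<sense key>] [<word>] <abbreviated gloss>
  private static final Pattern LINE =
      Pattern.compile("^(\\w) \\[([^\\]]+)\\] \\[([^\\]]+)\\] (.*)$");

  final char pos;          // e.g., 'v'
  final String senseKey;   // e.g., clear%2:32:00::
  final String word;       // e.g., acquit
  final String gloss;      // e.g., clear, pronounce not guilty
  final int rank;          // 1-based line number

  private CoreWordNetEntry(char pos, String senseKey, String word, String gloss, int rank) {
    this.pos = pos;
    this.senseKey = senseKey;
    this.word = word;
    this.gloss = gloss;
    this.rank = rank;
  }

  static CoreWordNetEntry parse(String line, int rank) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      throw new IllegalArgumentException("unrecognized line: " + line);
    }
    return new CoreWordNetEntry(m.group(1).charAt(0), m.group(2), m.group(3), m.group(4), rank);
  }

  /** Lemma part of the sense key, with '_' mapped to ' ' as in the cleanup commands. */
  String senseKeyLemma() {
    return senseKey.substring(0, senseKey.indexOf('%')).replace('_', ' ');
  }
}
```

Because the file has 3 duplicate lines, a loader might keep only the first (lowest-rank) entry per sense key.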
- https://wordnet.princeton.edu/wordnet/download/
- when were the new "Standoff Files" released ?
- Geo-WN ; toponym disambiguation
- VerbNet
- FrameNet
- SentiWordNet (crappy non-commercial license)
- SUMO: mappings https://sigmakee.cvs.sourceforge.net/sigmakee/KBs/WordNetMappings/
- Collins English Dictionary (CED) old edition available cheaply from LDC ($100); challenging to parse
- Wiktionary; hard to parse; Ninja Words uses this
- LDOCE: Procter, Paul (Ed.). 1978. Longman Dictionary of Contemporary English
- uses a restricted vocabulary of about 2000 words in its definitions and example sentences
- Roget's Thesaurus (old (i.e., 1900) Project Gutenberg?); JWord uses this
- The English Tree of Roots (old (i.e., 1900) Project Gutenberg?); JWord uses this
- GeoSemCor 2.0 (SemCor stands for "Semantic Concordance")
- create tool support for dealing with new data
- expert query modes will be very helpful, e.g., senseKey
- will require API flexibility or expansion
- "soft fields", aka properties, aka name-value pairs: isCore
- generic/dynamic RelationType with a "type" field
- performing mapping/conversion of resources bound to other WordNet versions
e.g., SentiWordNet is tied to WordNet 2.0 (need 2.0→2.1→3.0)
- interesting homonym
weakened vs. weekend
- command to make local browser standalone executable jar
time mci -PuseLog4j,useYawniData,makeShadedJar
- site element notes
+ link to yawni.sourceforge.net (which is linked to yawni.org)
- add WordNet 3.0 documentation since https://wordnet.princeton.edu seems to be down often
+ Google Analytics include
+ name
+ description
+ project icon
~ project features
+ browser features
+ (browser) screenshot
- examples / getting started
- strawman WSD system leveraging sense tagged frequencies, "Core" WordNet, Morphosemantic links, lexCats
- Scala + Yawni = WordNet REPL!
// rlwrap scala -Xnojline
alias wnrepl='scala -cp /Users/nezda/.m2/repository/org/yawni/yawni-wordnet-api/2.0.0-SNAPSHOT/yawni-wordnet-api-2.0.0-SNAPSHOT.jar:/Users/nezda/.m2/repository/org/yawni/yawni-wordnet-data/2.0.0-SNAPSHOT/yawni-wordnet-data-2.0.0-SNAPSHOT.jar:/Users/nezda/.m2/repository/com/google/guava/guava/13.0/guava-13.0.jar:/Users/nezda/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/Users/nezda/.m2/repository/org/slf4j/slf4j-api/1.6.6/slf4j-api-1.6.6.jar:/Users/nezda/.m2/repository/org/slf4j/slf4j-nop/1.6.6/slf4j-nop-1.6.6.jar'
// https://stackoverflow.com/questions/9516567/settings-maxprintstring-for-scala-2-9-repl
- alternative to this manual alias, BUT must specify Xmx via MAVEN_OPTS since can't fork scala console
mvn -Dscala.version=2.9.2 net.alchim31.maven:scala-maven-plugin:3.0.2:console
:power
vals.isettings.maxPrintString = 1000000
import org.yawni.wordnet._
import scala.collection.JavaConversions._
val wn = WordNet.getInstance
// Ctrl+A begin-of-line
// Ctrl+E end-of-line
(1 to 1000000).par.map(_.toString) map(wn.lookupWordSenses(_, POS.ALL)) filter(!_.isEmpty) map(_ map(a => (a, a.getSynset.getLexCategory)) mkString(":")) mkString("\n")
// parallelize :)
val numberMatches = (1 to 1000000).par map(_.toString) map(wn.lookupWordSenses(_, POS.ALL)) filter(!_.isEmpty)
numberMatches map(_ map(a => (a, a.getSynset.getLexCategory)) mkString(":")) mkString("\n")
def timeit(f: => Unit) = { val s = System.nanoTime; f; "%,.4f s" format ((System.nanoTime - s) / 1e9) }
timeit({ val numberMatches = (1 to 1000000).par map(_.toString) map(wn.lookupWordSenses(_, POS.ALL)) filter(!_.isEmpty) })
val singleCharMatches = (1.toChar to 65535.toChar).map(_.toString) map(wn.lookupWordSenses(_, POS.ALL)) filter(!_.isEmpty)
def twoLetterWords(s:Int = 0, e:Int = 127) = (s.toChar to e.toChar).combinations(2).map(_.mkString("")).flatMap(a => List(a, a.reverse))
twoLetterWords() map(w => ("\"%s\"" format w, wn.lookupWordSenses(w, POS.ALL))) filter{ case (_, senses) => !senses.isEmpty} mkString("\n")
- list the atomic elements in WordNet
- list numbers in WordNet
- has Roman numerals too!
? weird that only 1-30 are nouns (with lexCat <quantity>) while of 31-100, only 40,50,60,78,80,90,100 are nouns, rest adjectives
- normalize integers (adjectives: 1st → first, ..., 49th, 50th, 60th,..., 100th, ..., ? 115th, ..., 130th, ...)
- digits, textual, Roman numerals (* lowercase)
* limited fraction synonyms ("one-half" = "half", "one-fourth" = "one quarter" = "twenty-five percent")
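Normalizing digits, textual forms, and Roman numerals to a common integer form makes them comparable. Below is a minimal case-insensitive Roman numeral parser. `RomanNumerals` is a hypothetical helper, and it accepts non-canonical forms such as "IIII" rather than validating them.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical normalizer: Roman numeral in any case ("xiv", "XIV") → integer. */
class RomanNumerals {
  private static final Map<Character, Integer> VALUES =
      Map.of('I', 1, 'V', 5, 'X', 10, 'L', 50, 'C', 100, 'D', 500, 'M', 1000);

  /** Returns the value, or -1 if s is empty or contains a non-numeral character. */
  static int parse(String s) {
    if (s.isEmpty()) return -1;
    String upper = s.toUpperCase(Locale.ROOT);
    int total = 0;
    for (int i = 0; i < upper.length(); i++) {
      Integer v = VALUES.get(upper.charAt(i));
      if (v == null) return -1;
      Integer next = (i + 1 < upper.length()) ? VALUES.get(upper.charAt(i + 1)) : null;
      // subtractive notation: a smaller numeral before a larger one, e.g., IX = 9
      if (next != null && v < next) total -= v; else total += v;
    }
    return total;
  }
}
```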
- mine for capitalized adjectives and verbs (shows off true case)
- mine nouns for acronyms and their expansions (USA, U.S.A., AN, WMD, WHO); substring search for "\."
- mine nouns for name alternations: {Rockefeller, John D. Rockefeller, John Davison Rockefeller}, {Davis, Miles Davis, Miles Dewey Davis Jr.}
- mine for distinct Words whose lexical form is only different by spaces (e.g., "date line" vs. "dateline")
- mine for gender information (famous person names, etc.)
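The "only different by spaces" mining idea reduces to grouping lemmas by their space-free form. This sketch works over any collection of lemma strings; how to enumerate them from Yawni is left out. `SpaceVariants` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical miner for lemmas that differ only by spaces, e.g., "date line" vs. "dateline". */
class SpaceVariants {
  /** Groups distinct lemmas by their space-free form; returns only groups with 2+ members. */
  static List<List<String>> find(Collection<String> lemmas) {
    Map<String, Set<String>> byKey = new TreeMap<>();
    for (String lemma : lemmas) {
      String key = lemma.replace(" ", "");   // remove spaces; case is left as-is
      byKey.computeIfAbsent(key, k -> new TreeSet<>()).add(lemma);
    }
    List<List<String>> groups = new ArrayList<>();
    for (Set<String> forms : byKey.values()) {
      if (forms.size() > 1) {
        groups.add(new ArrayList<>(forms));
      }
    }
    return groups;
  }
}
```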
- easiest things in WordNet:
- single sense words (* with (bad) closed world assumption :))
- words in core WordNet (* not personally used)
- 1st senses, especially those with disproportionate max (always max) of sense tagged frequency
- strawman, ghetto, closed-world-assuming "POS tagger"
- some form of longest match
- use basic stop terms
- ignore initcap terms
- check case ? ("was" != "WA")
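The longest-match and stop-term steps of the strawman above can be sketched as a greedy chunker. A plain `Set<String>` stands in for WordNet lemmas, and POS assignment from the matched senses is left out. Lookup is exact and case-sensitive with no morphological processing, so "was" cannot hit "WA". `LongestMatch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical greedy longest-match chunker over a closed-world lexicon.
 * Stop terms and initcap tokens are passed through as single unmatched chunks.
 */
class LongestMatch {
  static List<String> chunk(List<String> tokens, Set<String> lexicon, Set<String> stopTerms, int maxLen) {
    List<String> chunks = new ArrayList<>();
    int i = 0;
    while (i < tokens.size()) {
      String token = tokens.get(i); // tokens are assumed non-empty
      if (stopTerms.contains(token.toLowerCase(Locale.ROOT)) || Character.isUpperCase(token.charAt(0))) {
        chunks.add(token);                     // skip stop terms and initcap terms
        i++;
        continue;
      }
      int matched = 1;                         // default: a single token
      for (int n = Math.min(maxLen, tokens.size() - i); n >= 2; n--) {
        String candidate = String.join(" ", tokens.subList(i, i + n));
        if (lexicon.contains(candidate)) {     // exact, case-sensitive lookup
          matched = n;
          break;
        }
      }
      chunks.add(String.join(" ", tokens.subList(i, i + matched)));
      i += matched;
    }
    return chunks;
  }
}
```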
~ FAQ
~ JavaDoc
~ maven info
- download link
~ sourceforge project URL
+ sourceforge icon + URL (for tracking)
- Amazon Associates book links
- this site has a link for an Erlang book: https://www.satine.org/archives/2007/12/13/amazon-simpledb/
+ new screenshot of substring search window: https://grab.by/3GWK ; replaces https://grab.by/1Ic4#file.png (before brushed metal fix)
+ donations link / directions / terse plea
- ohloh link
- freshmeat link
~ pertinent pom.xml project facts
+ Apache license link
+ Princeton WordNet license link
+ issue tracking (Trac or Jira)
+ who we are
+ Luke Nezda
+ Oliver Steele
- related projects
- inferior APIs ?
- Princeton WordNet
- Apache UIMA
- has a DictionaryAnnotator
- requires conversion to an XML format (DictionaryCreator commandline tool provided; provides tokenization options)
- includes "advanced multi word capabilities", "case normalization" (i.e., case insensitive)
- "advanced" tokenizer: org.apache.uima.TokenAnnotation // is this jflex-based ?
- consider implementing InputMatchFeaturePath : baseFormToken with Yawni
* consider patch to improve copy of the DictionaryAnnotator documentation
- PDF documentation URL on main page broken
- spurious "content."
- not "partOfSpeach", "partOfSpeech" ; spell check!
- Apache Lucene
- feature: promotion: doc:
- browser
- Substring search is like "-grep" option of C WordNet commandline interface 'wn'
? what's this called in the 'wnb' interface ?
+ Note that it does not query any of the online versions of WordNet
+ Optionally supports WordNet data file versions 1.5, 1.6, 1.7, 2.0, 2.1, 3.0
- ssh to sourceforge site
ssh -t lnezda,yawni@shell.sourceforge.net create
- follow FEST projects' lead: use pmwiki, ask for Atlassian Confluence, Jira, and FishEye
https://www.pmwiki.org/wiki/Cookbook/SourceForgeServers
+ yawni icon URL
https://yawni.sourceforge.net/yawni_43x48_icon.png
https://www.yawni.org/images/yawni_43x48_icon.png
https://www.yawni.org/images/yawni_57x64_icon.png
+ sourceforge icon link
<img src="https://sourceforge.net/sflogo.php?group_id=36271&type=3" alt="SourceForge.net Logo" class="sflogo" />
+ using Skittlish theme
https://www.pmwiki.org/wiki/Cookbook/Skittlish
+ added clean URLs
* how to edit a pmwiki page
https://yawni.sourceforge.net/pmwiki/PmWiki/DocumentationIndex
https://yawni.sourceforge.net/pmwiki/Site/SiteFooter?action=edit
(:notitle:)
(:notitlegroup:)
(:notabs:)
+ created Yawni pledgie campaign: https://pledgie.com/campaigns/7717
+ tag lines
1. Object-oriented API to the WordNet database of lexical and semantic relationships
2. The commercial-grade WordNet Java API.
3. The exciting, commercial-grade WordNet API.
- descriptions
1. Yawni is a pure Java standalone object-oriented interface to the WordNet database of lexical and semantic relationships. This is the project formerly known as "JWordNet".
2. Yawni is a pure Java object-oriented interface to the WordNet database of lexical and semantic relationships. This is the project formerly known as "JWordNet".
- fix page <title>: currently 'Yawni Main/Home Page'
Yawni / Home
- make favicon
- add copyright info to footer (© 2010)
- enhance indentation of sidebar
- consider adding/using tabs in addition to sidebar
- move search box into header ? (above where tabs would be)
- documentation
- what's WordNet
- who might use WordNet ?
<link rel="shortcut icon" href="https://www.yawni.org/favicon.ico"/>
<link rel="icon" type="image/x-icon" href="https://www.yawni.org/favicon.ico"/>
- Apache Pivot
- Web Start app probably more compelling than applet
- could add a poll to site
- Would be cool to download executable, single jar form too
- Component Explorer demo could use description area (e.g., Accordion is ...)
- favorite Pivot stuff
- ListButton
- MenuButton
- Accordion
- Form
- ActivityIndicator (OS X indefinite progress / working indicator)
- FileBrowser
- Calendar / CalendarButton
- ColorChooserButton
- issues
- bug: gui: Linux with GTK L&F: GTK+Swing bug causes major focus problems with text fields
? maybe only affects older Gnome versions ? (CentOS' Gnome 2.16, but not Ubuntu's 2.30)
- bug: gui: 'release' → verb → Troponyms; first entry only shows word, not Synset
x feature: api: Synset.getRelations() List<Relation> could be represented as Map<RelationType, List<Relation>>
- pro: subsumes getRelations(RelationType)
- cons: Map requires more memory than List ; Relation is already acting as Map.Entry ; Map has no order ; cannot customize (e.g., HYPERNYM implies INSTANCE HYPERNYM)
- feature: api: parse glosses / examples
- option 1: simply split the gloss into a definition(s) part and an examples part; getDefinitionChunk() / getExampleChunk(); approach used by WordNet online
- option 2: break gloss into chunks (?) and examples into chunks ; this functionality is not provided by other libraries
- option 3: swallow exceptions of failed parses; provide both options
- option 4: getGlossesAndExamples()
* the advanced functions are primarily for use in UIs, and could start out in a utility class (e.g., Descriptions)
- bug: api: RelationArgument (WordSense and Synset) getDescription() / getDescription(boolean) / getLongDescription() / getLongDescription(boolean) methods are a little contrived; they are essentially for UIs only; a more flexible impl would allow varying lexfile num, ...
- Describer / Description / Printer / Renderer / java.util.Formatter / java.util.Formattable / java.text.Format / MessageFormat
- java.util.Formatter - uses Formattable to allow customization of "%s" rendering ; can be parameterized some with FormattableFlags, int length, int precision
- could add Formattable to Word, WordSense, Synset, POS, Relation, RelationType, Lexname ; doesn't add much to toString()
- java.util.logging.Formatter - uses LogRecord
- made rough outline in org.yawni.wordnet.browser.Searcher
- feature: api: create commandline tool (à la 'wn') based on Command interface ; consider supporting batch mode or RAFCharStream to minimize init lag
- bug: api: record "Core" missing senseKeys (gvimdiff core active2.sorted)
- feature: api: simple commandline utilizing synsets(query) / wordSenses(query) akin to 'wn' (e.g., -grep{n|v|a|r})
- feature: api: add RelationType support to synsets(query) / wordSenses(query) API à la 'wn' (e.g., -ants{n|v|a|r}, -hypo{n|v}, -hype{n|v})
- feature: api: document advantages of synsets(query) / wordSenses(query) API
+ binary-compatible experimental API - doesn't affect those who don't use it
+ reduces API clutter making the overall API smaller & less daunting
- feature: api: can synsets(query) / wordSenses(query) API be cleanly replaced with a builder pattern-based API ?
+ type safe
+ requires no manual parsing
-? necessarily more public API commitments
- feature: api: doc: describe regex term search support
- feature: api: doc: describe orthographic case features ("true case")
- feature: api: doc: describe regex gloss search support
- feature: api: add glob support; translate "*yawn*" → ".*yawn.*"
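A minimal sketch of the glob translation. It assumes only `*` (any run of characters) and `?` (exactly one character) are wildcards, and it escapes every other regex metacharacter so that characters such as '.' match literally. The class and method names are hypothetical, not existing Yawni API.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: translates a simple glob into an equivalent regex string. */
final class Globs {
  // Regex metacharacters (other than the glob wildcards) that must be escaped to match literally.
  private static final String REGEX_META = "\\.[]{}()+^$|";

  private Globs() {}

  static String globToRegex(String glob) {
    StringBuilder regex = new StringBuilder(glob.length() + 8);
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '*') {
        regex.append(".*"); // any run of characters
      } else if (c == '?') {
        regex.append('.'); // exactly one character
      } else {
        if (REGEX_META.indexOf(c) >= 0) {
          regex.append('\\'); // escape so e.g. '.' is literal
        }
        regex.append(c);
      }
    }
    return regex.toString();
  }

  static Pattern compileGlob(String glob) {
    return Pattern.compile(globToRegex(glob));
  }
}
```

With this mapping, "*yawn*" becomes exactly ".*yawn.*". Whether the pattern is then applied with `matches()` (the whole lemma) or `find()` (a substring) is a separate design choice.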
~ feature: api: lookupWordSenses(String someString, POS pos) ; add unit tests; make examples with it
- feature: api: make stemming optional in lookupSynsets(String someString, POS pos) and lookupWordSenses(String someString, POS pos)
- currently they combine lookupBaseForms(), lookupWord(), and getWordSenses(): make the lookupBaseForms() part optional (but still on by default)
- feature: api: coordinate terms (aka sister terms): nouns or verbs that have the same hypernym (aka 'parent')
- represent this as a synthetic Relation ?
+ feature: api: add regex support to 'substring search'
+ feature: api: add gloss regex search support ; searchGlossBySubstring
- bug: api: doc: update Word "substring" search to indicate regex capabilities (and dangers: PatternSyntaxException)
- bug: api: doc: update gloss "substring" search to indicate regex capabilities (and dangers: PatternSyntaxException)
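One possible mitigation for the PatternSyntaxException danger is to treat malformed input as a literal search string instead of letting the exception escape. Here is a sketch with a hypothetical helper. It assumes a silent fallback is acceptable UX; a UI could instead surface `e.getDescription()`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper: never lets malformed user input escape as a PatternSyntaxException. */
final class SafePatterns {
  private SafePatterns() {}

  static Pattern compileOrLiteral(String userInput) {
    try {
      return Pattern.compile(userInput);
    } catch (PatternSyntaxException e) {
      // Not a valid regex (e.g. "*yawn" has a dangling '*'): fall back to a literal search.
      return Pattern.compile(Pattern.quote(userInput));
    }
  }
}
```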
+ bug: api: added more tests for CharSequenceTokenizer
+ bug: api: substring search incorrectly searching leading license lines of data files; added test
+ bug: api: invalid - just an oddity; values in verb.exc without hits?; about-shipped ; about-shipping; benempt; bird-dogging
- bug: browser: Maven <profile> inheritance doesn't appear to work correctly: inlining profiles into browser subproject was the only way to create fresh version of the JNLP app with logging dependencies shaded in and proper jar signature
- if the required profiles are only in parent pom, the following command will report no active profiles
mvn -Dmaven.test.skip=true -PuseLog4j,makeShadedJar,makeSignedJar help:active-profiles
mvn -Dmaven.test.skip=true -PuseLog4j,makeShadedJar,makeSignedJar clean install
+ bug: api: include Princeton WordNet license in data jar
- bug: gui: update prompt copy "Type a word to look it up in WordNet..." should Search button be mentioned? (current copy: "Type a word to lookup in WordNet...")
+ bug: gui: after pom refactor, window title is "Yawni Parent Browser" :)
+ bug: gui: after pom refactor, About dialog application title is "Yawni Parent Browser"
- feature: gui: ⊚ add URL to About dialog; make sure it is mouse select+copyable or clickable or both
+ feature: gui: add icon to About dialog
- feature: promotion: ⊚ contact known users of other Java WordNet clients with targeted announcement
* email Revelytix and ask them what they use
* email Cognition and ask them what they use
? email JWNL and ask them to hang it up ? :)
- create twitter account: yawni is currently a spam Twitter account; reported as spam
- Freshmeat
- Ohloh: I Use This
- SourceForge reviews
- announce on personal twitter account
- announce on wordnet mailing list
- Linas Vepstas: OpenCog / ReLex
- Sean Adams
- Kirk Roberts
- Bryan Rink
- Rion Snow
- The Stanford WordNet Project
- https://ai.stanford.edu/~rion/swn/index.html
- opennlp
- mailing lists
- annotation.org
- GATE mailing list
- UIMA mailing list
? Lucene mailing list ?
+ feature: promotion: point yawni.org at SourceForge site
~ bug: promotion: ⊚ update SourceForge site (add download, no longer beta, Maven info, ...)
+ feature: promotion: add PmWiki-based content to SourceForge site: basic documentation, basic examples, some color
+ feature: promotion: make Wiki-based homepage the default landing page for https://yawni.sourceforge.net (like https://jwordnet.sourceforge.net)
- feature: promotion: ⊚ ask Oliver Steele to update project members (i.e., remove Kurt Hayes)
- feature: promotion: ⊚ have Princeton update the JWordNet link on https://wordnet.princeton.edu/wordnet/related-projects/
- feature: promotion: ⊚ Amazon Affiliate sponsored links
? any Semantic Web books apropos here ?
- WordNet: An Electronic Lexical Database (book) https://www.amazon.com/WordNet-Electronic-Database-Language-Communication/dp/026206197X/ref=sr_1_1?ie=UTF8&s=books&qid=1261949881&sr=8-1
- Word Sense Disambiguation: Algorithms and Applications (book) https://www.amazon.com/Word-Sense-Disambiguation-Algorithms-Applications/dp/1402068700/ref=sr_1_1?ie=UTF8&s=books&qid=1261949949&sr=1-1
+ bug: use consistent URL in documentation, etc.
https://sourceforge.net/projects/yawni/ -- SourceForge standard page
https://yawni.sourceforge.net/ -- looks spammy with ad in the center
~ feature: promotion: gui: add Web Start app link to main page
- make sure it works on Linux from Firefox ; see what others do
- feature: api: ⊚ publish to public Maven repositories
+ bug: api: report more useful exception than NPE when data cannot be found (i.e., not in classpath, $WNSEARCHDIR, or $WNHOME); currently, exceptions look like: java.lang.IllegalArgumentException: no stream for noun.exc
- bug: api: ⊚ $WNSEARCHDIR/$WNHOME environment collision(s) would be confusing to debug; document this
- feature: api: determine WordNet version at runtime; compute signatures for known versions; key file lengths; key file MD5 sums
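Here is a sketch of the MD5 half of this idea, using only `java.security.MessageDigest`. The table of known per-version signatures is assumed and not included, because it would have to be computed from each WordNet release. Comparing file lengths first would be a cheaper pre-check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for fingerprinting WordNet data files (e.g. index.noun). */
final class FileSignatures {
  private FileSignatures() {}

  static String md5Hex(Path file) throws IOException {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5"); // a fingerprint, not a security measure
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md5.update(buf, 0, n); // stream the file; never load it whole
      }
    }
    StringBuilder hex = new StringBuilder(32);
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b & 0xff));
    }
    return hex.toString();
  }
}
```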
+ bug: api: if $WNSEARCHDIR and/or $WNHOME defined, overrides the data jar; apparent when they point it at a real WordNet 2.x dict/ dir (causes test failures)
* notes from FileManager.getFileStream(String filename, boolean filenameIsWnRelative)
if YAWNI_USE_JAR, try the jar
sys prop to guarantee using the jar to prevent weird application-level WN data version mismatches
- sysprops and environment variables create security issues
- XXX should this be the default ?
else if WNSEARCHDIR (or WNHOME) are defined, use them
- benefits: mmap'd FileChannel requires less memory and inits faster
- allows simple WN data version changes
else try the jar
- zero environment dependencies
What behavior should we use if SecurityException is thrown?
- this would invariably mean reading local environment variables and arbitrary files from disk was also prohibited so jar is only solution
How can we test behavior in a sandboxed, high-security environment?
If we read from the jar, do we need the user to trust our application at all?
- may not even need signing in this case: the data is also delivered as a jar (8.7MB), so not even network reads are required
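The lookup order in these notes can be sketched as a pure function plus a thin wrapper that reads the environment. The sketch makes three assumptions:
- `YAWNI_USE_JAR` is read as a boolean system property.
- `$WNHOME` follows the C WordNet convention of containing a `dict/` subdirectory.
- An empty result means "read from the data jar".

The class is hypothetical, not the actual FileManager code. Keeping the decision pure also makes the SecurityException and sandbox questions testable without a real sandbox.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical resolver for where WordNet data is read from; empty means "use the data jar". */
final class DataSourceResolver {
  private DataSourceResolver() {}

  static Optional<Path> resolveDictDir(boolean useJar, String wnSearchDir, String wnHome) {
    if (useJar) {
      return Optional.empty(); // forced: avoids application-level data version mismatches
    }
    if (wnSearchDir != null && !wnSearchDir.isEmpty()) {
      return Optional.of(Paths.get(wnSearchDir)); // points directly at the dict/ directory
    }
    if (wnHome != null && !wnHome.isEmpty()) {
      return Optional.of(Paths.get(wnHome, "dict")); // C WordNet convention: $WNHOME/dict
    }
    return Optional.empty(); // zero environment dependencies
  }

  static Optional<Path> resolveFromEnvironment() {
    try {
      boolean useJar = Boolean.getBoolean("YAWNI_USE_JAR");
      return resolveDictDir(useJar, System.getenv("WNSEARCHDIR"), System.getenv("WNHOME"));
    } catch (SecurityException e) {
      // Sandboxed: if env vars are off limits, local files almost certainly are too.
      return Optional.empty();
    }
  }
}
```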
+ bug: api: gui: default build data/ and use it for unit tests in core/ and browser/ to sidestep environment issues and work in IDEs
- bug: api: gui: ⊚ add basic Getting Started documentation including slf4j requirements; copy / reference Apache Mina docs on same: https://mina.apache.org/logging-configuration.html
- bug: api: decrease unit test Xmx -- 256m is excessive; note that base cache sizes form a minimum required memory (esp. if they are filled)
- 4 active: DEFAULT_CACHE_CAPACITY=10000 (morphyCache, synsetCache, indexWordCache, exceptionsCache)
- these can easily get fairly large and capture
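A bounded cache of the kind described here can be sketched with `LinkedHashMap` in access order. This is an assumption about the shape of Yawni's caches, not their actual implementation, and it is not thread-safe as written. The capacity bounds the entry count, not bytes. Memory use at DEFAULT_CACHE_CAPACITY therefore still depends on how large each cached value is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently accessed entry once capacity is exceeded. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  LruCache(int capacity) {
    super(16, 0.75f, true); // accessOrder=true: iteration runs least to most recently accessed
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > capacity; // called after each put; drops the LRU entry when over capacity
  }
}
```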
- feature: api: consider public API addition: List<String> getExceptions(CharSequence someString, POS pos)
+ bug: api: include Princeton WordNet license in data subproject !
< feature: gui: consider left-aligning button/searchField pod; Muller OS X WordNet does this
- feature: gui: for RelationType dropdowns, consider implementing UI guideline of keeping same set of menu items, but only enabling apropos ones - helps user learn; Muller OS X WordNet does this
- don't want meta-relations (e.g., MEMBER_MERONYM)
- only want those RelationTypes applicable to the particular POS
+ bug: api: for Noun "Roman", what yawni calls "Member Meronyms", wnb calls (Member) Holonyms ("MEMBER OF"); currently have "mero"="#" and "holo"="%"; doh! wn.h says "mero"="%" and "holo"="#"
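For reference, wninput(5WN) lists the noun part/whole pointer symbols as follows, which agrees with wn.h:
- member, substance, and part holonyms: `#m`, `#s`, `#p`
- the corresponding meronyms: `%m`, `%s`, `%p`

Below is a hypothetical enum that pins this mapping down, so a unit test can guard against the swap recurring.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical enum; symbols as documented in wninput(5WN). */
enum PartWholePointer {
  MEMBER_HOLONYM("#m"), SUBSTANCE_HOLONYM("#s"), PART_HOLONYM("#p"),
  MEMBER_MERONYM("%m"), SUBSTANCE_MERONYM("%s"), PART_MERONYM("%p");

  private static final Map<String, PartWholePointer> BY_SYMBOL = new HashMap<>();

  static {
    for (PartWholePointer p : values()) {
      BY_SYMBOL.put(p.symbol, p);
    }
  }

  final String symbol;

  PartWholePointer(String symbol) {
    this.symbol = symbol;
  }

  /** Returns null for symbols outside this part/whole family (e.g. "@" hypernym). */
  static PartWholePointer fromSymbol(String symbol) {
    return BY_SYMBOL.get(symbol);
  }

  /** Holonym symbols start with '#', meronym symbols with '%'. */
  boolean isHolonym() {
    return symbol.charAt(0) == '#';
  }
}
```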
- feature: gui: typeahead find in Relation PopdownButton menu could support substring "smart" search (e.g., "holo..." in "Member Holonyms"; maybe like QuickSilver's algorithm?)
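One possible "smart" matching rule, sketched as a hypothetical scoring function, ranks matches in this order:
1. prefix matches
2. substring matches
3. in-order subsequence (abbreviation) matches

This is a simple stand-in, not QuickSilver's actual scoring algorithm.

```java
import java.util.Locale;

/** Hypothetical ranking for typeahead in the Relation menu; higher is better, 0 = no match. */
final class TypeaheadScore {
  private TypeaheadScore() {}

  static int score(String typed, String item) {
    String t = typed.toLowerCase(Locale.ROOT);
    String s = item.toLowerCase(Locale.ROOT);
    if (s.startsWith(t)) {
      return 3; // "mem" matches "Member Holonyms"
    }
    if (s.contains(t)) {
      return 2; // "holo" matches "Member Holonyms"
    }
    return isSubsequence(t, s) ? 1 : 0; // "mh" matches "Member Holonyms"
  }

  /** True if every character of needle appears in haystack, in order (not necessarily adjacent). */
  private static boolean isSubsequence(String needle, String haystack) {
    int j = 0;
    for (int i = 0; i < haystack.length() && j < needle.length(); i++) {
      if (haystack.charAt(i) == needle.charAt(j)) {
        j++;
      }
    }
    return j == needle.length();
  }
}
```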
+ bug: gui: fix UI copy for "The noun kitten has 1 sense (first 1 from tagged texts)"
< feature: gui: promotion: splash screen would be nice during download and boot; great place to showcase new icon!
+ bug: api: Word.getRelationTypes() sometimes returns incorrect values for lexical relations (e.g., INSTANCE_*, DERIVATIONALLY_RELATED)
hypocrite: thinks it has DERIVATIONALLY_RELATED, but this only applies to other WordSenses in its Synset;
? something's not working right:
palatine (Roman official)
Roman
+ bug: gui: ADJ Derivationally related forms shows 4 entries, should only be 2; not respecting WordSense-level of the relationship
+ bug: api: gui: "hypocrite" shows derivationally related forms menu option, but nothing shows up (turns out this is correct!)
+ feature: gui: use SwingWorker in preload (ditches explicit Threads, but requires Java 6 (sort of))
- feature: gui: use SwingWorker in ConcurrentSearchListModel (ditches explicit Threads, but requires Java 6 (sort of))
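Here is a minimal sketch of the SwingWorker pattern for the preload; `SwingWorker` has been in `javax.swing` since Java 6. The task list and the status reporting in `done()` are hypothetical placeholders for the real warm-up work and UI update.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import javax.swing.SwingWorker;

/** Hypothetical preload: runs warm-up tasks off the EDT, then reports back on the EDT. */
final class PreloadWorker extends SwingWorker<Integer, Void> {
  private final List<Runnable> warmUpTasks;

  PreloadWorker(List<Runnable> warmUpTasks) {
    this.warmUpTasks = warmUpTasks;
  }

  @Override
  protected Integer doInBackground() {
    // Runs on a SwingWorker background thread, so the UI stays responsive.
    for (Runnable task : warmUpTasks) {
      task.run();
    }
    return warmUpTasks.size();
  }

  @Override
  protected void done() {
    // Runs on the event dispatch thread; safe to touch Swing components here.
    try {
      int completed = get();
      System.err.println("preload finished: " + completed + " task(s)");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      e.getCause().printStackTrace();
    }
  }
}
```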
- feature: gui: use simple <span> and some CSS (rules) to modify rendering in resultEditorPane; ideally, controllable from properties file
+ feature: gui: add smart quotes to text of statusLabel prompt to reflect its showing the entered text, e.g., "Overview of roman" → "Overview of “roman”"
+ feature: gui: use dominant case for relation summary copy: "Applies to 4 of the 4 senses of roman" → "Applies to all 4 senses of Roman" (ADJ "roman" → "Derivationally related forms")
+ feature: gui: tweak relation summary copy when numApplicableSenses == senses.size()
- 3 cases
Applies to 1 of the 1 senses of George W. Bush
Applies to the only sense of George W. Bush
Applies to 2 of the 2 senses of George Bush
Applies to both senses of George Bush
Applies to 4 of the 4 senses of Roman
Applies to all 4 senses of Roman
* similar for 3+
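The three cases above reduce to a small branch on how many senses are covered. Here is a sketch with a hypothetical method name. It assumes the caller has already chosen the dominant-case lemma (e.g. "Roman").

```java
/** Hypothetical formatter for the relation summary line in the browser. */
final class RelationSummary {
  private RelationSummary() {}

  static String describe(int applicableSenses, int totalSenses, String lemma) {
    if (applicableSenses == totalSenses) {
      switch (totalSenses) {
        case 1:
          return "Applies to the only sense of " + lemma;
        case 2:
          return "Applies to both senses of " + lemma;
        default:
          return "Applies to all " + totalSenses + " senses of " + lemma;
      }
    }
    // Partial coverage keeps the explicit "k of the n" form.
    return "Applies to " + applicableSenses + " of the " + totalSenses + " senses of " + lemma;
  }
}
```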
< feature: api: add alternation support to GetIndex for {' ' → '/', '-' → '/'} (e.g., "read write memory" → "read/write memory")
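Here is a minimal sketch of generating '/' alternations. It assumes one separator is replaced at a time, which covers "read write memory" → "read/write memory", rather than all 2^k combinations. The method name is hypothetical, and each candidate would still need to be checked against the index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical GetIndex-style helper: one candidate per separator, each replaced by '/'. */
final class SlashAlternations {
  private SlashAlternations() {}

  static List<String> of(String phrase) {
    List<String> candidates = new ArrayList<>();
    for (int i = 0; i < phrase.length(); i++) {
      char c = phrase.charAt(i);
      if (c == ' ' || c == '-') {
        candidates.add(phrase.substring(0, i) + '/' + phrase.substring(i + 1));
      }
    }
    return candidates;
  }
}
```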
< feature: api: add information on Unicode characters (e.g., '→' U+2192 RIGHTWARDS ARROW; see gloss of "tilde" and https://rishida.net/scripts/uniview/ for ideas)
- would these be gloss? synonyms ?
- would be cool, for example, to have:
- suggested usage of smart quotes (“”)
" as alternation for “
" as alternation for ”
- "therefore" as alternation for ∴ (maybe even ∵ as antonym); would either of these work to help describe/render WordNet relation entailment ?
- "Esc" as alternation for ␛
- "Backspace" as alternation for ⌫
- "Enter" as alternation for ↩
- "Tab" as alternation for ⇥
x bug: gui: SearchFrame substring search fails to find any matches with '/' (e.g., "read/write memory"); " " (space) works;
+ this works; should add gui test
< feature: api: gui: use custom XML format internally and render it to HTML in a custom EditorKit; XML format can be used in multiple places
< feature: gui: add Safari / Preview-like highlighting current-panel search (⌘+F); Safari dims the screen and temporarily adds a search nav bar; impl similar to wavy red underline ?
+ bug: api: upgrade poms to build (including unit tests!) in any environment and test this (no jarsign resource, no data jar, WNHOME, WNSEARCHDIR set, etc.)
- if yawni-data is not available, fall back to WNHOME,
if WNHOME is not available, report useful error
+ bug: api: Morphy's morphprep() has some useful-looking logic short-circuited; easiest fix is to check some examples against the C version's output
- feature: api: test: check some values against C api for correctness; need to make either a wnb harness or C code; both pain to impl
- feature: api: false-positive tests: generate random collocations and prepositional phrases and see if they 1) produce stems 2) are in WordNet
< feature: gui: use web-start friendly preferences; this is hard
~ feature: gui: use TextPrompt to display "No matches found."; non-standard because searchField will not be empty for this case
< feature: gui: use custom ListCellRenderer to implement properly-cased display in SearchFrame
- bug: gui: needs_test: fix rendering of key bindings on menus (not Control+N, ^+N)
+ bug: gui: Undo and Redo menu icons are too big! also, crop the wasted edge space out of the source .pngs
- bug: gui: on Windows, alt-F4 should close windows, not Ctrl+W
+ bug: gui: register UncaughtExceptionHandler to avoid silent exception swallowing; show a dialog
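Here is a sketch of such a handler, using only standard `Thread` and Swing APIs. It always prints the stack trace, and when a display is available it shows the error in a dialog created on the event dispatch thread. The class name and dialog title are hypothetical.

```java
import java.awt.GraphicsEnvironment;
import javax.swing.JOptionPane;
import javax.swing.SwingUtilities;

/** Hypothetical handler: logs every uncaught exception and, when a display exists, shows it. */
final class DialogExceptionHandler implements Thread.UncaughtExceptionHandler {
  static void install() {
    Thread.setDefaultUncaughtExceptionHandler(new DialogExceptionHandler());
  }

  @Override
  public void uncaughtException(Thread thread, Throwable error) {
    error.printStackTrace(); // keep a full trace on stderr no matter what
    if (GraphicsEnvironment.isHeadless()) {
      return; // no display to show a dialog on
    }
    String message = "Unexpected error in thread \"" + thread.getName() + "\":\n" + error;
    // Dialogs must be created on the event dispatch thread.
    SwingUtilities.invokeLater(() ->
        JOptionPane.showMessageDialog(null, message, "Unexpected Error", JOptionPane.ERROR_MESSAGE));
  }
}
```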
+ bug: gui: search for "performant", result pane shows single bullet (status shows "No matches found"); caused by weird leak/bug/issue in JEditorPane + DefaultStyledDocument
+ bug: gui: leak in JEditorPane / DefaultStyledDocument; solution add a clear() method
~ feature: informal poll of Mihai (Snow Leopard) and Richard to see what version of OS X they have (Tiger, Leopard, Snow Leopard?)
- feature: gui: should clearing searchField automatically clear previous result set ?
- at least disable controls ? (like POS dropdowns)
- feature: gui: "placard" (not BottomBar) which shows adjustable (font) zoom level, etc.
< feature: gui: View menu / dropdown somewhere for advanced options; is this too advanced to expose to the regular user? e.g.,
Show/Hide Glosses
Show/Hide Example Sentences
Show/Hide Synset Offsets (aka Database Locations)
Show/Hide Frequency Counts
Show/Hide Lexical File Info
Show/Hide Lexical File Numbers
Show/Hide Sense Keys
Show/Hide Sense Numbers
Show/Hide Core WordNet Rank
Show/Hide WordSense Antonyms
Show/Hide WordSense AdjPosition
+ feature: gui: in search fields (or corresponding results area), use TextPrompt to show basic directions (like OS X Dictionary used to?); https://tips4java.wordpress.com/2009/11/29/text-prompt/
* verify with author that Apache License is OK ?
- text "Type a word to lookup in WordNet…" requires more width in searchField than we have
- doesn't work in combination with OS X features (but does look kinda cool, if a little textually busy)
searchField.putClientProperty("JTextField.variant", "search");
searchField.putClientProperty("JTextField.Search.CancelAction"...
- we already have a similar prompt in our "status bar"
- maybe searchField focus should control resultEditorPane content ?
- italics, medium alpha
- may be better to put prompt 1/4 or 1/5 from top in results area; OS X Dictionary says
Type a word to look up in… (horizontally centered, gray-ish/alpha, normal, serif font, ≈2x normal size (plenty of room))
New Oxford American Dictionary (horizontally centered, gray-ish/alpha, italics, serif font, ≈2x normal size (plenty of room))
- feature: gui: rename ActionHelper to StyledActionHelper or something; factor all similar code into it (or it into similar code), i.e., StyledTextPane
- feature: api: refactoring: consider optimizing CharSequences.parseInt/parseLong (à la https://nadeausoftware.com/articles/2009/08/java_tip_how_parse_integers_quickly)
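Here is a sketch of the usual trick, which is also how `Integer.parseInt` works internally. It walks the characters in place instead of calling `toString()`, and it accumulates negatively so that `Integer.MIN_VALUE` parses without overflow. The class name is hypothetical and is not the existing CharSequences signature. It would suit the zero-padded 8-digit synset offsets in WordNet data files.

```java
/** Hypothetical allocation-free int parser over a CharSequence slice [start, end). */
final class FastParse {
  private FastParse() {}

  static int parseInt(CharSequence s, int start, int end) {
    if (start >= end) {
      throw new NumberFormatException("empty input");
    }
    int i = start;
    boolean negative = false;
    char first = s.charAt(i);
    if (first == '-' || first == '+') {
      negative = first == '-';
      if (++i == end) {
        throw new NumberFormatException("lone sign");
      }
    }
    // Accumulate as a negative number (like Integer.parseInt) so Integer.MIN_VALUE fits.
    int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
    int multLimit = limit / 10;
    int result = 0;
    for (; i < end; i++) {
      int digit = s.charAt(i) - '0';
      if (digit < 0 || digit > 9) {
        throw new NumberFormatException("bad digit at index " + i);
      }
      if (result < multLimit) {
        throw new NumberFormatException("overflow");
      }
      result *= 10;
      if (result < limit + digit) {
        throw new NumberFormatException("overflow");
      }
      result -= digit;
    }
    return negative ? result : -result;
  }
}
```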
< bug: gui: SearchFrame dialog should render with apple.awt.brushMetalLook on OS X (Apple bug)
- feature: gui: bundle as native OS X app; https://mojo.codehaus.org/osxappbundle-maven-plugin/
+ feature: gui: add prefix-typing navigation of PopdownButton content (like JComboBox)
- DefaultKeySelectionManager
- bug: gui: OS X: app bar says "com.sun.javaws.Main" instead of "Yawni WordNet Browser"; clearly missing -Xdock:name="Yawni WordNet Browser"
- OS X Java Web Start bug: JnlpxArgs.getArgumentList: Internal Error: remaining custArgsMaxLen: -1110 < vmArgsPropertyStr.length: 72 dropping vmArgsPropertyStr
- interesting alternative syntax:
<param name="java_arguments" value="-Djnlp.packEnabled=true">
? what JNLP version added this ?
+ bug: gui: OS X: if app window focus changes while a PopdownButton JPopupMenu is showing, it stays above ALL other application windows ! known bug; only affects heavyweight popups which OS X uses exclusively
- https://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4812585
- bug: gui: OS X: when selecting POS button, focus is not explicitly on "Senses"
< feature: gui: OS X: using reflection, à la OSXAdapter, override handleQuit(), handleAbout()
+ bug: gui: OS X search field (x) no longer works (searchField.putClientProperty("JTextField.variant", "search");) ; fix searchField.putClientProperty("JTextField.Search.CancelAction", Action...
< feature: gui: require zero Java Web Start permissions (no preferences saving)
- feature: gui: count and display how many Word or WordSense hits for "inherited" searches
- bug: promotion: ⊚ add nice high res project icon (design one :)
- what are typical icon sizes ?
* 512×512 pixel image (for Finder icons in Mac OS X v10.5 and later)
* 128×128 pixel image (for Finder icons in all versions of Mac OS X)
* 32×32 pixel image (hint for Finder icons)
* 16×16 pixel image (hint for Finder icons)
* 48x48 pixel image (used by SourceForge)
* A mask that defines the image’s edges so that the operating system can determine which regions are clickable
- icns file for OS X ?
- scoured Interfacelift, found Bombia Design and asked for a quote
To whom it may concern:
I am the primary developer of the open source project "Yawni" (Yet Another WordNet API) (https://sourceforge.net/projects/yawni/), a graphical and programmatic interface to WordNet (https://wordnet.princeton.edu/). I have been working hard on both the "back end" and UI of the project and even have a concept for an icon. My concept is a yawning cat (I like this one: https://www.joua.net/paddington/yawn.jpg), possibly vectorized à la vectormagic.com. I found your portfolios via Interfacelift and was curious what it would cost me to have you create a scalable icon for the project. Obviously I don't have much to spend on this free and open source project, but I think a memorable icon is an important way to promote a project.
Kind regards,
- Luke
Dallas, Texas, USA
+ experimented with some variants of the Paddington image on vectormagic.com
+ emailed webmaster hosting Paddington image
+ bug: api: doc: note that stems are true case in DictionaryDatabase#getLemma()
+ feature: gui: support multiple independent, concurrent Browser frames (OS X-style "Multiple Document Interface" (MDI))
+ bug: api: can't find "'s Gravenhage" in gui ; BloomFilter hashcode considered case and space != underscore; added Hasher interface
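Here is a sketch of the kind of normalization a Hasher can apply so that "'s Gravenhage" and "'s_gravenhage" land on the same Bloom filter bits. It lowercases each character and maps space to underscore while hashing, without building an intermediate String. The function is hypothetical. It reuses the 31-multiplier recurrence of `String.hashCode()`, so its results can be cross-checked against plain Strings.

```java
/** Hypothetical Hasher-style function: case-insensitive, treats ' ' and '_' as equal. */
final class LemmaHash {
  private LemmaHash() {}

  static int normalizedHash(CharSequence s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == ' ') {
        c = '_'; // WordNet index files store collocations with underscores
      }
      h = 31 * h + Character.toLowerCase(c);
    }
    return h;
  }
}
```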
- feature: api: doc: file:///Users/nezda/President%20of%20the%20United%20States.svg
~ feature: api: in Word, consider caching senses since Word is Iterable over them; getSense(n) should be cheaper; consider reversing this "optimization"
+ feature: api: get tests to run from within Netbeans (possibly with Maven test scope dep on data subproject?)
- bug: api: ⊚ Morphy + Synset-searching torture test (! including POS.ALL AND collocations !)
+ feature: ditch inefficient LookaheadIterator for Google Collections AbstractIterator
< feature: gui: inherited meronyms not shown in GUI drop down; these are indirect via hypernyms (e.g., "capacity"#3's hypernym "volume" has direct meronyms)
+ feature: api: add BloomFilter anti-match optimization to lookupWord(String lemma, POS)
+ feature: api: add BloomFilter anti-match optimization to other methods (e.g., getExceptions()); trickier for Morphy/lookupBaseForms()
+ bug: api: fix BloomFilter issue with non-String CharSequence hashCode (feature)
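For context, here is a minimal Bloom filter over CharSequences. It hashes character content directly, because `CharSequence` does not define `hashCode()` in terms of content; `StringBuilder`, for example, inherits identity-based `Object.hashCode()`. This is a generic double-hashing sketch with hypothetical names, not Yawni's BloomFilter. Real sizing (`numBits`, `numHashes`) would have to be derived from the lexicon size and a target false-positive rate.

```java
import java.util.BitSet;

/** Hypothetical Bloom filter: false from mightContain() means definitely absent. */
final class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  SimpleBloomFilter(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /** Content-based hash: same recurrence as String.hashCode(), valid for any CharSequence. */
  private static int contentHash(CharSequence s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
      h = 31 * h + s.charAt(i);
    }
    return h;
  }

  /** MurmurHash3 32-bit finalizer, used to derive a second hash from the first. */
  private static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  private int index(int h1, int h2, int i) {
    return Math.floorMod(h1 + i * h2, numBits); // double hashing: h1 + i*h2
  }

  void add(CharSequence s) {
    int h1 = contentHash(s);
    int h2 = mix(h1) | 1;
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(h1, h2, i));
    }
  }

  /** false: skip the expensive lookup entirely; true: "maybe present", do the real lookup. */
  boolean mightContain(CharSequence s) {
    int h1 = contentHash(s);
    int h2 = mix(h1) | 1;
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(h1, h2, i))) {
        return false;
      }
    }
    return true;
  }
}
```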
+ feature: api: make Synset.getGloss() lazy/optional value - read & parse on demand; saves tons of RAM!
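Here is a sketch of the lazy, read-on-demand pattern. It assumes a hypothetical loader callback that seeks into the data file and parses the gloss. The value is computed on the first `get()`, and the loader reference is then dropped. Storing only the file offset that Synset already holds, and reading inside the getter, might save even more RAM than one holder object per Synset.

```java
import java.util.function.Supplier;

/** Hypothetical memoizing holder: runs the loader at most once, on first access. */
final class Lazy<T> implements Supplier<T> {
  private Supplier<? extends T> loader;
  private T value;

  Lazy(Supplier<? extends T> loader) {
    this.loader = loader;
  }

  @Override
  public synchronized T get() {
    if (loader != null) {
      value = loader.get();
      loader = null; // release the loader (and anything it captured)
    }
    return value;
  }
}
```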
- bug: api: ⊚ do not depend on the default platform character encoding; mainly affects Unicode input like "résumé"; reading WordNet data files should be ASCII, reading UI should be UTF-8
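Here is a sketch that always names the charset explicitly. It follows this item's split: ASCII for WordNet data files, UTF-8 for text from the UI. The helper names are hypothetical. Note that a US-ASCII decoder silently replaces any non-ASCII byte with U+FFFD. If the data files turn out not to be pure ASCII, ISO-8859-1 is the byte-preserving alternative.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers: always name the charset instead of relying on the platform default. */
final class Encodings {
  private Encodings() {}

  /** WordNet data files, read as ASCII per this TODO item. */
  static BufferedReader wordNetDataReader(InputStream in) {
    return new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
  }

  /** Text arriving from the UI or other external sources, decoded as UTF-8. */
  static String decodeUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```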
+ bug: gui: factor into new browser subproject
< feature: start "new browser" remote branch -- JSplitPane-based, maybe LGPL dependencies
~ bug: gui: needs_test: CTRL+W doesn't close (hide) Substring Search window on Linux; weird: have to listen for (0x17, etb: end of transmission block)
~ feature: api: tolerate not having data.<POS> files (i.e., only index.<POS> files) to allow use as a resource-light, data driven stemmer
- this may "just work"
- bug: gui: substring search keyboard focus skips Substring/Prefix radio button group
+ bug: gui: main AND substring search frames open, moving one frame to another desktop should move the other ∴ JDialog
Old English:
- this behavior is called a dialog ("modal" means the dialog must be dismissed before the main screen is available for use again)
- Apple fail: JDialog doesn't respect apple.awt.brushMetalLook !