▓   ▓▓▓ ▓ ▓
▓   ▓   ▓ ▓
▓   ▓▓▓  ▓
▓   ▓   ▓ ▓
▓▓▓ ▓▓▓ ▓ ▓  Linear expressions
First-match pattern matching with a tiny bytecode-instructed matching machine.
Spec http://jbee.github.io/lex/
Feedback http://github.com/jbee/lex/issues
Copyright (c) 2017 Jan Bernitt
________________________________________________________________________________
IMPLEMENTATIONS
Java: basic ~100 LOC, optimized ~150 LOC
________________________________________________________________________________
SETS {...}
{abc}     a set of bytes "a", "b" and "c"
{^abc}    a set of any byte but "a", "b" and "c"
{a-c}     a set of "a", "b" and "c" given as a range
{?}       a set of *all* non-ASCII bytes
SPECIAL SETS
#  = {0-9}     any ASCII digit
@  = {a-zA-Z}  any ASCII letter
$              any ASCII newline (\n or \r)
_              any ASCII whitespace character
^              any byte that is not an ASCII whitespace character
?              any single byte
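
One way to read the set notation is as a table of accepted byte values. The
following Python sketch expands the text between { and } into such a table.
The name compile_set is hypothetical and not part of lex, and the {?} form is
left out. It only illustrates the rules listed above, it is not how the Java
implementations work.

```python
def compile_set(body: bytes) -> frozenset:
    """Expand the text between { and } into the set of accepted byte values."""
    negate = body.startswith(b"^")        # {^...} accepts every byte but the listed ones
    i = 1 if negate else 0
    members = set()
    while i < len(body):
        if body[i] == ord("\\"):          # escape: take the next byte literally
            i += 1
        lo = hi = body[i]
        if i + 2 < len(body) and body[i + 1] == ord("-"):
            hi = body[i + 2]              # a range such as a-c
            i += 2
        members.update(range(lo, hi + 1))
        i += 1
    return frozenset(set(range(256)) - members if negate else members)


# The two special sets that the README defines in terms of plain sets.
DIGIT = compile_set(b"0-9")       # same bytes as #
LETTER = compile_set(b"a-zA-Z")   # same bytes as @
```

With this reading {a-c} and {abc} accept the same three bytes, and {^abc}
accepts the remaining 253 byte values.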
REPETITION x+
+         try previous set, group or literal again
GROUPS (...), [...], `...`
(abc)     a group with sequence "abc" that *must* occur
[abc]     a group with sequence "abc" that *can* occur
`         exit group, unless first in group (used to embed)
SCANNING ~x
~         skip until following set, group or literal matches
ESCAPING
\         escape following byte to a literal (also in set)
Sets can also be used to match most of the instruction symbols literally.
For example {~} is similar to \~ or {\~}.
Any other byte (not {}()[]#@^_$+~?`\) is matched literally.
Escaping can be applied to any byte even if it is not needed.
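
The rules above can be sketched as a small first-match interpreter without
backtracking. This Python version is an illustration only and not a port of
the Java implementations. The names lex_match, match_seq, match_unit, in_set
and unit_end are hypothetical, str characters stand in for bytes, the
whitespace set is an assumption, and the backtick and {?} forms are not
handled. lex_match returns the index just past the match, or -1.

```python
WS = " \t\n\r"  # assumption: the README does not enumerate "whitespace"
SPECIAL = {
    "#": lambda b: "0" <= b <= "9",
    "@": lambda b: "a" <= b <= "z" or "A" <= b <= "Z",
    "$": lambda b: b in "\n\r",
    "_": lambda b: b in WS,
    "^": lambda b: b not in WS,
    "?": lambda b: True,
}

def unit_end(pat, p):
    """Index just past the set, group, escape or plain byte starting at p."""
    c = pat[p]
    if c == "\\":
        return p + 2
    if c == "{":
        q = p + 1
        while pat[q] != "}":
            q += 2 if pat[q] == "\\" else 1
        return q + 1
    if c in "([":
        q = p + 1
        while pat[q] not in ")]":
            q = unit_end(pat, q)       # step over nested units as a whole
        return q + 1
    return p + 1

def in_set(pat, p, q, b):
    """Test b against the set body pat[p:q] (the text between { and })."""
    negate = pat[p] == "^"
    if negate:
        p += 1
    hit = False
    while p < q:
        if pat[p] == "\\":
            p += 1
        lo = hi = pat[p]
        if p + 2 < q and pat[p + 1] == "-":
            hi, p = pat[p + 2], p + 2  # a range such as a-c
        hit = hit or lo <= b <= hi
        p += 1
    return hit != negate

def match_unit(pat, p, q, data, d):
    """Match the single unit pat[p:q] at data[d]; new position or -1."""
    c = pat[p]
    if c in "([":
        r = match_seq(pat, p + 1, q - 1, data, d)
        return d if c == "[" and r < 0 else r   # [...] may be absent
    if d >= len(data):
        return -1
    b = data[d]
    if c == "{":
        ok = in_set(pat, p + 1, q - 1, b)
    elif c == "\\":
        ok = pat[p + 1] == b
    elif c in SPECIAL:
        ok = SPECIAL[c](b)
    else:
        ok = c == b
    return d + 1 if ok else -1

def match_seq(pat, p, end, data, d):
    """Match the units in pat[p:end] one after another, never backtracking."""
    while p < end:
        scan = pat[p] == "~"
        if scan:
            p += 1
            if p >= end:
                raise ValueError("this sketch needs a unit after ~")
        q = unit_end(pat, p)
        r = match_unit(pat, p, q, data, d)
        while scan and r < 0 and d < len(data):  # ~ skips until the unit matches
            d += 1
            r = match_unit(pat, p, q, data, d)
        if r < 0:
            return -1
        d = r
        if q < end and pat[q] == "+":            # + retries while it makes progress
            r = match_unit(pat, p, q, data, d)
            while r > d:
                d, r = r, match_unit(pat, p, q, data, r)
            q += 1
        p = q
    return d

def lex_match(pattern, data):
    return match_seq(pattern, 0, len(pattern), data, 0)
```

For instance lex_match("##:##[:##]", "12:30:45") gives 8 and
lex_match("~(Foo)", "bar") gives -1. The sketch does not validate patterns, so
an unbalanced group or set raises an IndexError.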
________________________________________________________________________________
EXAMPLES
####/##/##          a date of format yyyy/mm/dd
##:##[:##]          a time of format hh:mm or hh:mm:ss
#+[.#+]             a simple floating point number with optional decimals
"{^"}+"             a quoted string using a set to find the end
"~"                 a quoted string using a scan to find the end
"~({^\\}")          a quoted string using a scan with escaping support
\$@{a-zA-Z0-9_}+    a PHP style identifier
[\+]#+[{ -}#+]+     international phone numbers
~(Foo)              searching for "Foo" (e.g. in a file)
~(<h#>)             searching for "<h0>" to "<h9>"
~(<h{1-6}>)         searching for "<h1>" to "<h6>"
~(Foo~(Bar))        searching for "Foo"s followed by "Bar"s
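
For readers who know regular expressions, a few of the examples can be
compared with rough Python re counterparts. The table below is an assumption
made for illustration and does not come from the lex spec. A backtracking
regex engine and a first-match machine can disagree on other patterns, so the
pairs only agree on simple inputs like the ones shown. The names APPROX and
prefix_len are hypothetical.

```python
import re

# lex pattern -> approximate regular expression (illustrative, not from the spec)
APPROX = {
    "####/##/##": r"[0-9]{4}/[0-9]{2}/[0-9]{2}",
    "##:##[:##]": r"[0-9]{2}:[0-9]{2}(?::[0-9]{2})?",
    "#+[.#+]":    r"[0-9]+(?:\.[0-9]+)?",
    '"{^"}+"':    r'"[^"]+"',
    "~(Foo)":     r"(?s).*?Foo",
}

def prefix_len(regex, text):
    """Length of the match anchored at the start of text, or -1 if none."""
    m = re.match(regex, text)
    return m.end() if m else -1
```

For example prefix_len(APPROX["##:##[:##]"], "12:30:45") gives 8 and
prefix_len(APPROX["~(Foo)"], "bar") gives -1.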