Skip to content

Commit

Permalink
Update to Unicode 15.0
Browse files Browse the repository at this point in the history
  • Loading branch information
data-man authored and katef committed Oct 2, 2022
1 parent b594948 commit 6038ae4
Show file tree
Hide file tree
Showing 35 changed files with 553 additions and 114 deletions.
10 changes: 5 additions & 5 deletions share/ucd/CaseFolding.txt
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# CaseFolding-14.0.0.txt
# Date: 2021-03-08, 19:35:41 GMT
# © 2021 Unicode®, Inc.
# CaseFolding-15.0.0.txt
# Date: 2022-02-02, 23:35:35 GMT
# © 2022 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see https://www.unicode.org/reports/tr44/
#
# Case Folding Properties
#
Expand Down
2 changes: 1 addition & 1 deletion share/ucd/Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
UCD_URL ?= https://www.unicode.org/Public/14.0.0/ucd/
UCD_URL ?= https://www.unicode.org/Public/15.0.0/ucd/

WGET ?= wget

Expand Down
106 changes: 73 additions & 33 deletions share/ucd/Scripts.txt
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Scripts-14.0.0.txt
# Date: 2021-07-10, 00:35:31 GMT
# © 2021 Unicode®, Inc.
# Scripts-15.0.0.txt
# Date: 2022-04-26, 23:15:02 GMT
# © 2022 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see https://www.unicode.org/reports/tr44/
# For more information, see:
# UAX #24, Unicode Script Property: https://www.unicode.org/reports/tr24/
# Especially the sections:
Expand Down Expand Up @@ -532,6 +532,7 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1D183..1D184 ; Common # So [2] MUSICAL SYMBOL ARPEGGIATO UP..MUSICAL SYMBOL ARPEGGIATO DOWN
1D18C..1D1A9 ; Common # So [30] MUSICAL SYMBOL RINFORZANDO..MUSICAL SYMBOL DEGREE SLASH
1D1AE..1D1EA ; Common # So [61] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KORON
1D2C0..1D2D3 ; Common # No [20] KAKTOVIK NUMERAL ZERO..KAKTOVIK NUMERAL NINETEEN
1D2E0..1D2F3 ; Common # No [20] MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN
1D300..1D356 ; Common # So [87] MONOGRAM FOR EARTH..TETRAGRAM FOR FOSTERING
1D360..1D378 ; Common # No [25] COUNTING ROD UNIT DIGIT ONE..TALLY MARK FIVE
Expand Down Expand Up @@ -601,10 +602,10 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F300..1F3FA ; Common # So [251] CYCLONE..AMPHORA
1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
1F400..1F6D7 ; Common # So [728] RAT..ELEVATOR
1F6DD..1F6EC ; Common # So [16] PLAYGROUND SLIDE..AIRPLANE ARRIVING
1F6DC..1F6EC ; Common # So [17] WIRELESS..AIRPLANE ARRIVING
1F6F0..1F6FC ; Common # So [13] SATELLITE..ROLLER SKATE
1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D8 ; Common # So [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE
1F700..1F776 ; Common # So [119] ALCHEMICAL SYMBOL FOR QUINTESSENCE..LUNAR ECLIPSE
1F77B..1F7D9 ; Common # So [95] HAUMEA..NINE POINTED WHITE STAR
1F7E0..1F7EB ; Common # So [12] LARGE ORANGE CIRCLE..LARGE BROWN SQUARE
1F7F0 ; Common # So HEAVY EQUALS SIGN
1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
Expand All @@ -615,22 +616,20 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F8B0..1F8B1 ; Common # So [2] ARROW POINTING UPWARDS THEN NORTH WEST..ARROW POINTING RIGHTWARDS THEN CURVING SOUTH WEST
1F900..1FA53 ; Common # So [340] CIRCLED CROSS FORMEE WITH FOUR DOTS..BLACK CHESS KNIGHT-BISHOP
1FA60..1FA6D ; Common # So [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
1FA70..1FA74 ; Common # So [5] BALLET SHOES..THONG SANDAL
1FA78..1FA7C ; Common # So [5] DROP OF BLOOD..CRUTCH
1FA80..1FA86 ; Common # So [7] YO-YO..NESTING DOLLS
1FA90..1FAAC ; Common # So [29] RINGED PLANET..HAMSA
1FAB0..1FABA ; Common # So [11] FLY..NEST WITH EGGS
1FAC0..1FAC5 ; Common # So [6] ANATOMICAL HEART..PERSON WITH CROWN
1FAD0..1FAD9 ; Common # So [10] BLUEBERRIES..JAR
1FAE0..1FAE7 ; Common # So [8] MELTING FACE..BUBBLES
1FAF0..1FAF6 ; Common # So [7] HAND WITH INDEX FINGER AND THUMB CROSSED..HEART HANDS
1FA70..1FA7C ; Common # So [13] BALLET SHOES..CRUTCH
1FA80..1FA88 ; Common # So [9] YO-YO..FLUTE
1FA90..1FABD ; Common # So [46] RINGED PLANET..WING
1FABF..1FAC5 ; Common # So [7] GOOSE..PERSON WITH CROWN
1FACE..1FADB ; Common # So [14] MOOSE..PEA POD
1FAE0..1FAE8 ; Common # So [9] MELTING FACE..SHAKING FACE
1FAF0..1FAF8 ; Common # So [9] HAND WITH INDEX FINGER AND THUMB CROSSED..RIGHTWARDS PUSHING HAND
1FB00..1FB92 ; Common # So [147] BLOCK SEXTANT-1..UPPER HALF INVERSE MEDIUM SHADE AND LOWER HALF BLOCK
1FB94..1FBCA ; Common # So [55] LEFT HALF INVERSE MEDIUM SHADE AND RIGHT HALF BLOCK..WHITE UP-POINTING CHEVRON
1FBF0..1FBF9 ; Common # Nd [10] SEGMENTED DIGIT ZERO..SEGMENTED DIGIT NINE
E0001 ; Common # Cf LANGUAGE TAG
E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG

# Total code points: 8252
# Total code points: 8301

# ================================================

Expand Down Expand Up @@ -697,8 +696,9 @@ FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN
1DF00..1DF09 ; Latin # L& [10] LATIN SMALL LETTER FENG DIGRAPH WITH TRILL..LATIN SMALL LETTER T WITH HOOK AND RETROFLEX HOOK
1DF0A ; Latin # Lo LATIN LETTER RETROFLEX CLICK WITH RETROFLEX HOOK
1DF0B..1DF1E ; Latin # L& [20] LATIN SMALL LETTER ESH WITH DOUBLE BAR..LATIN SMALL LETTER S WITH CURL
1DF25..1DF2A ; Latin # L& [6] LATIN SMALL LETTER D WITH MID-HEIGHT LEFT HOOK..LATIN SMALL LETTER T WITH MID-HEIGHT LEFT HOOK

# Total code points: 1475
# Total code points: 1481

# ================================================

Expand Down Expand Up @@ -784,8 +784,10 @@ A680..A69B ; Cyrillic # L& [28] CYRILLIC CAPITAL LETTER DWE..CYRILLIC SMALL
A69C..A69D ; Cyrillic # Lm [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER LETTER CYRILLIC SOFT SIGN
A69E..A69F ; Cyrillic # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF
1E030..1E06D ; Cyrillic # Lm [62] MODIFIER LETTER CYRILLIC SMALL A..MODIFIER LETTER CYRILLIC SMALL STRAIGHT U WITH STROKE
1E08F ; Cyrillic # Mn COMBINING CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I

# Total code points: 443
# Total code points: 506

# ================================================

Expand Down Expand Up @@ -883,6 +885,7 @@ FDFD..FDFF ; Arabic # So [3] ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM.
FE70..FE74 ; Arabic # Lo [5] ARABIC FATHATAN ISOLATED FORM..ARABIC KASRATAN ISOLATED FORM
FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LAM WITH ALEF FINAL FORM
10E60..10E7E ; Arabic # No [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS
10EFD..10EFF ; Arabic # Mn [3] ARABIC SMALL LOW WORD SAKTA..ARABIC SMALL LOW WORD MADDA
1EE00..1EE03 ; Arabic # Lo [4] ARABIC MATHEMATICAL ALEF..ARABIC MATHEMATICAL DAL
1EE05..1EE1F ; Arabic # Lo [27] ARABIC MATHEMATICAL WAW..ARABIC MATHEMATICAL DOTLESS QAF
1EE21..1EE22 ; Arabic # Lo [2] ARABIC MATHEMATICAL INITIAL BEH..ARABIC MATHEMATICAL INITIAL JEEM
Expand Down Expand Up @@ -918,7 +921,7 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL

# Total code points: 1365
# Total code points: 1368

# ================================================

Expand Down Expand Up @@ -970,8 +973,9 @@ A8FB ; Devanagari # Lo DEVANAGARI HEADSTROKE
A8FC ; Devanagari # Po DEVANAGARI SIGN SIDDHAM
A8FD..A8FE ; Devanagari # Lo [2] DEVANAGARI JAIN OM..DEVANAGARI LETTER AY
A8FF ; Devanagari # Mn DEVANAGARI VOWEL SIGN AY
11B00..11B09 ; Devanagari # Po [10] DEVANAGARI HEAD MARK..DEVANAGARI SIGN MINDU

# Total code points: 154
# Total code points: 164

# ================================================

Expand Down Expand Up @@ -1182,8 +1186,9 @@ A8FF ; Devanagari # Mn DEVANAGARI VOWEL SIGN AY
0CE2..0CE3 ; Kannada # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
0CF3 ; Kannada # Mc KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT

# Total code points: 90
# Total code points: 91

# ================================================

Expand Down Expand Up @@ -1263,11 +1268,11 @@ A8FF ; Devanagari # Mn DEVANAGARI VOWEL SIGN AY
0EBD ; Lao # Lo LAO SEMIVOWEL SIGN NYO
0EC0..0EC4 ; Lao # Lo [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI
0EC6 ; Lao # Lm LAO KO LA
0EC8..0ECD ; Lao # Mn [6] LAO TONE MAI EK..LAO NIGGAHITA
0EC8..0ECE ; Lao # Mn [7] LAO TONE MAI EK..LAO YAMAKKAN
0ED0..0ED9 ; Lao # Nd [10] LAO DIGIT ZERO..LAO DIGIT NINE
0EDC..0EDF ; Lao # Lo [4] LAO HO NO..LAO LETTER KHMU NYO

# Total code points: 82
# Total code points: 83

# ================================================

Expand Down Expand Up @@ -1532,10 +1537,11 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
309D..309E ; Hiragana # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
309F ; Hiragana # Lo HIRAGANA DIGRAPH YORI
1B001..1B11F ; Hiragana # Lo [287] HIRAGANA LETTER ARCHAIC YE..HIRAGANA LETTER ARCHAIC WU
1B132 ; Hiragana # Lo HIRAGANA LETTER SMALL KO
1B150..1B152 ; Hiragana # Lo [3] HIRAGANA LETTER SMALL WI..HIRAGANA LETTER SMALL WO
1F200 ; Hiragana # So SQUARE HIRAGANA HOKA

# Total code points: 380
# Total code points: 381

# ================================================

Expand All @@ -1552,9 +1558,10 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
1AFFD..1AFFE ; Katakana # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
1B000 ; Katakana # Lo KATAKANA LETTER ARCHAIC E
1B120..1B122 ; Katakana # Lo [3] KATAKANA LETTER ARCHAIC YI..KATAKANA LETTER ARCHAIC WU
1B155 ; Katakana # Lo KATAKANA LETTER SMALL KO
1B164..1B167 ; Katakana # Lo [4] KATAKANA LETTER SMALL WI..KATAKANA LETTER SMALL N

# Total code points: 320
# Total code points: 321

# ================================================

Expand Down Expand Up @@ -1582,14 +1589,15 @@ FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILI
16FE3 ; Han # Lm OLD CHINESE ITERATION MARK
16FF0..16FF1 ; Han # Mc [2] VIETNAMESE ALTERNATE READING MARK CA..VIETNAMESE ALTERNATE READING MARK NHAY
20000..2A6DF ; Han # Lo [42720] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6DF
2A700..2B738 ; Han # Lo [4153] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B738
2A700..2B739 ; Han # Lo [4154] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B739
2B740..2B81D ; Han # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Han # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; Han # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; Han # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF

# Total code points: 94215
# Total code points: 98408

# ================================================

Expand Down Expand Up @@ -2093,10 +2101,13 @@ AADE..AADF ; Tai_Viet # Po [2] TAI VIET SYMBOL HO HOI..TAI VIET SYMBOL KOI

# ================================================

13000..1342E ; Egyptian_Hieroglyphs # Lo [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
13430..13438 ; Egyptian_Hieroglyphs # Cf [9] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END SEGMENT
13000..1342F ; Egyptian_Hieroglyphs # Lo [1072] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH V011D
13430..1343F ; Egyptian_Hieroglyphs # Cf [16] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE
13440 ; Egyptian_Hieroglyphs # Mn EGYPTIAN HIEROGLYPH MIRROR HORIZONTALLY
13441..13446 ; Egyptian_Hieroglyphs # Lo [6] EGYPTIAN HIEROGLYPH FULL BLANK..EGYPTIAN HIEROGLYPH WIDE LOST SIGN
13447..13455 ; Egyptian_Hieroglyphs # Mn [15] EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT TOP START..EGYPTIAN HIEROGLYPH MODIFIER DAMAGED

# Total code points: 1080
# Total code points: 1110

# ================================================

Expand Down Expand Up @@ -2440,8 +2451,10 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
11236..11237 ; Khojki # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
11238..1123D ; Khojki # Po [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
1123E ; Khojki # Mn KHOJKI SIGN SUKUN
1123F..11240 ; Khojki # Lo [2] KHOJKI LETTER QA..KHOJKI LETTER SHORT I
11241 ; Khojki # Mn KHOJKI VOWEL SIGN VOCALIC R

# Total code points: 62
# Total code points: 65

# ================================================

Expand Down Expand Up @@ -2988,4 +3001,31 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI

# Total code points: 70

# ================================================

11F00..11F01 ; Kawi # Mn [2] KAWI SIGN CANDRABINDU..KAWI SIGN ANUSVARA
11F02 ; Kawi # Lo KAWI SIGN REPHA
11F03 ; Kawi # Mc KAWI SIGN VISARGA
11F04..11F10 ; Kawi # Lo [13] KAWI LETTER A..KAWI LETTER O
11F12..11F33 ; Kawi # Lo [34] KAWI LETTER KA..KAWI LETTER JNYA
11F34..11F35 ; Kawi # Mc [2] KAWI VOWEL SIGN AA..KAWI VOWEL SIGN ALTERNATE AA
11F36..11F3A ; Kawi # Mn [5] KAWI VOWEL SIGN I..KAWI VOWEL SIGN VOCALIC R
11F3E..11F3F ; Kawi # Mc [2] KAWI VOWEL SIGN E..KAWI VOWEL SIGN AI
11F40 ; Kawi # Mn KAWI VOWEL SIGN EU
11F41 ; Kawi # Mc KAWI SIGN KILLER
11F42 ; Kawi # Mn KAWI CONJOINER
11F43..11F4F ; Kawi # Po [13] KAWI DANDA..KAWI PUNCTUATION CLOSING SPIRAL
11F50..11F59 ; Kawi # Nd [10] KAWI DIGIT ZERO..KAWI DIGIT NINE

# Total code points: 86

# ================================================

1E4D0..1E4EA ; Nag_Mundari # Lo [27] NAG MUNDARI LETTER O..NAG MUNDARI LETTER ELL
1E4EB ; Nag_Mundari # Lm NAG MUNDARI SIGN OJOD
1E4EC..1E4EF ; Nag_Mundari # Mn [4] NAG MUNDARI SIGN MUHOR..NAG MUNDARI SIGN SUTUH
1E4F0..1E4F9 ; Nag_Mundari # Nd [10] NAG MUNDARI DIGIT ZERO..NAG MUNDARI DIGIT NINE

# Total code points: 42

# EOF
Loading

0 comments on commit 6038ae4

Please sign in to comment.