Regex for Asian languages does not work, e.g. (?<!\p{Han}) or (?!\p{Lo}) #373

iG8R · 2024-04-15T09:35:13Z

Sometimes I need to replace Asian characters only when they are surrounded by Western ones.
To do this, I'm trying to use lookbehind and lookahead assertions, e.g. (?<!\p{Han}) or (?!\p{Lo}), but they don't work in FoxReplace at all, although they work fine when I test them on, for example, https://regex101.com.


Woundorf · 2024-04-16T10:38:15Z

This is because I don't use any of the Unicode flags when creating the RegExp object, and one of them is needed to support these \p{...} character classes (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape). Maybe the flag should always be used, or be enabled by an option (like case sensitivity), but I only learned about these flags relatively recently.
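A minimal sketch of the difference the `u` flag makes, runnable in Node or a browser console (plain JavaScript, not FoxReplace code). One extra detail worth knowing: regex101's PCRE flavor accepts `\p{Han}`, but in JavaScript `Han` is a Script value, so it has to be written `\p{Script=Han}` (or `\p{sc=Han}`); `\p{Lo}` works as-is because `Lo` is a General_Category value.

```javascript
// Without the "u" flag, "\p" is a legacy identity escape for the letter "p",
// and "{Lo}" is not a valid quantifier, so it is matched as literal text.
const legacy = new RegExp("\\p{Lo}");
// With the "u" flag, \p{Lo} is a Unicode property escape (General_Category=Other_Letter).
const unicode = new RegExp("\\p{Lo}", "u");

console.log(legacy.test("漢"));    // false
console.log(legacy.test("p{Lo}")); // true: only the literal text matches
console.log(unicode.test("漢"));   // true

// Script values such as Han need a "Script=" (or "sc=") prefix in JavaScript.
let hanWithoutPrefixThrows = false;
try {
  new RegExp("\\p{Han}", "u");
} catch (e) {
  hanWithoutPrefixThrows = e instanceof SyntaxError;
}
console.log(hanWithoutPrefixThrows); // true

// Lookbehind and lookahead combined with property escapes: a Han character
// with no Han character immediately before or after it.
const isolatedHan = /(?<!\p{Script=Han})\p{Script=Han}(?!\p{Script=Han})/gu;
console.log("abc 漢 def 漢字".match(isolatedHan)); // [ '漢' ]
```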

Could you please create another issue asking for Unicode support, with a link to this one as an example?

In the meantime, as a workaround, you could replace with a function that tests whether the found text actually matches the correct regexp, returning the replaced string if it does and the unmodified string otherwise.
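A rough sketch of that workaround in plain JavaScript, using `String.prototype.replace` with a replacer function. It assumes (not confirmed here) that FoxReplace's function mode receives the same arguments as a `replace` callback, so check its documentation before copying. The search pattern is a hypothetical flag-free stand-in (BMP CJK Unified Ideographs only); the Unicode-aware `u`-flag checks run inside the function, and the example task (bracketing an isolated Han character) is made up for illustration.

```javascript
// Hypothetical flag-free search pattern: U+4E00..U+9FFF only, no lookarounds.
const broadPattern = /[\u4E00-\u9FFF]/g;

// The "correct" Unicode-aware checks, built with the "u" flag.
const HAN_AT_END = /\p{Script=Han}$/u;
const HAN_AT_START = /^\p{Script=Han}/u;

// Replacer signature when the pattern has no capture groups: (match, offset, whole).
function bracketIsolatedHan(match, offset, whole) {
  // Look at most two code units on each side: enough for one code point,
  // including a surrogate pair.
  const before = whole.slice(Math.max(0, offset - 2), offset);
  const after = whole.slice(offset + match.length, offset + match.length + 2);
  // Emulated (?<!\p{Script=Han}) and (?!\p{Script=Han}):
  // return the match unmodified when a neighbour is also a Han character.
  if (HAN_AT_END.test(before) || HAN_AT_START.test(after)) return match;
  return `[${match}]`;
}

const result = "abc 漢 def 漢字 ghi".replace(broadPattern, bracketIsolatedHan);
console.log(result); // "abc [漢] def 漢字 ghi"
```

The window slicing keeps each check constant-time instead of rescanning the whole text for every match.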

iG8R · 2024-04-16T11:32:19Z

Thanks a lot for your attention and advice. I've already tried it, but using a function is too cumbersome in this case.

iG8R · 2024-04-16T11:44:04Z

Maybe there is also a flag that makes it possible to use case-changing escapes in the replacement string, like the following from https://stackoverflow.com/a/33351224/6773436:

Capitalize words
Find: (\s)([a-z]) (\s also matches new lines, i.e. "venuS" => "VenuS")
Replace: $1\u$2

Uncapitalize words
Find: (\s)([A-Z])
Replace: $1\l$2

Remove camel case (e.g. cAmelCAse => camelcAse => camelcase)
Find: ([a-z])([A-Z])
Replace: $1\l$2

Lowercase letters within words (e.g. LowerCASe => Lowercase)
Find: (\w)([A-Z]+)
Replace: $1\L$2
Alternate Replace: \L$0

Uppercase letters within words (e.g. upperCASe => uPPERCASE)
Find: (\w)([A-Z]+)
Replace: $1\U$2

Uppercase previous (e.g. upperCase => UPPERCase)
Find: (\w+)([A-Z])
Replace: \U$1$2

Lowercase previous (e.g. LOWERCase => lowerCase)
Find: (\w+)([A-Z])
Replace: \L$1$2

Uppercase the rest (e.g. upperCase => upperCASE)
Find: ([A-Z])(\w+)
Replace: $1\U$2

Lowercase the rest (e.g. lOWERCASE => lOwercase)
Find: ([A-Z])(\w+)
Replace: $1\L$2

Shift-right-uppercase (e.g. Case => cAse => caSe => casE)
Find: ([a-z\s])([A-Z])(\w)
Replace: $1\l$2\u$3

Shift-left-uppercase (e.g. CasE => CaSe => CAse => Case)
Find: (\w)([A-Z])([a-z\s])
Replace: \u$1\l$2$3

Woundorf · 2024-04-19T08:42:35Z

This is not possible in JavaScript without using a custom function. The only recognized special strings in JavaScript are listed here.
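For reference, a small plain-JavaScript illustration of the replacement patterns that `String.prototype.replace` does recognize (`$n`, `$&`, `` $` ``, `$'`, `$$`, `$<name>`), and of what happens to a Boost/Perl-style `\u`: it is simply copied into the output. The sample strings are arbitrary.

```javascript
const examples = {
  swap: "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1"), // numbered groups
  wholeMatch: "abc".replace(/b/, "[$&]"),               // the whole match
  beforeMatch: "abc".replace(/b/, "[$`]"),              // text before the match
  afterMatch: "abc".replace(/b/, "[$']"),               // text after the match
  dollar: "abc".replace(/b/, "$$"),                     // a literal "$"
  named: "2024-04-19".replace(/(?<y>\d+)-(?<m>\d+)-(?<d>\d+)/, "$<d>.$<m>.$<y>"),
  // Case escapes are not special: the replacement "\u$&" yields a literal "\u".
  caseEscape: "abc".replace(/b/, "\\u$&"),
};
console.log(examples);
```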

The examples in the Stack Overflow answer are for Sublime Text, which, according to another comment, relies on Boost; following the links, Boost seems to support the same syntax as Perl.
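A minimal sketch of what such a custom function could look like in plain JavaScript. `expandCaseTemplate` and `replaceWithCase` are hypothetical helpers, not part of FoxReplace or JavaScript. They handle only the four escapes from the list above (`\u`, `\l` for the next character; `\U`, `\L` for the rest of the template), single-digit groups `$0` to `$9`, and `$0` as the whole match as in the Boost examples. `\E` and Perl's finer precedence rules are not implemented.

```javascript
// Expands a Boost/Perl-style template for one match. groups[0] is the whole match.
function expandCaseTemplate(template, groups) {
  let out = "";
  let mode = null; // "U" or "L": applies to everything emitted afterwards
  let once = null; // "u" or "l": applies to the next emitted character only
  const emit = (text) => {
    for (const ch of text) {
      if (once) {
        out += once === "u" ? ch.toUpperCase() : ch.toLowerCase();
        once = null;
      } else if (mode) {
        out += mode === "U" ? ch.toUpperCase() : ch.toLowerCase();
      } else {
        out += ch;
      }
    }
  };
  for (let i = 0; i < template.length; i++) {
    const ch = template[i];
    const next = template[i + 1];
    if (ch === "\\" && next !== undefined && "ulUL".includes(next)) {
      if (next === "u" || next === "l") once = next;
      else mode = next;
      i++; // skip the escape letter
    } else if (ch === "$" && next !== undefined && next >= "0" && next <= "9") {
      emit(groups[Number(next)] ?? ""); // unmatched groups expand to ""
      i++; // skip the digit
    } else {
      emit(ch);
    }
  }
  return out;
}

// Runs the replacement, passing each match's groups to expandCaseTemplate.
function replaceWithCase(text, pattern, template) {
  return text.replace(pattern, (...args) => {
    // args = [match, p1..pn, offset, string], plus a groups object at the
    // end when the pattern has named groups.
    const tail = typeof args[args.length - 1] === "object" ? 3 : 2;
    return expandCaseTemplate(template, args.slice(0, args.length - tail));
  });
}

console.log(replaceWithCase("hello venus", /(\s)([a-z])/g, "$1\\u$2")); // "hello Venus"
console.log(replaceWithCase("LowerCASe", /(\w)([A-Z]+)/g, "$1\\L$2")); // "Lowercase"
console.log(replaceWithCase("upperCase", /(\w+)([A-Z])/g, "\\U$1$2")); // "UPPERCase"
console.log(replaceWithCase("cAse", /([a-z\s])([A-Z])(\w)/g, "$1\\l$2\\u$3")); // "caSe"
```

In JavaScript source the backslash has to be doubled (`"$1\\u$2"`); typed into a plain text field it would just be `$1\u$2`.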

iG8R · 2024-04-19T10:15:09Z

Thanks a lot for the clarification. That's a pity.

iG8R mentioned this issue Apr 16, 2024: Please, make support for the Unicode character class escape #374 (Open)
