ActionScript 3 RegExp uses syntax similar to ECMAScript 3 with some additions.
RegExp compatibility in Ruffle and other emulators is tracked in issue #14651.
The ActionScript 3 Developer's Guide states that ActionScript 3.0 implements regular expressions as defined in the ECMAScript 3 specification, but additionally adds named capture groups. It also documents the x flag, but does not note that this flag is not part of ECMAScript 3.
The draft AS3 Language Specification does not describe RegExp semantics. Is there a better specification?
I've reviewed each of the similar projects listed in Ruffle's Helpful Resources for how they handle RegExp. The open source Flash petition also has a good list.
Ruffle uses regress as its RegExp engine in regexp.rs and RegExp.as. regress implements the RegExp syntax of ECMAScript 2018, which has more features than ECMAScript 3 but lacks the AS3 additions.
I've surveyed the Ruffle issues and PRs for anything with “regex” or “regexp”. All issues are from differences between ECMAScript 3 and AS3 syntaxes:
(?P<name>) named captures exist in AS3 (#13278, #10395, #10511). ECMAScript 3 has no named captures (see 15.10.1). ECMAScript 2018 has (?<name>) named captures (see 21.2.1).
The x extended flag exists in AS3 (#13965), but not in ECMAScript 3 (see 15.10.4.1) or ECMAScript 2018 (see 12.2.8.1).
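To make the first difference concrete, here is a minimal sketch of rewriting AS3-style named group openers into the ECMAScript 2018 form before handing a pattern to an engine such as regress. The function name convert_named_groups is hypothetical and not part of Ruffle or regress. It assumes only the `(?P<` opener needs translating, and it treats the first unescaped `]` as the end of a character class, which is a simplification.

```rust
/// Rewrites PCRE-style `(?P<name>` group openers to the ECMAScript 2018
/// form `(?<name>`. Escaped characters and the contents of character
/// classes are copied through unchanged.
/// Hypothetical helper for illustration; not Ruffle's actual code.
fn convert_named_groups(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut in_class = false; // true while inside `[...]`
    let mut rest = pattern;
    while let Some(c) = rest.chars().next() {
        if c == '\\' {
            // Copy the backslash and the escaped character as a pair.
            let mut chars = rest.chars();
            chars.next();
            out.push('\\');
            if let Some(escaped) = chars.next() {
                out.push(escaped);
            }
            rest = chars.as_str();
            continue;
        }
        if !in_class && rest.starts_with("(?P<") {
            out.push_str("(?<");
            rest = &rest["(?P<".len()..];
            continue;
        }
        match c {
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            _ => {}
        }
        out.push(c);
        rest = &rest[c.len_utf8()..];
    }
    out
}
```

For example, passing `(?P<year>\d{4})` yields `(?<year>\d{4})`, while the same characters inside a character class or after a backslash are left alone.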
Differences between ECMAScript 3 and 2018 should not break already working regexps, but would allow newer features.
avmplus (also known as Tamarin) implements RegExp in:
It uses a modified version of PCRE 7.3. It also includes a copy of PCRE 10.20, but does not use that copy.
The initial commit of avmplus records that it is the source code for the ActionScript VM in the Adobe Flash Player for December 2013.
More background on Tamarin is available on Wikipedia, formerly on MDN, and in the response to a petition to open-source Adobe Flash Player.
Lightspark uses PCRE from avmplus in RegExp.h and RegExp.cpp.
Which version of PCRE is this and why was it selected?
Mozilla Shumway parses AS3 patterns and flags and sanitizes them to equivalent JavaScript semantics in RegExp.as and ASRegExp. The current design was written just before ECMAScript 2015 was published, so it probably targets ECMAScript 5.1 RegExp semantics. The prior design delegated in ASRegExp to the XRegExp library, which converts its own extended syntax to JavaScript syntax. The current design indicates that it fixes more tests than XRegExp did, which implies that XRegExp syntax was not a design inspiration for the AS3 language authors.
The approach in Shumway (and AwayFL) reminds me of how Scala.js compiles Java regex patterns to semantically equivalent JavaScript patterns.
Shumway has a few RegExp-specific tests.
AwayFL does the same thing as Shumway in ASRegExp. Their initial version is copied verbatim from Shumway, and changes have been made since. Before @awayfl/avm2 was extracted as a separate package, it existed as a subdirectory of @awayfl/swf-viewer, where the Git history continues.
GNU Gnash, swf2js, swfdec, and seemingly OpenFL do not implement RegExp. WAFlash is closed-source, so I did not investigate it.
If sticking with an existing library, keeping regress would be better than switching to regex, because regex deliberately does not support backreferences, a far more significant feature than any of those listed above. Its syntax is also derived from RE2 rather than ECMAScript, so it differs from AS3 more than regress does.
I've done significant work on regular expression engines and have been bitten before by differing syntaxes between languages, so that's a problem I'm interested in tackling in a general-purpose way, and Ruffle could benefit from that effort. Now that regex-automata exposes its HIR, other crates could handle parsing and generate HIR. If a backtracking engine were added to regex-automata, it could fall back to that engine when backreferences are used, while still having the extremely fast performance of regex when backtracking is not needed. If there is interest in Ruffle, that could be my motivation to pursue this.
I think a port of the Shumway approach to Rust, with the fixes from AwayFL as appropriate, would be easiest for Ruffle. Shumway's algorithm was written for ECMAScript 5.1, so the only differences should be those introduced by regress implementing ECMAScript 2018. It would still allow some modern regular expression features that AS3 never had, but would work around the other differences better. It would be strictly better than Ruffle's current situation, though not perfect.
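As an illustration of what one piece of such a port might look like, here is a rough sketch of handling the x flag by stripping ignorable text from the pattern before it reaches the engine. This is not Shumway's or AwayFL's actual algorithm, and strip_extended is a hypothetical name. It assumes PCRE-like extended rules: unescaped whitespace outside a character class is dropped, and an unescaped `#` outside a class starts a comment that runs to the end of the line. It also uses Rust's Unicode definition of whitespace, which may be broader than what Flash Player treats as ignorable.

```rust
/// Removes the whitespace and `#` comments that an extended (`x`) flag
/// would ignore, so the result can be given to an engine without that flag.
/// Hypothetical helper for illustration; the exact AS3 rules are assumed.
fn strip_extended(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    let mut in_class = false; // true while inside `[...]`
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Keep escapes intact, including an escaped space or `#`.
                out.push(c);
                if let Some(escaped) = chars.next() {
                    out.push(escaped);
                }
            }
            '[' if !in_class => {
                in_class = true;
                out.push(c);
            }
            ']' if in_class => {
                in_class = false;
                out.push(c);
            }
            '#' if !in_class => {
                // Skip the comment up to and including the line break.
                for skipped in chars.by_ref() {
                    if skipped == '\n' {
                        break;
                    }
                }
            }
            c if c.is_whitespace() && !in_class => {}
            _ => out.push(c),
        }
    }
    out
}
```

Doing this as a source-to-source rewrite keeps the engine itself untouched, which is the same trade-off the Shumway design makes: the translation layer absorbs the AS3 differences.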
Replacing regress with fancy_regex, with the proper parsing changes, would be the most powerful approach. fancy_regex is a hybrid engine that delegates to regex when possible and falls back to backtracking when necessary.
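To show the kind of decision a hybrid engine has to make, here is a simplified sketch that scans a pattern for constructs an automata-based engine like regex cannot run, namely backreferences and lookaround. This is not how fancy_regex is implemented (it analyzes a parsed expression rather than scanning text), and needs_backtracking is a hypothetical name. It assumes ECMAScript 2018 syntax for `\k<name>` and lookbehind.

```rust
/// Returns true if the pattern contains a backreference (`\1`..`\9`,
/// `\k<name>`) or a lookaround group, outside of character classes.
/// Hypothetical text-level check for illustration only.
fn needs_backtracking(pattern: &str) -> bool {
    let mut in_class = false; // true while inside `[...]`
    let mut rest = pattern;
    while let Some(c) = rest.chars().next() {
        let after = &rest[c.len_utf8()..];
        match c {
            '\\' => {
                // A trailing lone backslash has nothing left to inspect.
                let Some(escaped) = after.chars().next() else {
                    return false;
                };
                let is_backref = matches!(escaped, '1'..='9') || after.starts_with("k<");
                if !in_class && is_backref {
                    return true;
                }
                // Skip the escaped character so it is not reinterpreted.
                rest = &after[escaped.len_utf8()..];
                continue;
            }
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            '(' if !in_class => {
                let lookarounds = ["?=", "?!", "?<=", "?<!"];
                if lookarounds.iter().any(|p| after.starts_with(*p)) {
                    return true;
                }
            }
            _ => {}
        }
        rest = after;
    }
    false
}
```

A dispatcher could send patterns for which this returns false to the fast engine and everything else to a backtracking one.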