ActionScript 3 RegExp

ActionScript 3 RegExp uses syntax similar to ECMAScript 3 with some additions.

RegExp compatibility in Ruffle and other emulators is tracked in issue #14651.

AS3 specifications

The ActionScript 3 Developer's Guide states that ActionScript 3.0 implements regular expressions as defined in the ECMAScript 3 specification, but additionally adds named capture groups. It also documents the x flag, but does not note that it is not a part of ECMAScript 3.

The draft AS3 Language Specification does not describe RegExp semantics. Is there a better specification?

Implementations

I've reviewed each of the similar projects listed in Ruffle's Helpful Resources for how they handle RegExp. The open source Flash petition also has a good list.

Ruffle

Ruffle uses regress as its RegExp engine in regexp.rs and RegExp.as. regress implements the RegExp syntax of ECMAScript 2018, which has more features than ECMAScript 3 and is missing the AS3 additions.

I've surveyed the Ruffle issues and PRs that mention “regex” or “regexp”. All of them stem from differences between the ECMAScript 3 and AS3 syntaxes:

  • (?P< >) named captures exist in AS3 (#13278, #10395, #10511). ECMAScript 3 has no named captures (see 15.10.1). ECMAScript 2018 has (?< >) named captures (see 21.2.1).
  • x extended flag exists in AS3 (#13965), but not ECMAScript 3 (see 15.10.4.1) or ECMAScript 2018 (see 12.2.8.1).
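
To make the first difference concrete, here is a minimal sketch of rewriting AS3-style `(?P<name>` group openers into the ECMAScript 2018 form `(?<name>` before handing a pattern to an ECMAScript 2018 engine. The function name `rewrite_named_groups` is hypothetical and is not part of Ruffle or regress. The sketch assumes that a backslash always escapes the next character and that `]` always closes a character class. It handles group openers only, not other PCRE named-group forms such as `(?P=name)`.

```rust
/// Rewrite AS3/PCRE-style `(?P<name>` group openers to the ECMAScript 2018
/// form `(?<name>`. Hypothetical helper, not part of Ruffle or regress.
fn rewrite_named_groups(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut in_class = false; // true while inside a [...] character class
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        match c {
            // Copy an escape and the escaped character untouched.
            '\\' => {
                out.push(c);
                if let Some(&next) = chars.get(i + 1) {
                    out.push(next);
                }
                i += 2;
                continue;
            }
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            // Drop the `P` from a named group opener outside a class.
            '(' if !in_class && chars[i..].starts_with(&['(', '?', 'P', '<']) => {
                out.push_str("(?<");
                i += 4;
                continue;
            }
            _ => {}
        }
        out.push(c);
        i += 1;
    }
    out
}
```

Escaped parentheses and parentheses inside character classes are left alone, so a literal `\(?P<` is not rewritten.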

Differences between ECMAScript 3 and 2018 should not break already working regexps, but would allow newer features.

avmplus

avmplus (also known as Tamarin) implements RegExp in:

It uses a modified version of PCRE 7.3. It also includes a copy of PCRE 10.20, which is unused.

The initial commit of avmplus records that it is the source code for the ActionScript VM in the Adobe Flash Player for December 2013.

More background on Tamarin is available on Wikipedia, formerly on MDN, and in the response to a petition to open-source Adobe Flash Player.

Lightspark

Lightspark uses PCRE from avmplus in RegExp.h and RegExp.cpp.

Which version of PCRE is this and why was it selected?

Shumway

Mozilla Shumway parses and sanitizes AS3 patterns and flags to equivalent JavaScript semantics in RegExp.as and ASRegExp. The current design was written just before ECMAScript 2015 was published, so it probably targets ECMAScript 5.1 RegExp semantics. The prior design delegated in ASRegExp to the XRegExp library, which converts its own extended syntax to JavaScript syntax. The current design indicates that it fixes more tests than XRegExp, which implies that XRegExp syntax was not a design inspiration for the AS3 language authors.

The approach in Shumway (and AwayFL) reminds me of how Scala.js compiles Java regex patterns to semantically equivalent JavaScript patterns.

Shumway has a few RegExp-specific tests.

AwayFL

AwayFL does the same thing as Shumway in ASRegExp. Their initial version is copied verbatim from Shumway and changes have been made since. Before @awayfl/avm2 was extracted as a separate package, it existed as a subdirectory of @awayfl/swf-viewer, where the Git history continues.

Others

GNU Gnash, swf2js, swfdec, and seemingly OpenFL do not implement RegExp. WAFlash is closed-source, so I did not investigate it.

Recommendation for Ruffle

If sticking with an existing library, keeping regress would be better than switching to regex, because regex deliberately does not support backreferences (a far more significant feature than any of those listed above) and its syntax is derived from RE2 rather than ECMAScript, so it differs from AS3 more.

I've done significant work on regular expression engines and have been bitten before by differing syntaxes between languages, so that's a problem I'm interested in tackling in a general-purpose way, and Ruffle could benefit from that effort. Now that regex-automata exposes its HIR, other crates could handle parsing and generate HIR. If a backtracking engine were added to regex-automata, it could fall back to that engine when backreferences are used, while keeping the extremely fast performance of regex when backtracking is not needed. If there is interest in Ruffle, that could be my motivation to pursue this.

I think a port of the Shumway approach to Rust, with the fixes from AwayFL as appropriate, would be easiest for Ruffle. Shumway's algorithm was written for ECMAScript 5.1, so the only remaining differences should be those introduced by regress implementing ECMAScript 2018. It would allow some modern regular expression features that AS3 never had, but would better work around the other differences. It would be strictly better than Ruffle's current situation, but not perfect.
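
One piece such a translation layer would likely need is handling for the `x` flag, since the target engine has no equivalent. The following is a minimal sketch, not taken from Shumway or AwayFL, that strips what extended mode would ignore. It assumes PCRE-style rules: unescaped ASCII whitespace and `#` comments are dropped outside character classes, and everything inside a class or after a backslash is kept. The name `strip_extended` is hypothetical.

```rust
/// Remove whitespace and `#` comments that the `x` flag would ignore.
/// Hypothetical helper; PCRE-style extended-mode rules are an assumption.
fn strip_extended(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    let mut in_class = false; // true while inside a [...] character class
    while let Some(c) = chars.next() {
        match c {
            // Keep an escape and the escaped character, even if it is a space.
            '\\' => {
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '[' if !in_class => {
                in_class = true;
                out.push(c);
            }
            ']' if in_class => {
                in_class = false;
                out.push(c);
            }
            // Skip a comment up to and including the end of the line.
            '#' if !in_class => {
                for d in chars.by_ref() {
                    if d == '\n' {
                        break;
                    }
                }
            }
            // Drop insignificant whitespace outside character classes.
            c if c.is_ascii_whitespace() && !in_class => {}
            _ => out.push(c),
        }
    }
    out
}
```

A real port would also have to guard against removed whitespace joining two tokens that were previously separate, for example a backreference followed by a digit. This sketch does not attempt that.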

Replacing regress with fancy_regex, with the proper parsing changes, would be the most powerful approach. fancy_regex is a hybrid engine that delegates to regex when possible and falls back to backtracking when necessary.
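
The hybrid idea can be illustrated with a small dispatch sketch. This is not how fancy_regex or regex-automata actually decides; it is a hypothetical pre-scan, with the assumed names `Engine` and `choose_engine`, that flags a pattern for backtracking when it contains a backreference (`\1` to `\9` or `\k<name>`) or a lookaround group, two features the regex crate does not support.

```rust
/// Which kind of engine a pattern could run on. Hypothetical type.
#[derive(Debug, PartialEq)]
enum Engine {
    Automata,
    Backtracking,
}

/// Pick an engine by scanning for features that need backtracking.
/// Hypothetical helper; a real implementation would inspect a parsed AST.
fn choose_engine(pattern: &str) -> Engine {
    let chars: Vec<char> = pattern.chars().collect();
    let mut in_class = false; // true while inside a [...] character class
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '\\' => {
                // Backreferences only count outside character classes.
                if let Some(&next) = chars.get(i + 1) {
                    let numbered = ('1'..='9').contains(&next);
                    let named = next == 'k' && chars.get(i + 2) == Some(&'<');
                    if (numbered || named) && !in_class {
                        return Engine::Backtracking;
                    }
                }
                i += 2; // skip the escaped character
                continue;
            }
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            '(' if !in_class => {
                let rest = &chars[i + 1..];
                let lookaround = rest.starts_with(&['?', '='])
                    || rest.starts_with(&['?', '!'])
                    || rest.starts_with(&['?', '<', '='])
                    || rest.starts_with(&['?', '<', '!']);
                if lookaround {
                    return Engine::Backtracking;
                }
            }
            _ => {}
        }
        i += 1;
    }
    Engine::Automata
}
```

Patterns without these features stay on the automata path, which is where the performance benefit described above would come from.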