Optimizer Cookbook
All the wasm-opt flags are documented in --help, of course, but it may not be obvious which of the various flags are more important and worth trying first. This page has suggestions for that.
There is also a special page for Wasm GC, which has many more considerations.
--low-memory-unused is a flag (not a pass) that tells the optimizer "addresses below 1024 are unused". That rules out the load/store pointer overflow cases that otherwise prevent certain optimizations. Specifically, it lets the optimizer fold added constants into load/store offsets, which is smaller and more efficient.
For that to be safe, you need to tell wasm-ld not to use address 1024 or below for globals (emcc does this automatically).
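As a minimal sketch, the flag can be passed alongside a normal optimization level. The helper name, the WASM_OPT variable, and the choice of -O3 are assumptions made for illustration, not something this page prescribes.

```shell
# WASM_OPT is an illustrative override; it defaults to the wasm-opt on PATH.
WASM_OPT="${WASM_OPT:-wasm-opt}"

# Hypothetical helper: optimize $1 into $2, telling the optimizer that
# low memory is unused. Only safe if the linker kept globals out of
# those low addresses.
opt_low_memory_unused() {
  "$WASM_OPT" -O3 --low-memory-unused "$1" -o "$2"
}

# Usage: opt_low_memory_unused input.wasm output.wasm
```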
--gufa is a new pass that is not on by default yet, so you need to run it manually. It infers constant values in a whole-program manner. It mostly helps wasm GC by inferring exact types and such, but it can also infer function results on wasm MVP, which can then enable further optimizations.
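One way to run it manually might look like the sketch below. Placing --gufa before -O3, so that the regular pipeline can use whatever was inferred, is an assumption about a reasonable ordering; the helper name and WASM_OPT variable are hypothetical.

```shell
# WASM_OPT is an illustrative override; it defaults to the wasm-opt on PATH.
WASM_OPT="${WASM_OPT:-wasm-opt}"

# Hypothetical helper: run the whole-program inference pass, then the
# normal -O3 pipeline (passes run in command-line order).
opt_with_gufa() {
  "$WASM_OPT" --gufa -O3 "$1" -o "$2"
}

# Usage: opt_with_gufa input.wasm output.wasm
```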
--flatten --rereloop -Oz -Oz
Flattening the IR is necessary for running "re-reloop", which completely rewrites the control flow graph. That can be slow, so it is not on by default, but it sometimes helps by a few percent.
-Oz twice is useful after it, as flattening the IR requires additional work to clean up. One way to think about this is that wasm-opt's default pipeline has been tuned on optimized LLVM output, so if you give it something less optimized it might take more than one cycle of the pipeline. The IR after flattening is less optimized than LLVM's optimized form, so it takes more work.
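Put together as a full command line, the sequence above could be wrapped as follows. The helper name, file arguments, and WASM_OPT variable are hypothetical; the pass order is the one given in this section.

```shell
# WASM_OPT is an illustrative override; it defaults to the wasm-opt on PATH.
WASM_OPT="${WASM_OPT:-wasm-opt}"

# Hypothetical helper: flatten the IR, rewrite control flow with
# re-reloop, then run the -Oz pipeline twice to clean up afterwards.
opt_rereloop() {
  "$WASM_OPT" --flatten --rereloop -Oz -Oz "$1" -o "$2"
}

# Usage: opt_rereloop input.wasm output.wasm
```

Since this can be slow and only sometimes helps, it is worth comparing the output size against a plain -Oz build before adopting it.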
-tnh is a flag that means "assume traps never happen", which lets the optimizer remove code on paths leading to traps (since it is allowed to assume they never happen in practice when the program runs).
That can interfere with things like crash reporting, if you save info right before crashing. It will also remove runtime asserts of the form "if error, trap". But if you can live without those, it can help. For example, if we assume traps never happen, then we can move a load into an if arm, so that it sometimes does not run. Without that assumption the load might trap, and skipping it would change observable behavior, so the move would not be allowed.
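Because of that tradeoff, one possible setup is to apply -tnh only to builds where crash reporting and runtime asserts are not needed. The sketch below assumes a hypothetical "release" versus "debug" split; the helper name, mode names, use of -O3, and WASM_OPT variable are all illustrative.

```shell
# WASM_OPT is an illustrative override; it defaults to the wasm-opt on PATH.
WASM_OPT="${WASM_OPT:-wasm-opt}"

# Hypothetical helper: opt_build MODE INPUT OUTPUT
# "release" assumes traps never happen; any other mode keeps traps
# (and so keeps "if error, trap" asserts) intact.
opt_build() {
  case "$1" in
    release) "$WASM_OPT" -O3 -tnh "$2" -o "$3" ;;
    *)       "$WASM_OPT" -O3 "$2" -o "$3" ;;
  esac
}

# Usage: opt_build release input.wasm output.wasm
```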
--converge runs all the opts you told it to in a loop while the file keeps shrinking. That is, --converge -Oz will keep running all the passes in -Oz until we reach a fixed point.
Usually the benefit of such additional cycles is limited, but sometimes it matters quite a lot, especially in larger programs.
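Since the benefit varies by program, it may be worth measuring. The sketch below builds the same input with and without --converge and prints both sizes; the helper name, the output file suffixes, and the WASM_OPT variable are hypothetical.

```shell
# WASM_OPT is an illustrative override; it defaults to the wasm-opt on PATH.
WASM_OPT="${WASM_OPT:-wasm-opt}"

# Hypothetical helper: optimize $1 with a single -Oz and with
# --converge -Oz, then print both output sizes in bytes.
compare_converge() {
  "$WASM_OPT" -Oz "$1" -o "$1.once.wasm" || return 1
  "$WASM_OPT" --converge -Oz "$1" -o "$1.converged.wasm" || return 1
  printf 'single -Oz:    %s bytes\n' "$(wc -c < "$1.once.wasm" | tr -d ' ')"
  printf 'converged -Oz: %s bytes\n' "$(wc -c < "$1.converged.wasm" | tr -d ' ')"
}

# Usage: compare_converge input.wasm
```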
Most passes look at a single function at a time, and when they see any call of another function they assume it can have arbitrary effects. Computing global effects lets the optimizer do better, by computing each function's effects and then using that. For example, if a function just returns an integer then it does not have any effects, and the optimizer will be able to move a local.set past such a call.
To do this, use something like the following:
wasm-opt --generate-global-effects -O3
--generate-global-effects computes the effects, which will then be used in later passes. This is not automatically recomputed, so you can add more invocations of it as needed.
Note that the logic assumes that optimization passes only decrease effects. That is what makes it correct to not automatically recompute effects during and after each pass. As normal optimization passes keep the behavior of the code identical, that means that no effects are added (but perhaps some might be removed, e.g. if they were in code we realize is dead). The only danger here is if you run a custom pass that adds effects, like the logging instrumentation passes.
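As a sketch of adding a second invocation, the effects can be recomputed between two rounds of optimization, so that the second round sees any effects the first round removed. Running exactly two rounds of -O3 is an assumption for illustration, as are the helper name and WASM_OPT variable.

```shell
# WASM_OPT is an illustrative override; it defaults to the wasm-opt on PATH.
WASM_OPT="${WASM_OPT:-wasm-opt}"

# Hypothetical helper: compute global effects, optimize, then recompute
# the (possibly smaller) effects before optimizing again.
opt_global_effects_twice() {
  "$WASM_OPT" --generate-global-effects -O3 \
              --generate-global-effects -O3 \
              "$1" -o "$2"
}

# Usage: opt_global_effects_twice input.wasm output.wasm
```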
--skip-pass=foo will skip the pass foo in the optimizer's normal pipeline. For example,
wasm-opt -O3 --skip-pass=coalesce-locals
will skip coalesce-locals, which normally is run at least once in -O3.
In general this should not be needed, as the normal optimization pipeline does the right thing. But in some cases it can be useful to skip specific passes. For example, if you want to optimize but not coalesce locals, perhaps because you will run some analysis or custom operation on them later, then skipping coalesce-locals as just shown might help.