Fuzzing: ClusterFuzz integration #7079

kripken · 2024-11-14T22:08:41Z

The main addition here is a bundle_clusterfuzz.py script which will package up
the exact files that should be uploaded to ClusterFuzz. It also documents the
process of bundling and testing. You can run

bundle_clusterfuzz.py OUTPUT_FILE.tgz

That bundles wasm-opt from ./bin, which is enough for local testing. For
actually uploading to ClusterFuzz, we need a portable build, and @dschuff
had the idea to reuse the emsdk build, which works nicely. Doing

bundle_clusterfuzz.py OUTPUT_FILE.tgz --build-dir=/path/to/emsdk/upstream/

will bundle wasm-opt (+libs) from the emsdk. I verified that those builds
work on ClusterFuzz.
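
As a rough illustration of what bundling amounts to, the sketch below packs a few files into a .tgz using Python's tarfile module. The archive layout (run.py at the root, the binary under bin/) and the bundle() and list_bundle() helpers are assumptions made for this example, not the actual contents of bundle_clusterfuzz.py.

```python
import os
import tarfile
import tempfile


def bundle(output_path, files):
    """Write a gzipped tarball containing each (source_path, archive_name) pair."""
    with tarfile.open(output_path, 'w:gz') as tar:
        for source_path, archive_name in files:
            # arcname controls the path inside the archive, independent of
            # where the file lives on disk.
            tar.add(source_path, arcname=archive_name)


def list_bundle(bundle_path):
    """Return the sorted member names of a bundle, for a quick sanity check."""
    with tarfile.open(bundle_path, 'r:gz') as tar:
        return sorted(tar.getnames())


def demo():
    # Placeholder files stand in for the real run.py and wasm-opt binary.
    with tempfile.TemporaryDirectory() as tmp:
        run_py = os.path.join(tmp, 'run.py')
        wasm_opt = os.path.join(tmp, 'wasm-opt')
        for path in (run_py, wasm_opt):
            with open(path, 'w') as f:
                f.write('placeholder\n')
        output = os.path.join(tmp, 'bundle.tgz')
        # Hypothetical layout: the script at the root, the binary under bin/.
        bundle(output, [(run_py, 'run.py'), (wasm_opt, 'bin/wasm-opt')])
        return list_bundle(output)


print(demo())
```

Listing the archive after writing it is a cheap way to confirm that the expected files made it in before uploading.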

I added several forms of testing here. First, our main fuzzer fuzz_opt.py now
has a ClusterFuzz testcase handler, which simulates a ClusterFuzz environment.
Second, there are smoke tests that run in the unit test suite, and can also be
run separately:

python -m unittest test/unit/test_cluster_fuzz.py

Those unit tests can also run on a given bundle, e.g. one created from an
emsdk build, for testing right before upload:

BINARYEN_CLUSTER_FUZZ_BUNDLE=/path/to/bundle.tgz python -m unittest test/unit/test_cluster_fuzz.py
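
One way a test could honor that environment variable is sketched below. Only the variable name BINARYEN_CLUSTER_FUZZ_BUNDLE comes from the command above; the get_bundle_path() helper and its build_default callback are hypothetical and may not match how test_cluster_fuzz.py is actually written.

```python
import os


def get_bundle_path(build_default):
    """Return the bundle to test.

    If BINARYEN_CLUSTER_FUZZ_BUNDLE is set, test that existing bundle.
    Otherwise call build_default() (a hypothetical callback that creates a
    fresh bundle from the local build) and test its result.
    """
    path = os.environ.get('BINARYEN_CLUSTER_FUZZ_BUNDLE')
    if path:
        return path
    return build_default()
```

With this shape, the same smoke tests cover both the local build and a bundle created from an emsdk build.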

A third piece of testing is a new --fuzz-passes test. That is a mode for
-ttf (translate random data into a valid wasm fuzz testcase) that uses random
data to pick and run a set of passes, to further shape the wasm. (--fuzz-passes
had no previous testing; this PR fixes it and tidies it up a little, adding some
newer passes too.)

Otherwise this PR includes the key run.py script that is bundled and then
executed by ClusterFuzz, basically a python script that runs wasm-opt -ttf [..]
to generate testcases, sets up their JS, and emits them.
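
The generation step might look roughly like the sketch below: write random bytes to an input file, ask wasm-opt -ttf to translate them into a wasm file, and try again with fresh data if wasm-opt fails (run.py has a comment noting that this can happen in rare cases). The function name, the input size range, the attempt limit, and the injectable run parameter are all assumptions for illustration, and the real script passes more flags than shown here.

```python
import os
import random
import subprocess


def generate_wasm(wasm_opt, input_path, wasm_path,
                  run=subprocess.run, max_attempts=10):
    """Create a wasm testcase from random data; return the attempts used.

    `run` defaults to subprocess.run and can be replaced in tests. The size
    range and attempt limit are arbitrary values chosen for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        # Fresh random input on every attempt.
        with open(input_path, 'wb') as f:
            f.write(os.urandom(random.randint(1, 1024)))
        # -ttf translates the random input into a valid wasm module.
        result = run([wasm_opt, '-ttf', input_path, '-o', wasm_path])
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f'wasm-opt failed {max_attempts} times in a row')
```

Bounding the number of attempts keeps a broken wasm-opt binary from turning the retry into an infinite loop.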

fuzz_shell.js, which is the JS to execute testcases, will now check if it is
provided binary data of a wasm file. If so, it does not read a wasm file from
argv[1]. (This is needed because ClusterFuzz expects a single file for the
testcase, so we make a JS file with bundled wasm inside it.)
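
A minimal sketch of how such a single-file testcase could be assembled: the wasm bytes are emitted as a JS typed-array literal placed ahead of the shell code. The global name binary and the embed_wasm_in_js() helper are assumptions for this example; the name that fuzz_shell.js actually checks may differ.

```python
def embed_wasm_in_js(wasm_bytes, shell_js):
    """Return JS source that defines the wasm bytes, followed by the shell.

    `binary` is a hypothetical variable name that the shell would test for
    before falling back to reading a wasm file from argv.
    """
    byte_list = ','.join(str(b) for b in wasm_bytes)
    prefix = f'var binary = new Uint8Array([{byte_list}]);\n'
    return prefix + shell_js


# The first four bytes of any wasm binary are the magic number "\0asm".
print(embed_wasm_in_js(b'\x00asm', '// shell code here\n'))
```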

tlively · 2024-11-14T23:03:32Z

scripts/bundle_clusterfuzz.py

@@ -85,7 +85,7 @@
    # Delete the argument, as importing |shared| scans it.
    sys.argv.pop()

-from test import shared
+from test import shared # noqa


Can we refactor the shared argument parsing to use less global state so we don't have to dodge the linter like this?

tlively · 2024-11-15T02:16:03Z

scripts/bundle_clusterfuzz.py

+
+  ./emsdk install tot
+
+after which ./upstream/ (from the emsdk dir) will contain portable builds of


What does "portable" mean in this context?

tlively · 2024-11-15T02:17:36Z

scripts/bundle_clusterfuzz.py

+
+  2. Run the unit tests, which include smoke tests for our ClusterFuzz support:
+
+       python -m unittest test/unit/test_cluster_fuzz.py


Maybe this script should run these smoke tests automatically?

tlively · 2024-11-15T02:33:22Z

scripts/clusterfuzz/run.py

+        input_data_file_path = os.path.join(output_dir, '%d.input' % i)
+        wasm_file_path = os.path.join(output_dir, '%d.wasm' % i)


Suggested change

-        input_data_file_path = os.path.join(output_dir, '%d.input' % i)
-        wasm_file_path = os.path.join(output_dir, '%d.wasm' % i)
+        input_data_file_path = os.path.join(output_dir, f'{i}.input')
+        wasm_file_path = os.path.join(output_dir, f'{i}.wasm')

tlively · 2024-11-15T02:34:41Z

scripts/clusterfuzz/run.py

+        # wasm-opt may fail to run in rare cases (when the fuzzer emits code it
+        # detects as invalid). Just try again in such a case.


Huh, I've never seen this before. Should we put the retry logic in wasm-opt instead?

tlively · 2024-11-15T02:49:23Z