Fuzzing: ClusterFuzz integration #7079
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
#!/usr/bin/python3 | ||
|
||
''' | ||
Bundle files for uploading to ClusterFuzz. | ||
|
||
Usage: | ||
|
||
bundle.py OUTPUT_FILE.tgz [--build-dir=BUILD_DIR] | ||
|
||
The output file will be a .tgz file. | ||
|
||
If a build directory is provided, we will look under there to find bin/wasm-opt | ||
and lib/libbinaryen.so. A useful place to get builds from is the Emscripten SDK, | ||
as you can do | ||
|
||
./emsdk install tot | ||
|
||
after which ./upstream/ (from the emsdk dir) will contain portable builds of | ||
wasm-opt and libbinaryen.so. Thus, the full workflow could be | ||
|
||
cd emsdk | ||
./emsdk install tot | ||
cd ../binaryen | ||
python3 scripts/bundle_clusterfuzz.py binaryen_wasm_fuzzer.tgz --build-dir=../emsdk/upstream | ||
|
||
When using --build-dir in this way, you are responsible for ensuring that the | ||
wasm-opt in the build dir is compatible with the scripts in the current dir | ||
(e.g., if run.py here passes a flag that exists only in a newer or older version | ||
of wasm-opt, a problem can happen). | ||
|
||
Before uploading to ClusterFuzz, it is worth doing the following: | ||
|
||
1. Run the local fuzzer (scripts/fuzz_opt.py). That includes a ClusterFuzz | ||
testcase handler, which simulates what ClusterFuzz does. | ||
|
||
2. Run the unit tests, which include smoke tests for our ClusterFuzz support: | ||
|
||
python -m unittest test/unit/test_cluster_fuzz.py | ||
Review comment: Maybe this script should run these smoke tests automatically?
||
|
||
Look at the logs, which will contain statistics on the wasm files the | ||
fuzzer emits, and see that they look reasonable. | ||
|
||
You should run the unit tests on the bundle you are about to upload, by | ||
setting the proper env var like this (using the same filename as above): | ||
|
||
BINARYEN_CLUSTER_FUZZ_BUNDLE=`pwd`/binaryen_wasm_fuzzer.tgz python -m unittest test/unit/test_cluster_fuzz.py | ||
|
||
Note that you must pass an absolute filename (e.g. using pwd as shown). | ||
|
||
The unittest logs should reflect that that bundle is being used at the | ||
very start ("Using existing bundle: ..." rather than "Making a new | ||
bundle"). Note that some of the unittests also create their own bundles, to | ||
test the bundling script itself, so later down you will see logging of | ||
bundle creation even if you provide a bundle. | ||
|
||
After uploading to ClusterFuzz, you can wait a while for it to run, and then: | ||
|
||
1. Inspect the log to see that we generate all the testcases properly, and | ||
their sizes look reasonably random, etc. | ||
|
||
2. Inspect the sample testcase and run it locally, to see that | ||
|
||
d8 --wasm-staging testcase.js | ||
|
||
properly runs the testcase, emitting logging etc. | ||
|
||
3. Check the stats and crashes page (known crashes should at least be showing | ||
up). Note that these may take longer to show up than 1 and 2. | ||
''' | ||
|
||
import os | ||
import sys | ||
import tarfile | ||
|
||
# Read the filenames first, as importing |shared| changes the directory. | ||
output_file = os.path.abspath(sys.argv[1]) | ||
print(f'Bundling to: {output_file}') | ||
assert output_file.endswith('.tgz'), 'Can only generate a .tgz' | ||
|
||
build_dir = None | ||
if len(sys.argv) >= 3: | ||
    assert sys.argv[2].startswith('--build-dir=') | ||
    # Split only on the first '=', so that paths containing '=' survive intact. | ||
    build_dir = sys.argv[2].split('=', 1)[1] | ||
    build_dir = os.path.abspath(build_dir) | ||
    # Delete the argument, as importing |shared| scans it. | ||
    sys.argv.pop() | ||
|
||
from test import shared # noqa | ||
Review comment: Can we refactor the shared argument parsing to use less global state so we don't have to dodge the linter like this?
||
|
||
# Pick where to get the builds | ||
if build_dir: | ||
    binaryen_bin = os.path.join(build_dir, 'bin') | ||
    binaryen_lib = os.path.join(build_dir, 'lib') | ||
else: | ||
    binaryen_bin = shared.options.binaryen_bin | ||
    binaryen_lib = shared.options.binaryen_lib | ||
|
||
with tarfile.open(output_file, "w:gz") as tar: | ||
    # run.py | ||
    run = os.path.join(shared.options.binaryen_root, 'scripts', 'clusterfuzz', 'run.py') | ||
    print(f' .. run: {run}') | ||
    tar.add(run, arcname='run.py') | ||
|
||
    # fuzz_shell.js | ||
    fuzz_shell = os.path.join(shared.options.binaryen_root, 'scripts', 'fuzz_shell.js') | ||
    print(f' .. fuzz_shell: {fuzz_shell}') | ||
    tar.add(fuzz_shell, arcname='scripts/fuzz_shell.js') | ||
|
||
    # wasm-opt binary | ||
    wasm_opt = os.path.join(binaryen_bin, 'wasm-opt') | ||
    print(f' .. wasm-opt: {wasm_opt}') | ||
    tar.add(wasm_opt, arcname='bin/wasm-opt') | ||
|
||
    # For a dynamic build we also need libbinaryen.so and possibly other files. | ||
    # Try both .so and .dylib suffixes for more OS coverage. | ||
    for suffix in ['.so', '.dylib']: | ||
        libbinaryen = os.path.join(binaryen_lib, f'libbinaryen{suffix}') | ||
        if os.path.exists(libbinaryen): | ||
            print(f' .. libbinaryen: {libbinaryen}') | ||
            tar.add(libbinaryen, arcname=f'lib/libbinaryen{suffix}') | ||
|
||
        # The emsdk build also includes some more necessary files. | ||
        for name in [f'libc++{suffix}', f'libc++{suffix}.2', f'libc++{suffix}.2.0']: | ||
            path = os.path.join(binaryen_lib, name) | ||
            if os.path.exists(path): | ||
                print(f' ......... : {path}') | ||
                tar.add(path, arcname=f'lib/{name}') | ||
|
||
print('Done.') |
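As a rough aid for checking a bundle before upload, the sketch below lists which of the unconditionally added archive names (`run.py`, `scripts/fuzz_shell.js`, `bin/wasm-opt`) are missing from a `.tgz`. It is not part of this PR: `REQUIRED_MEMBERS`, `missing_members`, and `make_demo_bundle` are hypothetical names, and the demo bundle is built in memory from empty files rather than from a real build.

```python
import io
import tarfile

# Archive names that the bundling script above adds unconditionally.
# (Hypothetical constant, introduced only for this sketch.)
REQUIRED_MEMBERS = {'run.py', 'scripts/fuzz_shell.js', 'bin/wasm-opt'}


def missing_members(bundle):
    '''Return the required archive names that are absent from an open tarfile.'''
    return REQUIRED_MEMBERS - set(bundle.getnames())


def make_demo_bundle(names):
    '''Build an in-memory .tgz of empty files, standing in for a real bundle.'''
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name), io.BytesIO(b''))
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    # A deliberately incomplete bundle: fuzz_shell.js was left out.
    demo = make_demo_bundle(['run.py', 'bin/wasm-opt'])
    with tarfile.open(fileobj=demo, mode='r:gz') as tar:
        print(sorted(missing_members(tar)))  # prints ['scripts/fuzz_shell.js']
```

For a real bundle, `tarfile.open('binaryen_wasm_fuzzer.tgz', 'r:gz')` would take the place of the in-memory demo. The shared libraries are left out of the required set because the script only adds them when they exist.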
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,163 @@ | ||||||||||
# | ||||||||||
# Copyright 2024 WebAssembly Community Group participants | ||||||||||
# | ||||||||||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||||||||||
# you may not use this file except in compliance with the License. | ||||||||||
# You may obtain a copy of the License at | ||||||||||
# | ||||||||||
# http://www.apache.org/licenses/LICENSE-2.0 | ||||||||||
# | ||||||||||
# Unless required by applicable law or agreed to in writing, software | ||||||||||
# distributed under the License is distributed on an "AS IS" BASIS, | ||||||||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||||
# See the License for the specific language governing permissions and | ||||||||||
# limitations under the License. | ||||||||||
|
||||||||||
''' | ||||||||||
ClusterFuzz run.py script: when run by ClusterFuzz, it uses wasm-opt to generate | ||||||||||
a fixed number of testcases. This is a "blackbox fuzzer", see | ||||||||||
|
||||||||||
https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/ | ||||||||||
|
||||||||||
This file should be bundled up together with the other files it needs, see | ||||||||||
bundle_clusterfuzz.py. | ||||||||||
''' | ||||||||||
|
||||||||||
import os | ||||||||||
import getopt | ||||||||||
import random | ||||||||||
import subprocess | ||||||||||
import sys | ||||||||||
|
||||||||||
# The V8 flags we put in the "fuzzer flags" files, which tell ClusterFuzz how to | ||||||||||
# run V8. By default we apply all staging flags. | ||||||||||
FUZZER_FLAGS_FILE_CONTENTS = '--wasm-staging' | ||||||||||
|
||||||||||
# Maximum size of the random data that we feed into wasm-opt -ttf. This is | ||||||||||
# smaller than fuzz_opt.py's INPUT_SIZE_MAX because that script is tuned for | ||||||||||
# fuzzing large wasm files (to reduce the overhead we have of launching many | ||||||||||
# processes per file), which is less of an issue on ClusterFuzz. | ||||||||||
MAX_RANDOM_SIZE = 15 * 1024 | ||||||||||
|
||||||||||
# The prefix for fuzz files. | ||||||||||
FUZZ_FILENAME_PREFIX = 'fuzz-' | ||||||||||
|
||||||||||
# The prefix for flags files. | ||||||||||
FLAGS_FILENAME_PREFIX = 'flags-' | ||||||||||
|
||||||||||
# The name of the fuzzer (appears after FUZZ_FILENAME_PREFIX / | ||||||||||
# FLAGS_FILENAME_PREFIX). | ||||||||||
FUZZER_NAME_PREFIX = 'binaryen-' | ||||||||||
|
||||||||||
# The root directory of the bundle this will be in, which is the directory of | ||||||||||
# this very file. | ||||||||||
ROOT_DIR = os.path.dirname(os.path.abspath(__file__)) | ||||||||||
|
||||||||||
# The path to the wasm-opt binary that we run to generate testcases. | ||||||||||
FUZZER_BINARY_PATH = os.path.join(ROOT_DIR, 'bin', 'wasm-opt') | ||||||||||
|
||||||||||
# The path to the fuzz_shell.js script that will execute the wasm in each | ||||||||||
# testcase. | ||||||||||
JS_SHELL_PATH = os.path.join(ROOT_DIR, 'scripts', 'fuzz_shell.js') | ||||||||||
|
||||||||||
# The arguments we provide to wasm-opt to generate wasm files. | ||||||||||
FUZZER_ARGS = [ | ||||||||||
    # Generate a wasm from random data. | ||||||||||
    '--translate-to-fuzz', | ||||||||||
    # Run some random passes, to further shape the random wasm we emit. | ||||||||||
    '--fuzz-passes', | ||||||||||
    # Enable all features but disable ones not yet ready for fuzzing. This may | ||||||||||
    # be a smaller set than fuzz_opt.py, as that enables a few experimental | ||||||||||
    # flags, while here we just fuzz with d8's --wasm-staging. | ||||||||||
    '-all', | ||||||||||
    '--disable-shared-everything', | ||||||||||
    '--disable-fp16', | ||||||||||
] | ||||||||||
|
||||||||||
|
||||||||||
# Returns the file name for fuzz or flags files. | ||||||||||
def get_file_name(prefix, index): | ||||||||||
    return f'{prefix}{FUZZER_NAME_PREFIX}{index}.js' | ||||||||||
|
||||||||||
|
||||||||||
# Returns the contents of a .js fuzz file, given particular wasm contents that | ||||||||||
# we want to be executed. | ||||||||||
def get_js_file_contents(wasm_contents): | ||||||||||
    # Start with the standard JS shell. | ||||||||||
    with open(JS_SHELL_PATH) as file: | ||||||||||
        js = file.read() | ||||||||||
|
||||||||||
    # Prepend the wasm contents, so they are used (rather than the normal | ||||||||||
    # mechanism where the wasm file's name is provided in argv). | ||||||||||
    wasm_contents = ','.join([str(c) for c in wasm_contents]) | ||||||||||
    js = f'var binary = new Uint8Array([{wasm_contents}]);\n\n' + js | ||||||||||
    return js | ||||||||||
|
||||||||||
|
||||||||||
def main(argv): | ||||||||||
    # Parse the options. See | ||||||||||
    # https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/#uploading-a-fuzzer | ||||||||||
    output_dir = '.' | ||||||||||
    num = 100 | ||||||||||
    expected_flags = ['input_dir=', 'output_dir=', 'no_of_files='] | ||||||||||
    optlist, _ = getopt.getopt(argv[1:], '', expected_flags) | ||||||||||
    for option, value in optlist: | ||||||||||
        if option == '--output_dir': | ||||||||||
            output_dir = value | ||||||||||
        elif option == '--no_of_files': | ||||||||||
            num = int(value) | ||||||||||
|
||||||||||
    for i in range(1, num + 1): | ||||||||||
        input_data_file_path = os.path.join(output_dir, '%d.input' % i) | ||||||||||
        wasm_file_path = os.path.join(output_dir, '%d.wasm' % i) | ||||||||||
Comment on lines +111 to +112
Suggested change
|
||||||||||
|
||||||||||
        # wasm-opt may fail to run in rare cases (when the fuzzer emits code it | ||||||||||
        # detects as invalid). Just try again in such a case. | ||||||||||
Comment on lines +114 to +115: Huh, I've never seen this before. Should we put the retry logic in wasm-opt instead?
||||||||||
        for attempt in range(0, 100): | ||||||||||
            # Generate random data. | ||||||||||
            random_size = random.SystemRandom().randint(1, MAX_RANDOM_SIZE) | ||||||||||
            with open(input_data_file_path, 'wb') as file: | ||||||||||
                file.write(os.urandom(random_size)) | ||||||||||
Comment on lines +119 to +120: My reading of the ClusterFuzz documentation was that ClusterFuzz supplies input files. Should we be using those instead of generating new input files?
||||||||||
|
||||||||||
            # Generate wasm from the random data. | ||||||||||
            cmd = [FUZZER_BINARY_PATH] + FUZZER_ARGS | ||||||||||
            cmd += ['-o', wasm_file_path, input_data_file_path] | ||||||||||
            try: | ||||||||||
                subprocess.check_call(cmd) | ||||||||||
            except subprocess.CalledProcessError: | ||||||||||
                # Try again, unless this was the last of the 100 attempts. The | ||||||||||
                # loop itself advances |attempt|, so we do not increment it here. | ||||||||||
                print('(oops, retrying wasm-opt)') | ||||||||||
                if attempt == 99: | ||||||||||
                    # Something is very wrong! | ||||||||||
                    raise | ||||||||||
                continue | ||||||||||
            # Success, leave the loop. | ||||||||||
            break | ||||||||||
|
||||||||||
        # Generate a testcase from the wasm | ||||||||||
        with open(wasm_file_path, 'rb') as file: | ||||||||||
            wasm_contents = file.read() | ||||||||||
        testcase_file_path = os.path.join(output_dir, | ||||||||||
                                          get_file_name(FUZZ_FILENAME_PREFIX, i)) | ||||||||||
        js_file_contents = get_js_file_contents(wasm_contents) | ||||||||||
        with open(testcase_file_path, 'w') as file: | ||||||||||
            file.write(js_file_contents) | ||||||||||
|
||||||||||
        # Emit a corresponding flags file. | ||||||||||
        flags_file_path = os.path.join(output_dir, | ||||||||||
                                       get_file_name(FLAGS_FILENAME_PREFIX, i)) | ||||||||||
        with open(flags_file_path, 'w') as file: | ||||||||||
            file.write(FUZZER_FLAGS_FILE_CONTENTS) | ||||||||||
|
||||||||||
        print(f'Created testcase: {testcase_file_path}, {len(wasm_contents)} bytes') | ||||||||||
|
||||||||||
        # Remove temporary files. | ||||||||||
        os.remove(input_data_file_path) | ||||||||||
        os.remove(wasm_file_path) | ||||||||||
|
||||||||||
    print(f'Created {num} testcases.') | ||||||||||
|
||||||||||
|
||||||||||
if __name__ == '__main__': | ||||||||||
    main(sys.argv) |
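To make the shape of the emitted files concrete, here is a small standalone sketch of the naming scheme and of how the wasm bytes are prepended to the JS shell. It mirrors `get_file_name` and `get_js_file_contents` from run.py, but `embed_wasm` and its `shell_js` parameter are hypothetical stand-ins so that the example runs without `fuzz_shell.js` on disk.

```python
FUZZER_NAME_PREFIX = 'binaryen-'


def get_file_name(prefix, index):
    '''Same naming scheme as run.py: <prefix><fuzzer name><index>.js'''
    return f'{prefix}{FUZZER_NAME_PREFIX}{index}.js'


def embed_wasm(wasm_contents, shell_js):
    '''Prepend the wasm bytes, as a JS Uint8Array literal, to the shell text.

    shell_js is a hypothetical parameter; run.py reads fuzz_shell.js instead.
    '''
    byte_list = ','.join(str(c) for c in wasm_contents)
    return f'var binary = new Uint8Array([{byte_list}]);\n\n' + shell_js


if __name__ == '__main__':
    # The 8-byte header of an empty wasm module: magic "\0asm" plus version 1.
    empty_module = b'\x00asm\x01\x00\x00\x00'
    print(get_file_name('fuzz-', 1))   # prints fuzz-binaryen-1.js
    print(get_file_name('flags-', 1))  # prints flags-binaryen-1.js
    print(embed_wasm(empty_module, '// shell placeholder'))
```

Iterating over a `bytes` object yields integers, which is why `str(c)` produces the decimal byte values that the `Uint8Array` literal needs.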
Review comment: What does "portable" mean in this context?
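For discussion of the retry logic in run.py's main loop, a bounded retry can be sketched in isolation as below. This is an illustration only, not the PR's code: `run_with_retries` and `FlakyAction` are hypothetical names, and `FlakyAction` simulates a command that fails a fixed number of times instead of invoking wasm-opt.

```python
import subprocess


def run_with_retries(action, max_attempts=100):
    '''Call action() until it succeeds; re-raise once max_attempts have failed.'''
    for attempt in range(max_attempts):
        try:
            return action()
        except subprocess.CalledProcessError:
            if attempt == max_attempts - 1:
                # Out of attempts: something is very wrong.
                raise
            print('(oops, retrying)')


class FlakyAction:
    '''Hypothetical stand-in for a command that fails its first N calls.'''

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise subprocess.CalledProcessError(1, 'stand-in-command')
        return self.calls


if __name__ == '__main__':
    flaky = FlakyAction(failures=2)
    print(run_with_retries(flaky))  # prints 3, after two retry messages
```

Letting the `for` loop own the counter keeps the number of attempts exactly equal to `max_attempts`, with no manual increment inside the handler.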