Fuzzing: ClusterFuzz integration #7079

Open
wants to merge 78 commits into
base: main
9926504
start
kripken Nov 8, 2024
8d201ca
work
kripken Nov 11, 2024
fa633e9
work
kripken Nov 12, 2024
d29bb70
prep
kripken Nov 12, 2024
17e6e94
work
kripken Nov 12, 2024
bc9a1d1
work
kripken Nov 12, 2024
eb91fd3
work
kripken Nov 12, 2024
e1d5be0
work
kripken Nov 12, 2024
c9b057c
work
kripken Nov 12, 2024
fe8b47a
work
kripken Nov 12, 2024
cc22c7a
work
kripken Nov 12, 2024
1b97501
work
kripken Nov 12, 2024
6fb3e45
work
kripken Nov 12, 2024
ae2f663
work
kripken Nov 12, 2024
b940d34
work
kripken Nov 12, 2024
794980c
work
kripken Nov 12, 2024
823f146
work
kripken Nov 12, 2024
1ed21d5
work
kripken Nov 12, 2024
1657555
work
kripken Nov 12, 2024
586bad8
work
kripken Nov 12, 2024
ad6f5ee
work
kripken Nov 12, 2024
66e56db
work
kripken Nov 12, 2024
02a89b7
work
kripken Nov 12, 2024
156f6b6
fix
kripken Nov 12, 2024
07e1033
text
kripken Nov 13, 2024
f0cab01
oops
kripken Nov 13, 2024
a694dd7
restore
kripken Nov 13, 2024
af7b2d5
finish
kripken Nov 13, 2024
a0da68b
moar
kripken Nov 13, 2024
faf380c
oops.in.advance
kripken Nov 13, 2024
c9546a2
fix
kripken Nov 13, 2024
1d69074
prep
kripken Nov 13, 2024
a1e8257
test
kripken Nov 13, 2024
7769825
test
kripken Nov 13, 2024
69ce873
test
kripken Nov 13, 2024
b107a8b
test
kripken Nov 13, 2024
12b6324
test
kripken Nov 13, 2024
855d882
test
kripken Nov 13, 2024
e90bfbc
test
kripken Nov 13, 2024
aa4134b
test
kripken Nov 13, 2024
d93c615
dynamic
kripken Nov 13, 2024
1519588
dynamic
kripken Nov 13, 2024
076aa57
dynamic
kripken Nov 13, 2024
7852327
dynamic
kripken Nov 13, 2024
3d183d4
dynamic
kripken Nov 13, 2024
10ee7c4
work
kripken Nov 13, 2024
41c3e32
work
kripken Nov 13, 2024
23d0006
work
kripken Nov 13, 2024
a3f1b39
work
kripken Nov 14, 2024
fb6e8a8
work
kripken Nov 14, 2024
b6c0543
work
kripken Nov 14, 2024
0f998a8
test
kripken Nov 14, 2024
c30122c
fixes
kripken Nov 14, 2024
693f56c
fix
kripken Nov 14, 2024
838983a
fix
kripken Nov 14, 2024
a9c5a2e
test
kripken Nov 14, 2024
5525b36
work
kripken Nov 14, 2024
c423d35
fix
kripken Nov 14, 2024
e24ee9c
more
kripken Nov 14, 2024
8568cf8
test
kripken Nov 14, 2024
8fb0b69
fix
kripken Nov 14, 2024
5a87183
work
kripken Nov 14, 2024
d8aa63e
works
kripken Nov 14, 2024
46bca52
Merge remote-tracking branch 'origin/main' into clusterfuzz
kripken Nov 14, 2024
b440b65
notes
kripken Nov 14, 2024
53cec85
fix
kripken Nov 14, 2024
e0fb922
format
kripken Nov 14, 2024
23ae5a4
text
kripken Nov 14, 2024
d0b254d
note
kripken Nov 14, 2024
ccf4683
note
kripken Nov 14, 2024
6487be1
lint
kripken Nov 14, 2024
5fcf347
lint
kripken Nov 14, 2024
b3859df
lint
kripken Nov 14, 2024
2b3e0f7
lint
kripken Nov 14, 2024
e17046b
update
kripken Nov 14, 2024
51cff4d
try to fix macos
kripken Nov 15, 2024
9b08a40
Make the test use the right build dir, which varies on CI
kripken Nov 15, 2024
e3c9915
find build dir properly
kripken Nov 15, 2024
129 changes: 129 additions & 0 deletions scripts/bundle_clusterfuzz.py
@@ -0,0 +1,129 @@
#!/usr/bin/python3

'''
Bundle files for uploading to ClusterFuzz.

Usage:

bundle_clusterfuzz.py OUTPUT_FILE.tgz [--build-dir=BUILD_DIR]

The output file will be a .tgz file.

If a build directory is provided, we will look under there to find bin/wasm-opt
and lib/libbinaryen.so. A useful place to get builds from is the Emscripten SDK,
as you can do

./emsdk install tot

after which ./upstream/ (from the emsdk dir) will contain portable builds of
[Review comment, Member] What does "portable" mean in this context?

wasm-opt and libbinaryen.so. Thus, the full workflow could be

cd emsdk
./emsdk install tot
cd ../binaryen
python3 scripts/bundle_clusterfuzz.py binaryen_wasm_fuzzer.tgz --build-dir=../emsdk/upstream

When using --build-dir in this way, you are responsible for ensuring that the
wasm-opt in the build dir is compatible with the scripts in the current dir
(e.g., if run.py here passes a flag that exists only in a newer or older
version of wasm-opt, wasm-opt will fail to run).

Before uploading to ClusterFuzz, it is worth doing the following:

1. Run the local fuzzer (scripts/fuzz_opt.py). That includes a ClusterFuzz
testcase handler, which simulates what ClusterFuzz does.

2. Run the unit tests, which include smoke tests for our ClusterFuzz support:

python -m unittest test/unit/test_cluster_fuzz.py
[Review comment, Member] Maybe this script should run these smoke tests automatically?


Look at the logs, which will contain statistics on the wasm files the
fuzzer emits, and see that they look reasonable.

You should run the unit tests on the bundle you are about to upload, by
setting the proper env var like this (using the same filename as above):

BINARYEN_CLUSTER_FUZZ_BUNDLE=`pwd`/binaryen_wasm_fuzzer.tgz python -m unittest test/unit/test_cluster_fuzz.py

Note that you must pass an absolute filename (e.g. using pwd as shown).

The unittest logs should show, at the very start, that this bundle is being
used ("Using existing bundle: ..." rather than "Making a new bundle"). Note
that some of the unittests also create their own bundles, to test the
bundling script itself, so further down you will see logging of bundle
creation even if you provide a bundle.

After uploading to ClusterFuzz, you can wait a while for it to run, and then:

1. Inspect the log to see that we generate all the testcases properly, and
their sizes look reasonably random, etc.

2. Inspect the sample testcase and run it locally, to see that

d8 --wasm-staging testcase.js

properly runs the testcase, emitting logging etc.

3. Check the stats and crashes page (known crashes should at least be showing
up). Note that these may take longer to show up than 1 and 2.
'''

import os
import sys
import tarfile

# Read the filenames first, as importing |shared| changes the directory.
output_file = os.path.abspath(sys.argv[1])
print(f'Bundling to: {output_file}')
assert output_file.endswith('.tgz'), 'Can only generate a .tgz'

build_dir = None
if len(sys.argv) >= 3:
    assert sys.argv[2].startswith('--build-dir=')
    # Split only on the first '=' so that a path containing '=' is kept whole.
    build_dir = sys.argv[2].split('=', 1)[1]
    build_dir = os.path.abspath(build_dir)
    # Delete the argument, as importing |shared| scans it.
    sys.argv.pop()

from test import shared # noqa
[Review comment, Member] Can we refactor the shared argument parsing to use less global state so we don't have to dodge the linter like this?


# Pick where to get the builds
if build_dir:
    binaryen_bin = os.path.join(build_dir, 'bin')
    binaryen_lib = os.path.join(build_dir, 'lib')
else:
    binaryen_bin = shared.options.binaryen_bin
    binaryen_lib = shared.options.binaryen_lib

with tarfile.open(output_file, "w:gz") as tar:
    # run.py
    run = os.path.join(shared.options.binaryen_root, 'scripts', 'clusterfuzz', 'run.py')
    print(f' .. run: {run}')
    tar.add(run, arcname='run.py')

    # fuzz_shell.js
    fuzz_shell = os.path.join(shared.options.binaryen_root, 'scripts', 'fuzz_shell.js')
    print(f' .. fuzz_shell: {fuzz_shell}')
    tar.add(fuzz_shell, arcname='scripts/fuzz_shell.js')

    # wasm-opt binary
    wasm_opt = os.path.join(binaryen_bin, 'wasm-opt')
    print(f' .. wasm-opt: {wasm_opt}')
    tar.add(wasm_opt, arcname='bin/wasm-opt')

    # For a dynamic build we also need libbinaryen.so and possibly other files.
    # Try both .so and .dylib suffixes for more OS coverage.
    for suffix in ['.so', '.dylib']:
        libbinaryen = os.path.join(binaryen_lib, f'libbinaryen{suffix}')
        if os.path.exists(libbinaryen):
            print(f' .. libbinaryen: {libbinaryen}')
            tar.add(libbinaryen, arcname=f'lib/libbinaryen{suffix}')

        # The emsdk build also includes some more necessary files.
        for name in [f'libc++{suffix}', f'libc++{suffix}.2', f'libc++{suffix}.2.0']:
            path = os.path.join(binaryen_lib, name)
            if os.path.exists(path):
                print(f' ......... : {path}')
                tar.add(path, arcname=f'lib/{name}')

print('Done.')
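
A note on checking the bundle layout produced above: before uploading, it can help to confirm that the archive contains the three paths the script always adds. The following is a minimal sketch, not part of this PR. `REQUIRED_MEMBERS` and `missing_bundle_members` are hypothetical names, and the shared libraries are deliberately left out because the script only adds them when they exist in the build.

```python
import tarfile

# The arcname values that bundle_clusterfuzz.py always adds.
# Hypothetical helper names; not part of the PR.
REQUIRED_MEMBERS = ('run.py', 'scripts/fuzz_shell.js', 'bin/wasm-opt')


def missing_bundle_members(bundle_path):
    '''Return the required paths that are absent from the .tgz bundle.'''
    with tarfile.open(bundle_path, 'r:gz') as tar:
        present = set(tar.getnames())
    return [name for name in REQUIRED_MEMBERS if name not in present]
```

An empty list from `missing_bundle_members('binaryen_wasm_fuzzer.tgz')` would indicate that all three required files are present. It says nothing about whether the bundled wasm-opt is compatible with run.py.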
163 changes: 163 additions & 0 deletions scripts/clusterfuzz/run.py
@@ -0,0 +1,163 @@
#
# Copyright 2024 WebAssembly Community Group participants
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

'''
ClusterFuzz run.py script: when run by ClusterFuzz, it uses wasm-opt to generate
a fixed number of testcases. This is a "blackbox fuzzer", see

https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/

This file should be bundled up together with the other files it needs, see
bundle_clusterfuzz.py.
'''

import os
import getopt
import random
import subprocess
import sys

# The V8 flags we put in the "fuzzer flags" files, which tell ClusterFuzz how to
# run V8. By default we apply all staging flags.
FUZZER_FLAGS_FILE_CONTENTS = '--wasm-staging'

# Maximum size of the random data that we feed into wasm-opt -ttf. This is
# smaller than fuzz_opt.py's INPUT_SIZE_MAX because that script is tuned for
# fuzzing large wasm files (to reduce the overhead we have of launching many
# processes per file), which is less of an issue on ClusterFuzz.
MAX_RANDOM_SIZE = 15 * 1024

# The prefix for fuzz files.
FUZZ_FILENAME_PREFIX = 'fuzz-'

# The prefix for flags files.
FLAGS_FILENAME_PREFIX = 'flags-'

# The name of the fuzzer (appears after FUZZ_FILENAME_PREFIX /
# FLAGS_FILENAME_PREFIX).
FUZZER_NAME_PREFIX = 'binaryen-'

# The root directory of the bundle this will be in, which is the directory of
# this very file.
ROOT_DIR = os.path.dirname(os.path.abspath(__file__))

# The path to the wasm-opt binary that we run to generate testcases.
FUZZER_BINARY_PATH = os.path.join(ROOT_DIR, 'bin', 'wasm-opt')

# The path to the fuzz_shell.js script that will execute the wasm in each
# testcase.
JS_SHELL_PATH = os.path.join(ROOT_DIR, 'scripts', 'fuzz_shell.js')

# The arguments we provide to wasm-opt to generate wasm files.
FUZZER_ARGS = [
    # Generate a wasm from random data.
    '--translate-to-fuzz',
    # Run some random passes, to further shape the random wasm we emit.
    '--fuzz-passes',
    # Enable all features but disable ones not yet ready for fuzzing. This may
    # be a smaller set than fuzz_opt.py, as that enables a few experimental
    # flags, while here we just fuzz with d8's --wasm-staging.
    '-all',
    '--disable-shared-everything',
    '--disable-fp16',
]


# Returns the file name for fuzz or flags files.
def get_file_name(prefix, index):
    return f'{prefix}{FUZZER_NAME_PREFIX}{index}.js'


# Returns the contents of a .js fuzz file, given particular wasm contents that
# we want to be executed.
def get_js_file_contents(wasm_contents):
    # Start with the standard JS shell.
    with open(JS_SHELL_PATH) as file:
        js = file.read()

    # Prepend the wasm contents, so they are used (rather than the normal
    # mechanism where the wasm file's name is provided in argv).
    wasm_contents = ','.join([str(c) for c in wasm_contents])
    js = f'var binary = new Uint8Array([{wasm_contents}]);\n\n' + js
    return js


def main(argv):
    # Parse the options. See
    # https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/#uploading-a-fuzzer
    output_dir = '.'
    num = 100
    expected_flags = ['input_dir=', 'output_dir=', 'no_of_files=']
    optlist, _ = getopt.getopt(argv[1:], '', expected_flags)
    for option, value in optlist:
        if option == '--output_dir':
            output_dir = value
        elif option == '--no_of_files':
            num = int(value)

    for i in range(1, num + 1):
        input_data_file_path = os.path.join(output_dir, '%d.input' % i)
        wasm_file_path = os.path.join(output_dir, '%d.wasm' % i)
[Review comment, Member, on lines +111 to +112] Suggested change:
-        input_data_file_path = os.path.join(output_dir, '%d.input' % i)
-        wasm_file_path = os.path.join(output_dir, '%d.wasm' % i)
+        input_data_file_path = os.path.join(output_dir, f'{i}.input')
+        wasm_file_path = os.path.join(output_dir, f'{i}.wasm')


        # wasm-opt may fail to run in rare cases (when the fuzzer emits code it
        # detects as invalid). Just try again in such a case.
[Review comment, Member, on lines +114 to +115] Huh, I've never seen this before. Should we put the retry logic in wasm-opt instead?

        for attempt in range(0, 100):
            # Generate random data.
            random_size = random.SystemRandom().randint(1, MAX_RANDOM_SIZE)
            with open(input_data_file_path, 'wb') as file:
                file.write(os.urandom(random_size))
[Review comment, Member, on lines +119 to +120] My reading of the ClusterFuzz documentation was that ClusterFuzz supplies input files. Should we be using those instead of generating new input files?


            # Generate wasm from the random data.
            cmd = [FUZZER_BINARY_PATH] + FUZZER_ARGS
            cmd += ['-o', wasm_file_path, input_data_file_path]
            try:
                subprocess.check_call(cmd)
            except subprocess.CalledProcessError:
                # Try again. The loop variable already counts the attempts, so
                # there is no need to increment it by hand.
                print('(oops, retrying wasm-opt)')
                if attempt == 99:
                    # Every attempt failed. Something is very wrong!
                    raise
                continue
            # Success, leave the loop.
            break

        # Generate a testcase from the wasm
        with open(wasm_file_path, 'rb') as file:
            wasm_contents = file.read()
        testcase_file_path = os.path.join(output_dir,
                                          get_file_name(FUZZ_FILENAME_PREFIX, i))
        js_file_contents = get_js_file_contents(wasm_contents)
        with open(testcase_file_path, 'w') as file:
            file.write(js_file_contents)

        # Emit a corresponding flags file.
        flags_file_path = os.path.join(output_dir,
                                       get_file_name(FLAGS_FILENAME_PREFIX, i))
        with open(flags_file_path, 'w') as file:
            file.write(FUZZER_FLAGS_FILE_CONTENTS)

        print(f'Created testcase: {testcase_file_path}, {len(wasm_contents)} bytes')

        # Remove temporary files.
        os.remove(input_data_file_path)
        os.remove(wasm_file_path)

    print(f'Created {num} testcases.')


if __name__ == '__main__':
    main(sys.argv)
82 changes: 81 additions & 1 deletion scripts/fuzz_opt.py
@@ -36,6 +36,7 @@
import random
import re
import sys
import tarfile
import time
import traceback
from os.path import abspath
@@ -1574,6 +1575,84 @@ def handle(self, wasm):
        run([in_bin('wasm-opt'), abspath('a.wast')] + FEATURE_OPTS)


# Fuzz in a near-identical manner to how we fuzz on ClusterFuzz. This is mainly
# to see that fuzzing that way works properly (it likely won't catch anything
# the other fuzzers here catch, though it is possible). That is, running this
# script continuously gives ongoing confirmation that fuzzing on ClusterFuzz
# should be working.
#
# Note that this is *not* deterministic like the other fuzzers: it runs run.py
# like ClusterFuzz does, and that generates its own random data. If a bug is
# caught here, it must be reduced manually.
class ClusterFuzz(TestCaseHandler):
    frequency = 0.1

    def handle(self, wasm):
        self.ensure()

        # run.py() should emit these two files. Delete them to make sure they
        # are created by run.py() in the next step.
        fuzz_file = 'fuzz-binaryen-1.js'
        flags_file = 'flags-binaryen-1.js'
        for f in [fuzz_file, flags_file]:
            if os.path.exists(f):
                os.unlink(f)

        # Call run.py(), similarly to how ClusterFuzz does.
        run([sys.executable,
             os.path.join(self.clusterfuzz_dir, 'run.py'),
             '--output_dir=' + os.getcwd(),
             '--no_of_files=1'])

        # We should see the two files.
        assert os.path.exists(fuzz_file)
        assert os.path.exists(flags_file)

        # Run the testcase in V8, similarly to how ClusterFuzz does.
        cmd = [shared.V8]
        # The flags are given in the flags file - we do *not* use our normal
        # flags here!
        with open(flags_file, 'r') as f:
            flags = f.read()
        cmd.append(flags)
        # Run the fuzz file, which contains a modified fuzz_shell.js - we do
        # *not* run fuzz_shell.js normally.
        cmd.append(os.path.abspath(fuzz_file))
        # No wasm file needs to be provided: it is hardcoded into the JS. Note
        # that we use run_vm(), which will ignore known issues in our output and
        # in V8. Those issues may cause V8 to e.g. reject a binary we emit that
        # is invalid, but that should not be a problem for ClusterFuzz (it isn't
        # a crash).
        output = run_vm(cmd)

        # Verify that we called something. The fuzzer should always emit at
        # least one exported function (unless we've decided to ignore the entire
        # run).
        if output != IGNORE:
            assert FUZZ_EXEC_CALL_PREFIX in output

    def ensure(self):
        # The first time we actually run, set things up: make a bundle like the
        # one ClusterFuzz receives, and unpack it for execution into a dir. The
        # existence of that dir shows we've ensured all we need.
        if hasattr(self, 'clusterfuzz_dir'):
            return

        self.clusterfuzz_dir = 'clusterfuzz'
        if os.path.exists(self.clusterfuzz_dir):
            shutil.rmtree(self.clusterfuzz_dir)
        os.mkdir(self.clusterfuzz_dir)

        print('Bundling for ClusterFuzz')
        bundle = 'fuzz_opt_clusterfuzz_bundle.tgz'
        run([in_binaryen('scripts', 'bundle_clusterfuzz.py'), bundle])

        print('Unpacking for ClusterFuzz')
        with tarfile.open(bundle, "r:gz") as tar:
            tar.extractall(path=self.clusterfuzz_dir)


# The global list of all test case handlers
testcase_handlers = [
    FuzzExec(),
@@ -1585,7 +1664,8 @@ def handle(self, wasm):
    Merge(),
    # TODO: enable when stable enough, and adjust |frequency| (see above)
    # Split(),
-    RoundtripText()
+    RoundtripText(),
+    ClusterFuzz(),
]
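
One detail of the ClusterFuzz handler above: it appends the whole contents of the flags file to the V8 command as a single argument. That works while the file holds exactly one flag, as it does in this PR (`--wasm-staging`). If the flags file ever held several space-separated flags (an assumption; nothing in the PR does this), the contents would need to be split first. A minimal sketch, where `build_v8_command` is a hypothetical helper and not part of fuzz_opt.py:

```python
import shlex


def build_v8_command(v8_path, flags_text, fuzz_file):
    '''Build the argv list for running one testcase in V8.

    flags_text is the raw contents of the flags file; it is split on
    whitespace (respecting shell-style quoting) into separate arguments.
    '''
    return [v8_path] + shlex.split(flags_text) + [fuzz_file]
```

With a single flag this produces the same command as the handler does today, so it is only a defensive variation rather than a fix for an observed problem.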

