-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuzzing: ClusterFuzz integration #7079
base: main
Are you sure you want to change the base?
Conversation
@@ -85,7 +85,7 @@ | |||
# Delete the argument, as importing |shared| scans it. | |||
sys.argv.pop() | |||
|
|||
from test import shared | |||
from test import shared # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor the shared argument parsing to use less global state so we don't have to dodge the linter like this?
|
||
./emsdk install tot | ||
|
||
after which ./upstream/ (from the emsdk dir) will contain portable builds of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "portable" mean in this context?
|
||
2. Run the unit tests, which include smoke tests for our ClusterFuzz support: | ||
|
||
python -m unittest test/unit/test_cluster_fuzz.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this script should run these smoke tests automatically?
input_data_file_path = os.path.join(output_dir, '%d.input' % i) | ||
wasm_file_path = os.path.join(output_dir, '%d.wasm' % i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input_data_file_path = os.path.join(output_dir, '%d.input' % i) | |
wasm_file_path = os.path.join(output_dir, '%d.wasm' % i) | |
input_data_file_path = os.path.join(output_dir, f'{i}.input') | |
wasm_file_path = os.path.join(output_dir, f'{i}.wasm') |
# wasm-opt may fail to run in rare cases (when the fuzzer emits code it | ||
# detects as invalid). Just try again in such a case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I've never seen this before. Should we put the retry logic in wasm-opt instead?
for i in range(0, N + 2): | ||
fuzz_file = os.path.join(temp_dir.name, f'fuzz-binaryen-{i}.js') | ||
flags_file = os.path.join(temp_dir.name, f'flags-binaryen-{i}.js') | ||
# We actually emit the range [1, N], so 0 or N+1 should not exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a bit simpler and just emit files starting with 0?
# | ||
# StructNew : 18 | ||
# | ||
struct_news_regex = re.compile(r'StructNew(\s+):(\s+)(\d+)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the parens for readability? They shouldn't be necessary IIUC.
shared.WASM_OPT + ['-all', '--metrics', binary_file, '-q'], text=True) | ||
|
||
# Update with what we see. | ||
struct_news = re.findall(struct_news_regex, metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why findall
? Don't we expect there to be only one StructNew
line?
# Update with what we see. | ||
struct_news = re.findall(struct_news_regex, metrics) | ||
if not struct_news: | ||
# No line is emitted when --metrics seens no struct.news. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# No line is emitted when --metrics seens no struct.news. | |
# No line is emitted when --metrics sees no struct.news. |
if not struct_news: | ||
# No line is emitted when --metrics seens no struct.news. | ||
struct_news = [('', '', '0')] | ||
seen_struct_news.append(int(struct_news[0][2])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can used a named group instead of a plain index to make this more robust. (I'm surprised 2 is the right group index. Shouldn't it be 3 because group 0 is the entire match?)
The main addition here is a
bundle_clusterfuzz.py
script which will package upthe exact files that should be uploaded to ClusterFuzz. It also documents the
process and bundling and testing. You can do
That bundles
wasm-opt
from./bin.
, which is enough for local testing. Foractually uploading to ClusterFuzz, we need a portable build, and @dschuff
had the idea to reuse the emsdk build, which works nicely. Doing
will bundle
wasm-opt
(+libs) from the emsdk. I verified that those buildswork on ClusterFuzz.
I added several forms of testing here. First, our main fuzzer
fuzz_opt.py
nowhas a
ClusterFuzz
testcase handler, which simulates a ClusterFuzz environment.Second, there are smoke tests that run in the unit test suite, and can also be
run separately:
Those unit tests can also run on a given bundle, e.g. one created from an
emsdk build, for testing right before upload:
A third piece of testing is to add a
--fuzz-passes
test. That is a mode for-ttf
(translate random data into a valid wasm fuzz testcase) that uses randomdata to pick and run a set of passes, to further shape the wasm. (
--fuzz-passes
had no previous testing, and this PR fixes it and tidies it up a little, adding some
newer passes too).
Otherwise this PR includes the key
run.py
script that is bundled and thenexecuted by ClusterFuzz, basically a python script that runs
wasm-opt -ttf [..]
to generate testcases, sets up their JS, and emits them.
fuzz_shell.js
, which is the JS to execute testcases, will now check if it isprovided binary data of a wasm file. If so, it does not read a wasm file from
argv[1]
. (This is needed because ClusterFuzz expects a single file for thetestcase, so we make a JS file with bundled wasm inside it.)