Document how to write and run tests #373
@@ -0,0 +1,127 @@
# Testing
I wonder if the sources for tests should be in a separate directory; that way a test source can be in any format, including multi-file or even nested hierarchies of files. Specifically, I see two advantages to this:
- The test binary directory (the one the test runner processes) only contains files with a fixed set of known suffixes.
- The test source directory structure can be language-specific and take whatever form it likes. Perhaps some source languages like to have each program in its own directory, or each program requires several files?
Presumably we'll gather all the tests in a webassembly/wasi-testsuite repository, à la webassembly/testsuite, that can be pulled in as a submodule or subtree containing all the test fixtures you'll need to bring your own runner to an existing test setup.
Having a single self-contained source file here makes it really easy to reference exactly what is failing; an implementation's test runner can even parse that programmatically from a trace and output it.
That said, some of the WebAssembly proposal repositories do have a test/meta directory which contains the files used to generate tests.
Generating parametric tests this way is definitely an option, but I'd still like the generated sources to be available in the main test directory next to the binary for reference, without having to "ship" the entire meta directory.
> The test binary directory (the one the test runner processes) only contains files with a fixed set of known suffixes.

I don't think we can promise this in general.
As proposals emerge, the tests may have to be expanded upon with new meta files; additional permissions for sockets/networking come to mind, for example.
This initial set is just enough to support the filesystem and command proposals.
One additional comment to this: though wasi-nn is rather exotic compared to, say, wasi-filesystem, I would like to consider what it would take to write WASI tests. With the current spec, the ML model and the input tensors would likely be shipped as separate files since they are rather large. Does that fit into this model?
Additionally, the output tensor(s) will likely not be exact matches but instead need some type of fuzzy assertion. Not sure how we will handle that...
> One additional comment to this: though wasi-nn is rather exotic compared to, say, wasi-filesystem, I would like to consider what it would take to write WASI tests. With the current spec, the ML model and the input tensors would likely be shipped as separate files since they are rather large. Does that fit into this model?

It's open-ended and additive, so it can be extended to encompass whatever we need, but I'm not sure what wasi-nn needs; I'm not that familiar with the proposal, but it seems it just loads models from memory?
If so, then a data section seems more appropriate.
> Additionally, the output tensor(s) will likely not be exact matches but instead need some type of fuzzy assertion. Not sure how we will handle that...

Depends on the previous question, but probably just define some fuzzy asserters in the source language.
Internal logic should be handled internally, IMO.
Really a case of "you tell me", though; I'm not familiar enough with that proposal at this time to make any recommendations.
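To make the idea concrete, here is a minimal sketch of what such a fuzzy asserter could look like if the test were written in Rust; the function name, tolerance parameter, and element type are assumptions for illustration, not part of any proposed spec:

```rust
// Hypothetical fuzzy asserter defined in the test's own source language.
// Compares two tensors element-wise within an absolute tolerance.
fn assert_close(actual: &[f32], expected: &[f32], epsilon: f32) {
    assert_eq!(actual.len(), expected.len(), "tensor length mismatch");
    for (i, (a, e)) in actual.iter().zip(expected).enumerate() {
        assert!(
            (a - e).abs() <= epsilon,
            "element {i}: {a} is not within {epsilon} of {e}"
        );
    }
}
```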
> Depends on the previous question, but probably just define some fuzzy asserters in the source language.
> Internal logic should be handled internally, IMO.

Sounds good.

> I'm not sure what wasi-nn needs; I'm not that familiar with the proposal, but it seems it just loads models from memory?

That's right, but practically speaking, those models/weights/tensors are large and are easier to manage as separate files than as static data sections in a compiled Wasm file. It's not an exact corollary, but in an example over in Wasmtime I load this data from files before using it in wasi-nn. If we could bundle additional files with the tests and were allowed to use these files in <basename>.arg (or even statically in the test, I guess), that would seem easiest. What do you think?
Another reason to embed any data the test needs is that it avoids the dependency on the filesystem API, making the test more precise and less fragile. It should be fairly straightforward to embed data in wasm; I think there are a few different ways you can do this if you are using an LLVM-based toolchain.
> If we could bundle additional files with the tests and were allowed to use these files in <basename>.arg (or even statically in the test, I guess), that would seem easiest. What do you think?

You could, but this also introduces a lot of extra unrelated dependencies in your tests that can be avoided.
Different toolchains have different mechanisms for it, but it can be quite trivial to embed the data. In Rust, for example, one could use the include_bytes macro:

```rust
let model_1 = include_bytes!("model_1.tensor");
```

Other toolchains have their own ways too.
## Writing tests

Any source language may be used to write tests as long as the compiler supports
the wasm32-unknown-unknown target and is well-known and freely available.
wasm32-unknown-unknown is an llvm-ism, so we probably don't want to mention that here. Perhaps:

"Any source language may be used to write tests as long as the compiler supports WebAssembly output targeting the WASI API."

The part about "well-known and freely available" will probably want to be tightened up too.
> wasm32-unknown-unknown is an llvm-ism, so we probably don't want to mention that here.

Targeting wasm32-wasi is probably fine, actually; just don't pull in libc.
I just want to make sure we point out that using wasm32-wasi's main is generally going to make for a bad test.
For example, trying to test proc_exit in main isn't great, as it'll do all the bootstrapping that libc needs but which isn't relevant in that context.
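To make that concrete, here is a rough sketch of what a bare proc_exit test could look like in Rust, exporting its own entry point so none of libc's _start bootstrapping runs first; the exit status and panic handler are illustrative assumptions, not part of the proposed spec:

```rust
// Hypothetical bare-WASI test for proc_exit, written without libc so that
// none of the usual _start bootstrapping (prestat, args, env) runs first.
#![no_std]
#![no_main]

#[link(wasm_import_module = "wasi_snapshot_preview1")]
extern "C" {
    fn proc_exit(code: u32) -> !;
}

#[no_mangle]
pub extern "C" fn _start() {
    // Exit with a known status; a runner could check this against the
    // test's expected-status file.
    unsafe { proc_exit(42) }
}

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    // A failed assertion ultimately terminates with an unreachable trap.
    core::arch::wasm32::unreachable()
}
```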
> The part about "well-known and freely available" will probably want to be tightened up too.

Yeah, I'll need to think about what that means.
I'm thinking:
- We can't allow proprietary compilers.
- We can't allow non-portable compilers.
- We can't allow toy languages that won't be, or aren't, maintained.
I'll double back to this later.
I do tend to think we want to allow tests to pull in libc. It does call proc_exit, and some of the stdio routines needlessly pull in fdstat_get and perhaps other things; however, bare WASI is really inconvenient to code to, so if we have to do everything in bare WASI, it seems like we'd end up with people writing fewer tests. It seems better to have more tests, even if some of them call more functions than they strictly need to.
> I do tend to think we want to allow tests to pull in libc. [...] It seems better to have more tests, even if some of them call more functions than they strictly need to.

From experience it was a bit tedious to deal with, as _main also calls into the prestat, args, and env functions as well (at least at the time).
Such tests are simple to write, but the early days of WASI in Deno basically couldn't be tested with my libc tests until everything was bootstrapped.
I think we'll just have to revisit this once the split is done and it's a bit clearer how that's going to work; but presumably we'd want to generate headers and such based on the local witx files and not be dependent on the current state of, say, the wasi-libc implementation and headers or the wasi crate (which would also be a circular dependency).
As for wasi-libc itself, we can test that in wasi-sdk with the same harness; it's already fairly similar. We'd just have to collect those along with the tests from this repository into something like webassembly/wasi-testsuite to let implementers make use of them as well.
docs/Testing.md
- \<basename\>.stdin
- \<basename\>.stdout
- \<basename\>.stderr
- \<basename\>.status
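Putting the pieces together, a single test under this scheme might consist of files like the following; the names are purely illustrative (the source extension assumes a Rust test):

```
proc_exit-success.wasm    # the test binary the runner executes
proc_exit-success.rs      # the source it was compiled from
proc_exit-success.stdout  # expected standard output
proc_exit-success.status  # expected exit status
```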
We should probably say something about where the tests live for a given proposal, and how a test runner can/should find the tests for a given proposal or all proposals.
> We should probably say something about where the tests live for a given proposal

I read through make-snapshot.sh, and I'm thinking a proposal repository would put its tests in test/<module_name>/<test_name>, where the path is relative to the proposal repository's root directory.
The make-snapshot script merges proposals into $snapshot_dir/proposals/${repo}, so this allows us to use simple path filtering to match just about any selection of tests.

Run all the tests in a standalone repository:

```bash
git clone https://github.com/WebAssembly/wasi-clocks
wasi-test-runner.sh wasmtime wasi-clocks/test
```

Run a specific proposal's tests in a snapshot:

```bash
git clone https://github.com/WebAssembly/wasi
wasi-test-runner.sh wasmtime wasi/snapshot/ephemeral/proposals/<proposal>
```

Run a specific module's tests in standard:

```bash
git clone https://github.com/WebAssembly/wasi
wasi-test-runner.sh wasmtime wasi/standard/<module>
```

Simple but effective.
Just having the full test path be test/<module>.<function>[-<variant>].<ext> would also work if we desire a flatter structure.
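For instance, under that flatter convention a couple of hypothetical clock tests might appear as:

```
test/clocks.clock_time_get.rs
test/clocks.clock_time_get-invalid_id.rs
```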
docs/Testing.md
In addition to the source and binary files, a test can have any of the following
auxiliary files and directories:
- \<basename\>.arg
TODO: add an example for each file type.

One open-ended question is how we handle printing diagnostics to the testing environment. Assertions which just call unreachable aren't that great, but it would be preferable NOT to depend on output_stream in every single test to print the failure before aborting. Spectests rely on getting a spectest module which just has a bunch of prints in it; maybe we could do something similar. It doesn't matter that much in, say, Deno's current test suite, where I'm expecting us to support 100%, but when we're testing specific modules, as we'll end up doing here, it would be good if we cut this inter-module dependency.

The wasmtime repo's WASI test suite uses Rust as the source language, which fails by panicking. It will print messages to stdout and eventually hit the unreachable. This doesn't do you much good if your implementation doesn't have basic stuff like writing to stdio working, but in my experience that is usually the first thing you get working.

I'd assume that to be the case for any current tests; but with the new module split I'm not sure if that makes sense to do here.

I think it makes sense to assume stdout and the ability to set an exit code as a minimum bar for running any test. If we don't do that, we would be forced to create some other bespoke I/O system just for the test system (wouldn't we?). I'm not sure it's worth the extra complexity to do that.

It does make them more like integration tests, which is fine; I just wanted to point that out, as I'm unclear on where this modularization that started last year-ish is going (especially weak linkage). Deno and my test repos do this and it has worked out fine. Technically you don't actually even have to implement stdio (but it does have to be stubbed); failed assertions will always terminate with an unreachable instruction at the end regardless.
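To illustrate the spectest-style idea mentioned above, a test could import its diagnostics from a dedicated host module instead of going through output_stream; this is only a sketch, and the "wasi_test" module and print function are entirely hypothetical names:

```rust
// Hypothetical diagnostic import, analogous to the spectest module's print
// functions: the host would provide "wasi_test" only when running the test
// suite, keeping tests independent of the stdio modules.
#[link(wasm_import_module = "wasi_test")]
extern "C" {
    fn print(ptr: *const u8, len: usize);
}

fn report_failure(msg: &str) {
    unsafe { print(msg.as_ptr(), msg.len()) };
}
```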
Thanks for writing this up! This looks like what we talked about in the meeting, and a good start.
A test case takes the form of a binary \<basename\>.wasm file; next to it
there will always be a \<basename\>.\<ext\> source file from which the
test was originally compiled, which can be used as a reference in the event
of an error.
I like the simplicity of this, but I expect it'll be too limiting as we add more tests in more languages. It'll be convenient to build Rust source files with cargo, and other language source files with their respective build systems, and that'll often require additional support files, so it won't always be as simple as <basename>.<ext>.
What would you think of a convention where we have a sources directory, and in it we have source code and build scripts that one can run to re-generate the wasm files?
> What would you think of a convention where we have a sources directory, and in it we have source code and build scripts that one can run to re-generate the wasm files?

Some of the WebAssembly proposals have a meta directory that contains wast generators; might we want to keep with that convention?
I would still like to see the "final" source files next to the test binaries as part of the build, for reference; more often than not, assertion errors don't make much sense without looking at the source.
It's slight duplication, but copying the sources back out to the parent directory next to the binaries as part of the build means the implementer running the tests doesn't have to care how a particular proposal structured its build system.
In fact, this way the meta directory could just be omitted from an implementation's test-data folder entirely if they have no need for it.
docs/Testing.md
- Prepare inputs
  - Given an `<input>.wasm` file; take the `<basename>` of said file.
  - If `<basename>.<arg>` exists; take the program arguments from said file.
It's a minor detail, but I'd find this clearer with ".args" instead of ".arg".
Also, we should document the whitespace convention in the arguments file. Can we say that arguments are separated by whitespace, and '"' quotes may be used to enclose an argument containing whitespace?
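Under that suggested convention, a hypothetical <basename>.args file containing three arguments (the second of which contains whitespace and is therefore quoted) might look like:

```
--verbose "some file.txt" 42
```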
docs/Testing.md
```bash
# Usage: $1 <runtime> <path_to_binary.wasm>
# $1 wasmtime proc_exit-success.wasm
# $1 wasmer proc_exit-failure.wasm
```
I know these are only examples, but in the interests of not wanting to exclude engines, we shouldn't mention either of these examples here. Would "Usage: $1 <runtime_cmd> <path_to_binary>.wasm" be clear enough, without the examples?
Based on my recent work and our discussions in yesterday's meeting, I started drafting up the basis of a simple specification for a test runner and guidelines for authoring tests.