Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] Extend fuzzer #2454

Open
Tracked by #2453
Rot127 opened this issue Aug 22, 2024 · 4 comments
Open
Tracked by #2453

[Tests] Extend fuzzer #2454

Rot127 opened this issue Aug 22, 2024 · 4 comments
Labels
enhancement Testing Test related issue

Comments

@Rot127
Copy link
Collaborator

Rot127 commented Aug 22, 2024

The fuzzer could need some extensions. For example, it determines the mode and arch by comparing the strings of the enum identifiers CS_ARCH and CS_MODE. This makes it too maintenance heavy. And easy to forget to add new identifiers there.

It would be better to:

  • have it use the test_mapping.h file (from Modern Testing #2456). To get the mode and arch values for fuzzing. Or something similar centralized.
  • Allow to track fuzzing coverage of the source code.
  • Consume the yaml test files for fuzzer input generation. Not the legacy MC/*.cs files. (see Modern Testing #2456)
  • Remove dead code like travis.yaml
  • Use cmake build AND use it in the OSSFuzz build script (removing hard-coded paths)
  • Don't use the Python bindings to generate the corpus. It should simply parse the yaml files and get the binry info from somewhere else. Currently the fuzzer build step compiles Capstone multiple times.
  • LoongArch is not fuzzed
  • Missing modes or architectures are currently just ignored. But it should fail if this is the case.

cc @catenacyber Because you did most work on it in the past :)

@Rot127 Rot127 added enhancement Testing Test related issue labels Aug 22, 2024
@catenacyber
Copy link
Contributor

Thanks for pinging me Rot127. Are you planning on doing this ?
Just putting come comments inline

The fuzzer could need some extensions. For example, it determines the mode and arch by comparing the strings of the enum identifiers CS_ARCH and CS_MODE. This makes it too maintenance heavy. And easy to forget to add new identifiers there.

Sure

It would be better to:

  • have it use the test_mapping.h file (from Modern testing #2384). To get the mode and arch values for fuzzing. Or something similar centralized.

Looks a good idea

  • Allow to track fuzzing coverage of the source code.

What do you mean ?

  • Consume the yaml test files for fuzzer input generation. Not the legacy MC/*.cs files. (see Modern testing #2384)

Do the yaml cover all the previous MS/*.cs cases ?

  • Remove dead code like travis.yaml

Looks good

  • Use cmake build AND use it in the OSSFuzz build script (removing hard-coded paths)

Looks good

  • Don't use the Python bindings to generate the corpus. It should simply parse the yaml files and get the binry info from somewhere else. Currently the fuzzer build step compiles Capstone multiple times.

Where is this somewhere else ?

  • LoongArch is not fuzzed

There should be a generic way to create a fuzz target for each architecture indeed...

  • Missing modes or architectures are currently just ignored. But it should fail if this is the case.

Ok

@catenacyber
Copy link
Contributor

cf https://storage.googleapis.com/oss-fuzz-coverage/capstone/reports/20240821/linux/src/capstonenext/report.html for the coverage

@Rot127
Copy link
Collaborator Author

Rot127 commented Aug 26, 2024

Do the yaml cover all the previous MS/*.cs cases ?

Yes! And even way more now. Because we can generate them more reliably from the LLVM regression tests.

Allow to track fuzzing coverage of the source code.

What do you mean ?

Basically what you linked in your second comment. Thanks for that! I wasn't aware of it.

Where is this somewhere else ?

The modern testing is not yet merged. See #2456. There I defined a test_mapping.h file. It maps a given enumeration identifier to its value. We could compile an executable from it and expose an API for the bindings.

But I haven't thought this to an end yet. But in general it would be good to have a single point, which does the enumeration identifier -> value mapping. Currently this behavior is all over the place (const_generator.py, Python bindings, in the old cstest code, the fuzzer etc.).
Everyone defined a new table with the mapping they need.

Are you planning on doing this ?

It is not a high priority. But as you probably recognized we heavily modernize Capstone currently. So I also open up issues about issues to not loose track of them.
If you want to help of course, feel free to do so :D

@catenacyber
Copy link
Contributor

Thanks for the answers, it is not high priority for me neither ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Testing Test related issue
Projects
None yet
Development

No branches or pull requests

2 participants