Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vcpkg triplets for ASAN. #4515

Merged
merged 24 commits into from
Dec 22, 2023
Merged

Conversation

teo-tsirpanis
Copy link
Member

@teo-tsirpanis teo-tsirpanis commented Nov 13, 2023

SC-34454

This PR defines x64-windows-asan, x64-linux-asan, x64-osx-asan and arm64-osx-asan triplets, that pass compiler options to enable Address Sanitizer. To enable ASAN you have to specify the TILEDB_VCPKG_BASE_TRIPLET variable (or the related bootstrap script options) to the triplet without the -asan suffix.

As an attempt to reduce duplication, I tried passing the flags with a CMake toolchain file, but could not make it work.


TYPE: BUILD
DESC: Add vcpkg triplets for Address Sanitizer.

Copy link

This pull request has been linked to Shortcut Story #34454: Pass sanitizer flags into vcpkg triple.

@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review November 13, 2023 23:48
@teo-tsirpanis teo-tsirpanis marked this pull request as draft November 13, 2023 23:49
@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review November 14, 2023 17:08
@teo-tsirpanis
Copy link
Member Author

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We need a separate module Sanitizers.cmake as in Rewrite sanitizer invocation #2392. The present state of the PR has sanitizer-relevant code in two places rather than one. One of those places is clearly labelled "HACK". We should clean this up as part of this PR.
    • What we're doing with the address sanitizer here is the beginning of something rather larger. We'll need similar code for dealing with the other sanitizers over time.
  • Need a fatal error if the sanitizer isn't "address".
  • Need to incorporate the documentation present in Rewrite sanitizer invocation #2392, as well as the user-friendly error messages.
  • cmake/Options/TileDBToolchain.cmake needs a copyright notice.

CMakeLists.txt Outdated
@@ -264,7 +264,9 @@ if (SANITIZER)
message(STATUS "The TileDB library is compiled with sanitizer ${SANITIZER} enabled")
endif()
add_compile_options(-DTILEDB_SANITIZER=${SANITIZER})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present this check runs after TileDBToolchain.cmake has already run. We should validate the sanitizer arguments before we define the triplets. Please crib from #2392 for user-friendly messages.

We need to only allow the address sanitizer through at present, because it's the only one supported. That means a fatal error message for now.

In addition, it seems wrong to set some of the flags only here and some in the triplet files.

At some point after this PR, we need to investigate how to support the other sanitizers cleanly. It might be possible to just turn them on, but we should do them separately so that we take the opportunity to (1) validate them independently and (2) add CI jobs so that they stay validated indefinitely. We already have a CI job for the address sanitizer, so it didn't need new code; the others will.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point after this PR, we need to investigate how to support the other sanitizers cleanly.

One idea to explore at a later time is to generate a synthetic triplet file at configure time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we might do is only to support sanitizers if the triplet is predefined and not derived from a script. "Predefined" could mean any of these:

  • -D on the command line
  • Present as a CMake cache variable
  • environment variable (maybe)
    Definition on the command line would suffice for CI. Using the CMake cache would allow developers an incremental method of configuration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can add CMake presets for ASAN and the rest of the sanitizers we might support.

set(VCPKG_TARGET_TRIPLET "x64-macos")
elseif (CMAKE_OSX_ARCHITECTURES STREQUAL arm64 OR CMAKE_SYSTEM_PROCESSOR MATCHES "^aarch64" OR CMAKE_SYSTEM_PROCESSOR MATCHES "^arm")
set(VCPKG_TARGET_TRIPLET "arm64-macos")
if(NOT DEFINED VCPKG_TARGET_TRIPLET AND SANITIZER STREQUAL "address")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an odd way of getting these triplets defined. Is there some way of using the triplet detection that's in vcpkg already? If we could call that, get a triplet then all we would need is to append "-asan" to it.

Copy link
Member Author

@teo-tsirpanis teo-tsirpanis Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way of using the triplet detection that's in vcpkg already?

Unfortunately not. The logic is laid out in vcpkg.cmake without an enclosing function. Moreover the vcpkg files are not accessible at this point.

# For known compilers, check that the sanitizer is supported.
# If we know it's not supported, we'll fail so that we avoid false confidence.
# If we don't know, we'll warn that it might not work.
if (CMAKE_CXX_COMPILER_ID MATCHES "MSVC")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this file gets imported before the project() command, CMAKE_CXX_COMPILER_ID is not yet assigned. I see two ways to solve this:

  • Remove the error messages and use generator expressions to apply the options only to specific compilers (my preference).
  • Move the Sanitizer module after the project() command. Validation would run much later, after vcpkg has built the dependencies, which we might not want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by using generator expressions, and adding a macro that performs further validation, and gets called after the project() command.

set(VCPKG_TARGET_TRIPLET "x64-osx")
elseif (CMAKE_OSX_ARCHITECTURES STREQUAL arm64 OR CMAKE_SYSTEM_PROCESSOR MATCHES "^aarch64" OR CMAKE_SYSTEM_PROCESSOR MATCHES "^arm")
set(VCPKG_TARGET_TRIPLET "arm64-osx")
if(NOT DEFINED VCPKG_TARGET_TRIPLET AND SANITIZER STREQUAL "address")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizer-relevant code in two places rather than one

The Sanitizer module has the code for setting the compiler options when building our own code, and TileDBToolChain has the code for choosing the vcpkg triplet to use. I would prefer these two to stay in separate places, especially as the triplet selection might become more complex in the future.

@teo-tsirpanis
Copy link
Member Author

The triplet detection logic was replaced with the following. If ASAN is enabled:

  • If VCPKG_TARGET_TRIPLET is not specified, fail.
  • If VCPKG_TARGET_TRIPLET is specified, and does not end with -asan:
    • If a triplet called ${VCPKG_TARGET_TRIPLET}-asan exists in ports/triplets, use that.
    • Otherwise fail.

The bootstrap scripts got options to specify the vcpkg triplet. I also updated CONTRIBUTING.md with instructions to build with sanitizers.

Let me know what you think.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface to the build system does not meet the requirements. In its current state it's essentially hardcoded to work for CI but not for other users.

  • Need sanitizer option on CMake command line.
    • CI needs to put this option on its command line.
    • The bootstrap scripts need to generate this option for the command line.

There's missing documentation, both user-facing and internal.

  • User-facing. Specifically, we need to be very clear about the fact that sanitizer builds will rebuild dependencies and should go in their own build directory. We do not have any existing place to incrementally add this documentation. I suggesting creating a new file /doc/Build.md. Instructions to user should minimize the amount of vcpkg knowledge required. The only thing required here is how triplets work.
  • Internal. There's no explanation of how the system works as a whole in Sanitizer.cmake.

CONTRIBUTING.md Outdated
@@ -61,6 +61,19 @@ Formatting conventions:
- comments are good, TileDB uses [doxygen](http://www.stack.nl/~dimitri/doxygen/manual/docblocks.html) for class doc strings.
- format code using [clang-format](https://clang.llvm.org/docs/ClangFormat.html)

### Building with sanitizers

TileDB can be built with [clang sanitizers](https://clang.llvm.org/docs/AddressSanitizer.html) enabled. To enable them, you have to bootstrap with the `--enable-sanitizer` flag, as well as specify a vcpkg triplet compatible with ASAN (that ends with `-asan`). You can view the list of supported triplets [here](https://github.com/TileDB-Inc/TileDB/tree/dev/ports/triplets).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the interface we talked about. The command line need to take the base triplet, the one without the suffix -asan. We'll construct the triplet internally.

The reason for this was to get consistent triplet names between different users and usages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a base triplet option. It is used only if a sanitizer is enabled.

bootstrap Outdated
@@ -34,6 +34,8 @@ Configuration:
--help print this message
--prefix=PREFIX install files in tree rooted at PREFIX
['"${default_prefix}"']
--vcpkg-target-triplet=TRIPLET vcpkg triplet to use for downloading and building dependencies
[auto-detected]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"vcpkg base triplet"

bootstrap Outdated
Comment on lines 128 to 129
--vcpkg-target-triplet=*) vcpkg_target_triplet=`arg "$1"`
vcpkg_target_triplet="-DVCPKG_TARGET_TRIPLET=${vcpkg_target_triplet}";;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we talked about using the name "base triplet" in the option name to remove ambiguity.

This option is useful, but should go in a separate PR. It will need to be deconflicted with the base triplet specification.

CMakeLists.txt Outdated
add_compile_options(-DTILEDB_SANITIZER=${SANITIZER})
add_compile_options(-g -fno-omit-frame-pointer -fno-optimize-sibling-calls -fsanitize=${SANITIZER})
add_link_options(-fsanitize=${SANITIZER})
if(SANITIZER)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run a manual test to see if this code ever runs? It looks like this variable is never set anywhere, which means none of the code guarded by it will ever run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The SANITIZER variable is set from the command line. I verified that the triplet automatically switches to ASAN and the correct compile options are present in the generated Ninja file.

We don't have it indeed in BuildOptions, thanks for spotting it!

@@ -101,7 +101,7 @@ jobs:
matrix_compiler_cc: 'gcc-10'
matrix_compiler_cxx: 'g++-10'
timeout: 120
bootstrap_args: '--enable-serialization'
bootstrap_args: '--enable-serialization --vcpkg-target-triplet=x64-linux-asan'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test subverts any test of the actual bootstrap interface by not using --enable-sanitizer and not specifying the base triplet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 47 to 49
if(NOT SANITIZER)
return()
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the code in this module runs because SANITIZER is never true.

We need a sanitizer option declared in cmake/Options/BuildOptions.cmake and the bootstrap script needs to use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does run, see the other comment. I will add it in BuildOptions, thanks for spotting it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added build option. I also renamed SANITIZER to TILEDB_SANITIZER (I guessed we can do it since the CMake variable is undocumented, but I can revert the commit if we can't).

@teo-tsirpanis
Copy link
Member Author

Thanks for the review, it is now clearer to me what to do.

sanitizer builds will rebuild dependencies and should go in their own build directory

I am fairly sure (will verify soon) that this is not required. Vcpkg will automatically rebuild the dependencies with the correct options once it sees that the triplet changed, and so will CMake for our own code once the compiler options change.

@teo-tsirpanis
Copy link
Member Author

I am fairly sure (will verify soon) that this is not required.

BTW vcpkg indeed reinstalls the dependencies but reconfiguring fails without deleting the cache because some paths point to the old ASAN triplet.

@teo-tsirpanis teo-tsirpanis mentioned this pull request Dec 11, 2023
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should commit this now, but also not close out the story.

It only compiles some of the targets with the sanitizer. In particular, the object modules and unit tests are missing. Nothing in this PR is wrong insofar as it goes, but it doesn't go far enough.

cmake/Modules/Sanitizer.cmake Outdated Show resolved Hide resolved
Comment on lines +601 to 606
if (TILEDB_SANITIZER)
target_link_libraries(TILEDB_CORE_OBJECTS_ILIB
INTERFACE
-fsanitize=${SANITIZER}
-fsanitize=${TILEDB_SANITIZER}
)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change only affects some of the sources and targets, not all of them. It only affects the library target, and only the sources in the big source list, which doesn't include anything in the object libraries. At present this is only baseline, but that'll change soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually pass -fsanitize globally so this block could just be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, that was for the compiler, but we also pass it for the linker (MSVC's linker does not have such option). Feel free to remove this block if you see fit. I am on PTO and cannot verify it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will close the story but added this as a follow up.
https://app.shortcut.com/tiledb-inc/story/38662/follow-up-after-vcpkg-triple-pr

@KiterLuc KiterLuc merged commit 7b0a4ee into TileDB-Inc:dev Dec 22, 2023
61 checks passed
@teo-tsirpanis teo-tsirpanis deleted the asan-triplets branch December 22, 2023 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants