Thread sanitizer troubles

Has anyone successfully built Julia with ThreadSanitizer (TSAN)? Over the last year I’ve tried about 15 clean builds with different configurations, after TSAN was recommended to me (for this issue), but I have never had any luck. Could someone share an exact Make.user file that has worked for them?

The documentation (Sanitizer support · The Julia Language) gives only a few tips on building with TSAN. In theory you should be able to run ./contrib/tsan/build.sh /tmp/julia -j 20. However, this has never worked on any Julia release I’ve tried.

I have also tried building manually with various permutations of build flags, such as my current setup:

TOOLCHAIN=/home/mc2473/juliasanitizer/toolchain/usr/tools
BINDIR=$(TOOLCHAIN)/usr/bin
TOOLDIR=$(TOOLCHAIN)/usr/tools

override CC=$(TOOLCHAIN)/clang
override CXX=$(TOOLCHAIN)/clang++

export TSAN_SYMBOLIZER_PATH=$(TOOLCHAIN)/llvm-symbolizer
export TSAN_OPTIONS="suppressions=/home/mc2473/juliasanitizer/julia/tsan_suppressions.sup"

USECLANG=1
USE_BINARYBUILDER_LLVM=1

override SANITIZE=1
override SANITIZE_THREAD=1
override WITH_GC_DEBUG_ENV=0
override JULIA_PRECOMPILE=0

export LBT_USE_RTLD_DEEPBIND=0

but I often seem to get variations of the following during the make process:

ThreadSanitizer: CHECK failed: sanitizer_allocator_secondary.h:297 "((IsAligned(p, page_size_))) != (0)" (0x0, 0x0) (tid=1625251)
    #0 __tsan::CheckUnwind() /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp:672 (julia-debug+0xc9a85)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /workspace/srcdir/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:86 (julia-debug+0x41d8b)
    #2 __sanitizer::LargeMmapAllocator<__tsan::MapUnmapCallback, __sanitizer::LargeMmapAllocatorPtrArrayDynamic, __sanitizer::LocalAddressSpaceView>::GetHeader(unsigned long) /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_secondary.h:297 (julia-debug+0xc720c)
    #3 __sanitizer::LargeMmapAllocator<__tsan::MapUnmapCallback, __sanitizer::LargeMmapAllocatorPtrArrayDynamic, __sanitizer::LocalAddressSpaceView>::GetHeader(void const*) /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_secondary.h:301 (julia-debug+0xc720c)
    #4 __sanitizer::LargeMmapAllocator<__tsan::MapUnmapCallback, __sanitizer::LargeMmapAllocatorPtrArrayDynamic, __sanitizer::LocalAddressSpaceView>::Deallocate(__sanitizer::AllocatorStats*, void*) /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_secondary.h:135 (julia-debug+0xc720c)
    #5 __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Deallocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64>>*, void*) /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_combined.h:94 (julia-debug+0xc720c)
    #6 __tsan::user_free(__tsan::ThreadState*, unsigned long, void*, bool) /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:214 (julia-debug+0xc720c)
    #7 free /workspace/srcdir/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:727 (julia-debug+0x69245)

I’ve hit various segfaults during the build (Segfault on building TSAN-enabled Julia · Issue #48031 · JuliaLang/julia · GitHub) and runtime (Thread sanitization issues with garbage collection · Issue #52690 · JuliaLang/julia · GitHub) which I still don’t quite understand.

So I’m wondering if someone could share any tips on building with TSAN. I’ve been chasing some heisenbugs for maybe a year now and am losing my mind a bit…


It would be awesome to have a TSAN-enabled build available in juliaup… Thread safety is an area where I feel particularly on my own currently with Julia.


A downloadable version of TSAN-enabled julia on juliaup would be -amazing-.

@vchuravy and @Keno did mention something in this direction here:

We did add ABI platform tags for tsan and asan when we added the msan ones, so in theory it’s all set, but someone needs to drive pushing it through.

But I’m not sure how easy that final step would be.


I wonder if there are also any pure-Julia thread sanitization libraries available? Feels like you could get better debugging info that way anyways…

For example, you could have a Julia library that when imported, explicitly pirates and overwrites some of the allocation methods for Array or maybe mutable structs in general. Those overwritten versions could have a thread lock on reads and writes — if a thread needs to wait for it to unlock, that indicates there’s a data race, and the backtrace would indicate where!

Anything is better than nothing I guess :) but I’d really prefer coverage of all data types at the price of poorer error messages.

It would be nice to have something like the Go race detector. It’s integrated in the standard toolchain and works for all data types. I think it’s actually based on ThreadSanitizer.


Cool!

Yeah, for example, you could have this sanitizer package used like this:

using ThreadSanitizer

@sanitize_base  # Use when debugging

which would overwrite all of these methods in boot.jl:

Array{T,1}(::UndefInitializer, m::Int) where {T} =
    ccall(:jl_alloc_array_1d, Array{T,1}, (Any, Int), Array{T,1}, m)
Array{T,2}(::UndefInitializer, m::Int, n::Int) where {T} =
    ccall(:jl_alloc_array_2d, Array{T,2}, (Any, Int, Int), Array{T,2}, m, n)
Array{T,3}(::UndefInitializer, m::Int, n::Int, o::Int) where {T} =
    ccall(:jl_alloc_array_3d, Array{T,3}, (Any, Int, Int, Int), Array{T,3}, m, n, o)
Array{T,N}(::UndefInitializer, d::Vararg{Int,N}) where {T,N} =
    ccall(:jl_new_array, Array{T,N}, (Any, Any), Array{T,N}, d)
Array{T,N}(::UndefInitializer, d::NTuple{N,Int}) where {T,N} =
    ccall(:jl_new_array, Array{T,N}, (Any, Any), Array{T,N}, d)

which look to be the primitive calls for all the array initializations.

You could simply overwrite them to create a ReentrantLock() for each new array (or array element?), which is deleted when the array is garbage collected. Then, in the getindex and setindex! methods, you would check those locks and throw a thread sanitizer error if the user tries to access the array while it is locked (i.e., being read or written by another thread). Of course this is expensive, but thread race issues are so painful that it would be worth it (and valgrind takes forever anyway).
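To make the idea a bit more concrete, here is a rough sketch of the lock-per-array check, written as a wrapper type rather than by pirating the Base constructors. The names RaceCheckedVector, RaceError and guarded are made up for illustration and do not come from any existing package. It uses trylock so that an overlapping access fails immediately with a backtrace instead of waiting.

```julia
# Hypothetical sketch: RaceCheckedVector, RaceError and guarded are made-up
# names for illustration, not part of Base or any existing package.

struct RaceError <: Exception
    msg::String
end

# Wraps a Vector together with one lock per array.
struct RaceCheckedVector{T} <: AbstractVector{T}
    data::Vector{T}
    lock::ReentrantLock
end

RaceCheckedVector(data::Vector{T}) where {T} =
    RaceCheckedVector{T}(data, ReentrantLock())

Base.size(v::RaceCheckedVector) = size(v.data)
Base.IndexStyle(::Type{<:RaceCheckedVector}) = IndexLinear()

# Run `f` while holding the array's lock. Instead of waiting, fail loudly if
# another task already holds it, since that means two accesses overlapped.
function guarded(f, v::RaceCheckedVector)
    trylock(v.lock) || throw(RaceError("overlapping access from another task"))
    try
        return f()
    finally
        unlock(v.lock)
    end
end

Base.getindex(v::RaceCheckedVector, i::Int) = guarded(() -> v.data[i], v)
Base.setindex!(v::RaceCheckedVector, x, i::Int) = guarded(() -> (v.data[i] = x), v)

v = RaceCheckedVector(zeros(Int, 4))
v[1] = 42
v[1]  # single-task use passes through the guard and behaves like a Vector
```

Some caveats with this sketch: it only sees accesses that go through the wrapper, it also flags two overlapping reads (which are not a data race by themselves), and the lock is held so briefly that a real race would only be caught when the two accesses happen to overlap. So at best it is a probabilistic check, not a replacement for TSAN.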

However I’m not sure what the equivalent is for mutable types in general?


I’d be curious to hear what, e.g., @vchuravy, @jameson, or @Keno think… I should emphasize that this would only be a debugging tool, an alternative to the extremely hard-to-compile thread sanitizer (whose debugging info is perhaps not great for Julia code anyway?). You would never use this in a library.

Did you manage to fix the ThreadSanitizer: CHECK failed: sanitizer_allocator_secondary.h:297 error?

Unfortunately no, I gave up.
