BenchmarkTools breaks when using "Threads.@threads"?

Hey guys!

I am trying to benchmark a function in a package I made. The package can be found here, https://github.com/AhmedSalih3d/PostSPH.jl, and the function I am testing is readVtkArray. When I run the function on its own, everything works fine, but when I run it through @benchmark or @btime like this:

```julia
using PostSPH
using BenchmarkTools
cd(raw"path-with-vtk-files")
@btime k = readVtkArray("parts",Cat(2))
```

I get the error:

```
Error in file number 3

Error thrown in threaded loop on thread 1: MethodError(f=typeof(Base.convert)(), args=(Array{Float32, N} where N, 1.#QNAN), world=0x0000000000006420)Error in file number 6

Error thrown in threaded loop on thread 2: MethodError(f=typeof(Base.convert)(), args=(Array{Float32, N} where N, 1.#QNAN), world=0x0000000000006420)
```

This does not happen if I use @time or no macro at all. Then I get the expected result:

```
k = readVtkArray("parts",Cat(2))
11-element Array{Array{Float32,N} where N,1}:
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; 0.0 0.0 0.0; 0.0 0.0 0.0]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; 0.00718685 0.0 0.00367148; -0.00353015 0.0 4.91067e-6]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -0.000894189 0.0 0.000899359; 0.000442034 0.0 0.00166281]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -0.000594841 0.0 -0.00052736; -0.000520424 0.0 0.00080863]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -0.000276363 0.0 -0.000111521; -0.000175042 0.0 0.000154482]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -5.00944e-5 0.0 -0.00012329; -0.000177598 0.0 6.39014e-5]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -3.33737e-5 0.0 -4.81802e-5; -0.000104715 0.0 7.41634e-6]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -0.000103613 0.0 -4.02584e-5; -9.69871e-5 0.0 2.61577e-5]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -4.87995e-5 0.0 -4.44328e-5; -6.59069e-5 0.0 5.18157e-6]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -4.91999e-5 0.0 -2.78963e-5; -1.20714e-5 0.0 1.5948e-5]
 [0.0 0.0 0.0; 0.0 0.0 0.0; … ; -6.2435e-6 0.0 2.14577e-5; 1.46149e-5 0.0 -1.46728e-5]
```

I don’t know how to make a minimal working example for this, so I made a Dropbox folder with some .vtk files, if anyone wants to test for themselves. The GitHub repository contains a README with installation instructions, and the source code is in src.

A .vtk file is a Visualization Toolkit (VTK) file, used for example to visualize simulation results in ParaView. If anyone could point out why this happens when using “Threads.@threads”, I would be very happy.

Kind regards

Are you using IO in the threaded loop?

Yes, at least I think so. Inside this code snippet:

```julia
k = Vector{Array{catType[typ]}}(undef, nFilenames)
Threads.@threads for i = 1:nFilenames::Number
    try
        @inbounds k[i] = readVtk(filenames[i], typ, PosTyp)
    catch
        # Runs only if readVtk throws.
        # Since DualSPHysics starts from 0000 - Test
        println("Error in file number ", i-1)
        @inbounds k[i] = NaN
    end
end
```

I use “Threads.@threads” on the for loop, and inside it I call “readVtk”, which opens an IOStream.
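A note on the error messages themselves: they are raised inside the catch block. Because `k` has an array element type (`Array{catType[typ]}`), `k[i] = NaN` has to call `convert(Array{Float32}, NaN)`, which has no method, so the original exception from readVtk gets replaced by this MethodError. The sketch below reproduces that without threads or files; `store_error` is a hypothetical helper, and Float32/Int32 stand in for `catType[typ]`.

```julia
# Hypothetical helper: try to store `x` into a Vector{Array{T}}, the same kind of
# container as `k` in the loop, and return the exception thrown (or nothing).
function store_error(::Type{T}, x) where {T}
    k = Vector{Array{T}}(undef, 1)
    try
        k[1] = x      # setindex! first calls convert(Array{T}, x)
        return nothing
    catch err
        return err
    end
end

store_error(Float32, NaN)                   # MethodError: no convert(Array{Float32}, ::Float64)
store_error(Int32, NaN)                     # same failure as in the PostSPH.Idp log
store_error(Float32, zeros(Float32, 0, 3))  # nothing: an empty array fits the element type
```

Storing an empty array as the "failed file" marker avoids the secondary error, and writing `catch err` and printing `err` would show what readVtk actually threw.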

IO in threaded regions is, AFAIU, not supported.
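If IO inside the threaded region is a concern, one option is to separate the two steps: read every file's bytes serially, then parse the in-memory bytes in parallel. A minimal sketch, with `read_then_parse` and `parsefn` as hypothetical names; this is not how PostSPH is structured.

```julia
# Hypothetical helper: all file IO happens serially on the calling thread;
# only the CPU-bound `parsefn` step runs inside Threads.@threads.
function read_then_parse(parsefn, paths::Vector{String})
    raw = [read(p) for p in paths]          # serial IO: one Vector{UInt8} per file
    out = Vector{Any}(undef, length(raw))   # element type left open for the sketch
    Threads.@threads for i in eachindex(raw)
        out[i] = parsefn(raw[i])            # each iteration writes only its own slot
    end
    return out
end

# Usage with three small temporary files standing in for .vtk files.
dir = mktempdir()
paths = [joinpath(dir, "part_$i.bin") for i in 1:3]
for (i, p) in enumerate(paths)
    write(p, fill(UInt8(i), 4))             # file i holds four bytes of value i
end
sums = read_then_parse(bytes -> sum(Int, bytes), paths)   # [4, 8, 12]
```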

Ah, okay. How would I go about benchmarking then? Benchmarking has sometimes worked when the files were bigger, which seems weird. Should I just use @time and write down the results?

Kind regards

Your problem is likely not related to BenchmarkTools. The threads are in your code, not in the benchmark.

It’s also likely not IO-related. For one, IO problems in threads would usually crash. Also, this particular printing only runs once an error has already happened, so it won’t affect working code.

You most likely have another race condition or bug in your code. The benchmark simply runs your code many more times than you usually do, and that exposes the bug.
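To illustrate what such a race looks like, here is a minimal sketch (hypothetical functions, unrelated to PostSPH's code): an unsynchronized counter shared between threads can lose increments, and only repeating the call many times, as @btime does, makes that visible. Run it with JULIA_NUM_THREADS greater than 1; with a single thread both versions always agree.

```julia
# `racy_count` does an unsynchronized read-modify-write on a shared counter,
# so with several threads some increments can be lost.
function racy_count(n)
    counter = Ref(0)
    Threads.@threads for i in 1:n
        counter[] += 1          # load, add, store: not atomic across threads
    end
    return counter[]
end

# `atomic_count` uses an atomic add and always returns n.
function atomic_count(n)
    counter = Threads.Atomic{Int}(0)
    Threads.@threads for i in 1:n
        Threads.atomic_add!(counter, 1)
    end
    return counter[]
end

# Repeating the call is what makes a rare lost update show up.
lost = count(_ -> racy_count(10_000) != 10_000, 1:100)
println("threads = ", Threads.nthreads(), ", runs with lost updates: ", lost)
```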


Are you sure? Couldn’t it be because the benchmarking tool runs several instances of the same code at the same time, and therefore ends up reading the same file multiple times at once?

Kind regards


Well, only you know your code, so if it cannot be run multiple times, then you cannot use BenchmarkTools.

In any case, that’s still unrelated to the interaction between threading and benchmarking.

In theory it is unrelated, I guess, but I expected the benchmark to run my code once, finish everything, then start over, finish everything, and so on. It seems like it instead spawns multiple instances and tries to run them all at once, and as far as I understand, that is why my code breaks.

Then I guess a for loop with @time, recording maybe 50 iterations, would give a good estimate?
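A sketch of that approach using only the standard library (`time_runs` and `work` are hypothetical names). For comparison, BenchmarkTools also runs its samples one after another rather than concurrently, and `@benchmark f() samples=50 evals=1` asks it for at most 50 single-evaluation samples directly.

```julia
using Statistics  # for median

# Time `f` `n` times in a row, one call after another, and summarise the results.
function time_runs(f, n)
    f()                                   # warm-up call so compilation is not timed
    times = [@elapsed(f()) for _ in 1:n]  # seconds per call
    return (min = minimum(times), median = median(times), times = times)
end

# Hypothetical stand-in for readVtkArray(...): any zero-argument function works.
work() = sum(sqrt, 1:100_000)
r = time_runs(work, 50)
println("min = ", r.min, " s, median = ", r.median, " s")
```

The minimum is usually the most stable figure for comparing two versions of a function; the median shows typical behaviour including occasional GC pauses.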

Everything seems to work on my machine:

```
sh$ JULIA_NUM_THREADS=4 julia

julia> include("readVtk_readbytesslow.jl")
readVtkArray (generic function with 1 method)

julia> for _ in 1:10_000
         readVtkArray("parts")
       end

julia> using BenchmarkTools
julia> @btime readVtkArray("parts");
  5.204 ms (454 allocations: 994.30 KiB)
```

Could you run the same kind of tests and report whether you get errors when stress-testing with @btime and/or a simple for loop? And maybe vary the number of threads, if you can?
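A stress test along those lines might look like this sketch: `stress` (a hypothetical helper) calls a function many times in sequence and records which runs threw, so launching it under different JULIA_NUM_THREADS values shows whether failures depend on the thread count. It cannot catch a segmentation fault, which terminates the whole process.

```julia
# Hypothetical stress-test helper: call `f` `n` times, one after another, and
# return the iteration numbers that threw instead of stopping at the first error.
function stress(f, n)
    failures = Int[]
    for i in 1:n
        try
            f()
        catch
            push!(failures, i)
        end
    end
    return failures
end

# Stand-in for e.g. () -> readVtkArray("parts", PostSPH.Idp): fails on every 7th call.
const calls = Ref(0)
flaky() = (calls[] += 1; calls[] % 7 == 0 && error("simulated failure"); nothing)

println("threads = ", Threads.nthreads(), ", failed runs: ", stress(flaky, 20))
```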

No, that is not happening. As I said, all the threads are in your code.


Hmm, okay, I guess Threads.@threads might not be the problem then. What you are running is an older version of my code, simplified to read only one specific array type. I will try testing different array types now, thanks!

@ffevotte I have now tested the same array with my PostSPH package, and it still gives an error for me. Would you kindly check with my GitHub package? The command is just:

```julia
@btime k = readVtkArray("parts",PostSPH.Idp)
```

What you are saying makes sense to me now, thanks for explaining.

Yep, this version sometimes fails. And a simple for loop is enough to trigger the error:

```
$ JULIA_NUM_THREADS=4 julia

julia> using PostSPH

julia> for _ in 1:10_000
         k = PostSPH.readVtkArray("parts",PostSPH.Idp)
       end
Error in file number 0

Error thrown in threaded loop on thread 0: MethodError(f=typeof(Base.convert)(), args=(Array{Int32, N} where N, nan), world=0x00000000000063ea)Error in file number 3

Error thrown in threaded loop on thread 1: MethodError(f=typeof(Base.convert)(), args=(Array{Int32, N} where N, nan), world=0x00000000000063ea)Error in file number 3

signal (11): Segmentation fault
in expression starting at no file:0
unknown function (ip: 0x7f5a3323a63d)
unknown function (ip: 0xcd6dae54cd0ef76c)
Allocations: 11194468 (Pool: 10923590; Big: 270878); GC: 142
Segmentation fault
```

This confirms that, as @yuyichao said, the error is not related to @btime. Hopefully the differences between the “old” and “new” versions give hints as to where to look for the error…

Yeah, I just went through my code, and the difference might be that the GitHub version uses “readuntil” and “read”, while the bare-minimum file on Dropbox uses “readuntil” and “readbytes”. I chose the first approach since it is much faster. I have a hard time spotting other major differences; at least I now know that it was not Threads.@threads that caused the error.
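For reference, a hypothetical sketch of the readuntil + read pattern (not PostSPH's actual readVtk). It assumes a block of the form "KEYWORD n type" followed by n × ncols big-endian Float32 values, as in legacy binary VTK. Since `read(io, nb)` silently returns fewer bytes when the stream ends early, checking the length turns a truncated or malformed file into an explicit error instead of partial data.

```julia
# Hypothetical reader for a binary block that follows a text keyword line.
function read_block(io::IO, keyword::AbstractString, ncols::Int)
    readuntil(io, keyword)                        # skip everything up to and including keyword
    eof(io) && error("keyword $keyword not found")
    n = parse(Int, first(split(readline(io))))    # rest of "POINTS 2 float" -> 2
    nbytes = n * ncols * sizeof(Float32)
    bytes = read(io, nbytes)                      # returns fewer bytes if the stream ends early
    length(bytes) == nbytes ||
        error("expected $nbytes bytes after $keyword, got $(length(bytes))")
    words = ntoh.(reinterpret(UInt32, bytes))     # assumed big-endian on disk -> host order
    vals = collect(reinterpret(Float32, words))
    return permutedims(reshape(vals, ncols, n))   # n × ncols Matrix{Float32}
end

# Usage with an in-memory stand-in for a file: 2 points × 3 columns, big-endian.
vals = Float32[1, 2, 3, 4, 5, 6]
payload = reinterpret(UInt8, hton.(reinterpret(UInt32, vals)))
io = IOBuffer(vcat(Vector{UInt8}("header\nPOINTS 2 float\n"), payload))
m = read_block(io, "POINTS", 3)   # [1 2 3; 4 5 6]
```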

Thanks for your time. I don’t really understand how it can work properly without benchmarking and then suddenly fail when I start to benchmark, but maybe in a few weeks I will find out why.

Kind regards

Okay, I went and checked on a different data set and now it works:

```
@benchmark k = readVtkArray("PartAll",PostSPH.Points)
BenchmarkTools.Trial:
  memory estimate:  643.19 MiB
  allocs estimate:  24552
  --------------
  minimum time:     418.663 ms (0.00% GC)
  median time:      447.024 ms (0.00% GC)
  mean time:        562.856 ms (21.67% GC)
  maximum time:     915.151 ms (43.00% GC)
  --------------
  samples:          9
  evals/sample:     1
```

It seems there was an error in the Dropbox files. The Dropbox version of the code may not have failed because readbytes always returns some output, unlike the version on GitHub. Thanks for your time, guys, and sorry that it was a dumb mistake in the end. I learned a lot :slight_smile:

Let me try to explain this once more: your code is probably flawed in such a way that it fails once every 10_000 runs, because of a race condition between threads that happens only rarely. Your code does not “work properly”; it merely seems to work most of the time when you use it normally.

The easiest way to expose the race condition is to run your code a large number of times, which eventually triggers the chain of events that leads to it. @btime happens to do exactly that, but you can reproduce it with a simple for loop.
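The thread never pins down the exact bug, so the following is only an illustration of one common kind of rare race (an assumption, not a diagnosis of PostSPH): a scratch buffer shared by all iterations of a threaded loop. The fix is for each iteration to own the mutable state it writes, apart from its own slot of the output. Both functions are hypothetical.

```julia
# Racy: a single scratch buffer is shared by every iteration, hence by every thread.
function fill_shared!(out::Vector{Int}, n::Int)
    buf = zeros(Int, n)
    Threads.@threads for i in eachindex(out)
        buf .= i              # another thread may overwrite `buf` here...
        out[i] = sum(buf)     # ...before this iteration reads it back
    end
    return out
end

# Race-free: each iteration owns its scratch buffer and writes only out[i].
function fill_local!(out::Vector{Int}, n::Int)
    Threads.@threads for i in eachindex(out)
        buf = fill(i, n)
        out[i] = sum(buf)
    end
    return out
end

expected = [10i for i in 1:1_000]
fill_local!(zeros(Int, 1_000), 10) == expected    # always true
fill_shared!(zeros(Int, 1_000), 10) == expected   # can be false with several threads
```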
