Strange error when enabling multiple threads

I have some code that fails with "ERROR: UndefVarError: T not defined" when I run Julia with multiple threads, and works fine otherwise.
The complete details are in https://github.com/invenia/Impute.jl/issues/124, but I am no longer sure whether this is a bug in the Impute package or in Julia itself.

The error does NOT happen when I launch Julia with the command:
julia --project
It DOES happen when I launch it with the command:
julia -t auto --project

Any hints?

The problem does not appear with small data sets, only with large ones. I am providing the complete data set needed to reproduce the bug in a gist.

Just looks like a bug in the package.

Question: Is it possible to change the number of threads that Julia uses at runtime?

If that were possible, I could reduce the number of threads to one before calling the problematic function and increase it again afterwards.

Or is there another way to force a function to use only one thread?

Additional finding: If I add the parameter ntasks=1 to the CSV.read call like this:

df_new = CSV.read("data/" * logfile, DataFrame; ntasks=1)

no error occurs even if Julia itself has 8 threads available.

So perhaps the bug is in the CSV package and not in the Impute package?

To me it looks as if CSV.read produces different output depending on the number of tasks it uses, and that the different output causes Impute to crash.

No, the number of threads cannot be changed at runtime right now.
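
For reference, a minimal sketch of what is available at runtime: the thread count is chosen at startup (with the -t flag or the JULIA_NUM_THREADS environment variable) and can only be read afterwards.

```julia
# The thread count is fixed when Julia starts, e.g. `julia -t 4` or
# JULIA_NUM_THREADS=4; at runtime it can be queried but not changed.
n = Threads.nthreads()
println("Julia was started with ", n, " thread(s)")
```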

The stacktrace you posted shows that the error is very much in Impute.

This bug can happen when https://github.com/invenia/Impute.jl/blob/d08c5069f71507e4fee5e49e90f97c9527586a69/src/imputors/interp.jl#L41 is called with data::AbstractVector{Missing}. In that case T is not defined, because that vector type has no element type other than Missing. So this is definitely not a multithreading bug but a bug in how that function is defined: if eltype(data) === Missing, the function needs to do something other than try to call T(...).
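
A minimal sketch of that failure mode and one possible guard. The functions zero_fill and zero_fill_safe are hypothetical stand-ins, not Impute.jl's actual code, and the signature shape is an assumption based on the description above. The guard takes the value type from eltype(data) instead of a type parameter, so an all-missing vector can be handled explicitly.

```julia
# Hypothetical stand-in with a `where T` signature of the shape described
# above; this is NOT the actual Impute.jl method.
function zero_fill(data::AbstractVector{<:Union{T, Missing}}) where T
    # Uses T directly, so T must be bound to a real value type.
    return [ismissing(x) ? zero(T) : x for x in data]
end

# Works: T is bound to Float64.
ok = zero_fill(Union{Float64, Missing}[1.0, missing])

# An all-missing vector still matches the signature, but there is no value
# type to bind T to, so using T is expected to throw.
err = try
    zero_fill([missing, missing])
    nothing
catch e
    e
end
println(typeof(err))

# One possible guard (an assumption, not the package's fix): derive the
# value type from eltype and handle the all-missing case up front.
function zero_fill_safe(data::AbstractVector)
    T = nonmissingtype(eltype(data))
    # eltype(data) === Missing gives T === Union{}: nothing to fill with.
    T === Union{} && return collect(data)
    return [ismissing(x) ? zero(T) : x for x in data]
end

safe_mixed = zero_fill_safe(Union{Float64, Missing}[1.0, missing])
safe_all_missing = zero_fill_safe([missing, missing])
```

Whether returning the all-missing vector unchanged is the right behavior depends on the caller; the point is only that the eltype(data) === Missing case needs its own branch.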