Allocations on multi threaded matrix operations (caused by BigFloats)

DavidBerghaus · July 28, 2018, 8:29am

I wrote a small test code that changes matrix elements and stores them in the “old” matrix, so no new allocations should be made.
I then compiled the code with @allocated to measure the allocations made:

        - function test(N)
    80080 	M = Array{BigFloat}(N,N)
        0 	for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = big(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = BigFloat(2)*M[i,j]
        - 		end
        - 	end
        - end
        -

As you can see, the code behaves as expected with only allocations made when the matrix is initialized.

However, when I add multi threading, new allocations are being made:

        - function test(N)
    80096 	M = Array{BigFloat}(N,N)
        0 	Threads.@threads for i = 1:N
     1176 		Threads.@threads for j = 1:N
   174744 			M[i,j] = big(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	Threads.@threads for i = 1:N
     1640 		Threads.@threads for j = 1:N
   773132 			M[i,j] = BigFloat(2)*M[i,j]
        - 		end
        - 	end
        - end
        -

Can anyone explain this behavior and tell me how to bypass it?

Thanks in advance

EDIT:
To give you a quick update:

I tried to make my algorithm work by only using Arbs, to avoid getting new allocations by the BigFloats.

Unfortunately, this does not work because Arbs seem to not be made for this kind of usage.
They lose numerically stability during the matrix operations and get unprecise results (unless one puts the precision much higher than for BigFloats, but this makes the performance even worse than the BigFloat-allocations).

I would like to know if someone knows another way how I could make this type of algorithm work.

Maybe there is another multiprecision floating type or a way to preallocate the BigFloat calculations?

oxinabox · July 28, 2018, 9:47am

Not directly related:
Nesting threaded for loops,
is not valid in julia 0.6 and can lead to incorrect results.
In 0.7 it is valid but it does nothing for the inner loop.
Removing it from the inner loop might save some allocations.

oxinabox · July 28, 2018, 9:53am

Just looking at the allocations as reported by @time
in julia 0.6,
I don’t see any significant difference in number of allocations between the threaded versions and the nonthreaded versions.

DavidBerghaus · July 28, 2018, 10:54am

Yes, you are right, I did that correct in the real code that I am working with but forgot to do it in the example.

        - function test(N)
    80096 	M = Array{BigFloat}(N,N)
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = big(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
  1113860 			M[i,j] = BigFloat(2)*M[i,j]
        - 		end
        - 	end
        - end
        -

The second loop still shows the same behavior tho

DavidBerghaus · July 28, 2018, 10:56am

@time shows allocations for me too:

julia> @time test(100)
  0.090719 seconds (106.77 k allocations: 3.996 MiB)

kristoffer.carlsson · July 28, 2018, 10:57am

Creating BigFloat allocates.

Ref WIP: Pooling BigInts and BigFloats by skariel · Pull Request #10084 · JuliaLang/julia · GitHub

DavidBerghaus · July 28, 2018, 11:00am

Oh, is there a way to avoid this when one is forced to work with BigFloats?

kristoffer.carlsson · July 28, 2018, 11:03am

Perhaps @JeffreySarnoff knows some good alternatives?

JeffreySarnoff · July 28, 2018, 11:22am

Do you really need to be using BigFloats?

julia> using DoubleFloats
julia> sqrt(Double64(2))
1.4142135623730950488016887242096816
jjulia> inv(Double64(MathConstants.golden))
6.1803398874989484820458683436563545e-01
julia> exp(ans) * cbrt(ans)
1.5803242493055311416008395194148359

DavidBerghaus · July 28, 2018, 11:29am

Yes, I have to calculate the determinant of an ill conditioned 2000x2000 matrix up to 2400 digits precision

JeffreySarnoff · July 28, 2018, 11:30am

Have you tried ArbFloats.jl?

DavidBerghaus · July 28, 2018, 11:31am

I have already implemented them because MPFR does not provide multi precision besselj functions of fractional order. I will try how it works if I change the matrix to this type.

JeffreySarnoff · July 28, 2018, 11:33am

Just change the type of the numbers in the matrix, not the matrix itself.
I have not provided Bessel functions in that package.

DavidBerghaus · July 28, 2018, 11:39am

No, I am working with Nemo, but these are ArbFloats too, right?

JeffreySarnoff · July 28, 2018, 11:39am

correct

DavidBerghaus · July 28, 2018, 12:10pm

That works much better:

        - using ArbFloats
        - function test(N)
    80096 	M = Array{ArbFloat}(N,N)
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = ArbFloat(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
    31458 			M[i,j] =2*M[i,j]
        - 		end
        - 	end
        - end
        -

Thank you for the suggestion!

On the website it said “Much faster than BigFloat at precisions up to 3,500 bits (1050 digits)”
If I reach more than that, is the speed comparable to BigFloats or does it become extremely slow?

JeffreySarnoff · July 28, 2018, 12:24pm

worry not – it will not become slower than BigFloats the way you will use it.

DavidBerghaus · July 28, 2018, 12:30pm

Okay great, and I will save the time of the allocations too!

favba · July 28, 2018, 12:37pm

As a side note, you will have better performance if you change the nesting order of the loops to

for j= ...
    for i= ...
        M[i,j] = ...

Julia arrays are stored in Fortran order.

DavidBerghaus · July 28, 2018, 12:38pm

Is there a way to convert Nemo.arb to ArbFloats.ArbFloat?
I need to multiply these two types and it gives me errors when I do that.