Allocations on multi threaded matrix operations (caused by BigFloats)

multithreading
memory-allocation

#1

I wrote a small test code that changes matrix elements and stores them in the “old” matrix, so no new allocations should be made.
I then compiled the code with @allocated to measure the allocations made:

        - function test(N)
    80080 	M = Array{BigFloat}(N,N)
        0 	for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = big(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = BigFloat(2)*M[i,j]
        - 		end
        - 	end
        - end
        - 

As you can see, the code behaves as expected with only allocations made when the matrix is initialized.

However, when I add multi threading, new allocations are being made:

        - function test(N)
    80096 	M = Array{BigFloat}(N,N)
        0 	Threads.@threads for i = 1:N
     1176 		Threads.@threads for j = 1:N
   174744 			M[i,j] = big(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	Threads.@threads for i = 1:N
     1640 		Threads.@threads for j = 1:N
   773132 			M[i,j] = BigFloat(2)*M[i,j]
        - 		end
        - 	end
        - end
        - 

Can anyone explain this behavior and tell me how to bypass it?

Thanks in advance

EDIT:
To give you a quick update:

I tried to make my algorithm work by only using Arbs, to avoid getting new allocations by the BigFloats.

Unfortunately, this does not work because Arbs seem to not be made for this kind of usage.
They lose numerically stability during the matrix operations and get unprecise results (unless one puts the precision much higher than for BigFloats, but this makes the performance even worse than the BigFloat-allocations).

I would like to know if someone knows another way how I could make this type of algorithm work.

Maybe there is another multiprecision floating type or a way to preallocate the BigFloat calculations?


#2

Not directly related:
Nesting threaded for loops,
is not valid in julia 0.6 and can lead to incorrect results.
In 0.7 it is valid but it does nothing for the inner loop.
Removing it from the inner loop might save some allocations.


#3

Just looking at the allocations as reported by @time
in julia 0.6,
I don’t see any significant difference in number of allocations between the threaded versions and the nonthreaded versions.


#4

Yes, you are right, I did that correct in the real code that I am working with but forgot to do it in the example.

        - function test(N)
    80096 	M = Array{BigFloat}(N,N)
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = big(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
  1113860 			M[i,j] = BigFloat(2)*M[i,j]
        - 		end
        - 	end
        - end
        - 

The second loop still shows the same behavior tho


#5

@time shows allocations for me too:

julia> @time test(100)
  0.090719 seconds (106.77 k allocations: 3.996 MiB)

#6

Creating BigFloat allocates.

Ref https://github.com/JuliaLang/julia/pull/10084


#7

Oh, is there a way to avoid this when one is forced to work with BigFloats?


#8

Perhaps @JeffreySarnoff knows some good alternatives?


#9

Do you really need to be using BigFloats?

julia> using DoubleFloats
julia> sqrt(Double64(2))
1.4142135623730950488016887242096816
jjulia> inv(Double64(MathConstants.golden))
6.1803398874989484820458683436563545e-01
julia> exp(ans) * cbrt(ans)
1.5803242493055311416008395194148359

#10

Yes, I have to calculate the determinant of an ill conditioned 2000x2000 matrix up to 2400 digits precision :frowning:


#11

Have you tried ArbFloats.jl?


#12

I have already implemented them because MPFR does not provide multi precision besselj functions of fractional order. I will try how it works if I change the matrix to this type.


#13

Just change the type of the numbers in the matrix, not the matrix itself.
I have not provided Bessel functions in that package.


#14

No, I am working with Nemo, but these are ArbFloats too, right?


#15

correct


#16

That works much better:

        - using ArbFloats
        - function test(N)
    80096 	M = Array{ArbFloat}(N,N)
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
        0 			M[i,j] = ArbFloat(pi)	#spawn matrix elements
        - 		end
        - 	end
        - 
        - 	#Now change matrix elements
        0 	Threads.@threads for i = 1:N
        0 		for j = 1:N
    31458 			M[i,j] =2*M[i,j]
        - 		end
        - 	end
        - end
        - 

Thank you for the suggestion!

On the website it said “Much faster than BigFloat at precisions up to 3,500 bits (1050 digits)”
If I reach more than that, is the speed comparable to BigFloats or does it become extremely slow?


#17

worry not – it will not become slower than BigFloats the way you will use it.


#18

Okay great, and I will save the time of the allocations too!


#19

As a side note, you will have better performance if you change the nesting order of the loops to

for j= ...
    for i= ...
        M[i,j] = ...

Julia arrays are stored in Fortran order.


#20

Is there a way to convert Nemo.arb to ArbFloats.ArbFloat?
I need to multiply these two types and it gives me errors when I do that.