Solve a system of linear equations many times without allocating memory

MartinOtter · March 19, 2020, 9:16pm

I have the problem that during simulation a linear equation system A*x=b is solved many times. The desired solution is to allocate memory for all arrays involved in the operation once before simulation starts and during simulation only update the array elements and then compute a new solution. I did not yet find a way to implement this in Julia. Example:

module Test_lu

using LinearAlgebra

nx = 10
A = rand(nx,nx)
b = rand(nx)

for i=1:5 # Solve liner equation system 5-times
    for j=1:nx
        for k=1:nx
            A[j,k] = A[j,k] + i  # New values for A
        end
        b[j] = b[j] + i   # New values for b
    end
    @time w = lu!(A)
    @time ldiv!(w,b)
end

end

# Gives output:
  0.000010 seconds (2 allocations: 192 bytes)
  0.000006 seconds
  0.000003 seconds (2 allocations: 192 bytes)
  0.000001 seconds
  0.000003 seconds (2 allocations: 192 bytes)
  0.000002 seconds
  0.000003 seconds (2 allocations: 192 bytes)
  0.000004 seconds
  0.000006 seconds (2 allocations: 192 bytes)
  0.000004 seconds

Its clear that lu! always allocates memory because it returns a new vector (here: w), probably the pivot vector. Is there a way to avoid this (so provide a pre-allocated w vector)?
Or is there another way to solve a linear equation system many times (with changed elements of A,b), without allocating memory in every iteration?

Probably, I could directly use LAPACK functions and then it should be possible to formulate it, but then I loose the possibility to change the type of the matrices (e.g. use Measurements.jl).

jlchan · March 19, 2020, 9:24pm

2 allocations doesn’t seem like it would contribute significantly to runtime. Are there specific reasons you wish to avoid all allocations?

MartinOtter · March 19, 2020, 9:40pm

Julia is very nice to use, but the drawback is that it allocates memory at many places. Some time ago, I systematically went through some part of my code called during simulation to get rid of unnecessary memory allocations in order to have a fair comparision with available (non-Julia) tools in my area. At the end, the speed-up (just by removing unnecessary memory allocations) was about 10. I do not ask currently, whether one particular place really counts for simulation speed-up or not, but just want to get rid of memory allocations whenever possible. Note, all this would be uncritical, if Julia would have some O(1) caching mechanism, so that previously freed memory can be very efficiently re-allocated again. But this seems to be not the case.

Oscar_Smith · March 19, 2020, 9:49pm

Do you need to compute the lu factorization? If not, you could just use ldiv!(X,b)

MartinOtter · March 19, 2020, 10:02pm

I do not need the lu factorization, just the solution of the equation system. I just tried your proposal but got errors (I am using Julia 1.3.1):

nx = 10
A = rand(nx,nx)
b = rand(nx)
ldiv!(A,b) 
# ERROR: MethodError: no method matching ldiv!(::Array{Float64,2}, ::Array{Float64,1})

Is this a feature of a newer Julia version?

PetrKryslUCSD · March 19, 2020, 10:18pm

It is ldiv!(Y, A, B) I believe.

DNF · March 19, 2020, 11:02pm

You should definitely use BenchmarkTools for this. You will never see zero allocations using the @time macro.

The timing estimates are probably also wildly inaccurate. Microbenchmarks are not a good use case for @time. Use BenchmarkTools, and remember to interpolate.

MartinOtter · March 20, 2020, 6:23am

I tried:

nx = 10
A = rand(nx,nx)
b = rand(nx)
x = zeros(nx)
@time ldiv!(x,A,b)
#ERROR: MethodError: no method matching ldiv!(::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,1})

lu! returns a struct that contains a reference to the overwritten A and a (new) pivot vector. So, it is clear that lu! always allocates memory.

Tamas_Papp · March 20, 2020, 6:50am

Not if you disable pivoting. You may be looking for

Alu = lu!(A, Val(false))
ldiv!(Alu, b)

Note, however, that you may worried about something that is unlikely to be relevant: if A is large, then the computation itself will dominate timing, if A is small, you should using StaticArrays which again do not allocate.

Also, please do read the docstrings (?lu!, ?ldiv!) instead of programming by trial and error. Most of these things are documented.

MartinOtter · March 20, 2020, 7:48am

MartinOtter:

MartinOtter:

So, it is clear that lu! always allocates memory.

Not if you disable pivoting. You may be looking for
Alu = lu!(A, Val(false))
ldiv!(Alu, b)
Note, however, that you may worried about something that is unlikely to be relevant: if A is large, then the computation itself will dominate timing, if A is small, you should using StaticArrays which again do not allocate.

Also, please do read the docstrings ( ?lu! , ?ldiv! ) instead of programming by trial and error. Most of these things are documented.

Disabling pivoting is no alternativ, because a solution of a regular system may fail.

I am contributing to a generic modeling and simulation system (Modia/Modia3D/ModiaMath) and depending on user models a few or many, small or large linear equation systems will appear.

My summary of the remarks is: In Julia there is currently no generic library function available to solve repeately a (small or large) linear equation system without allocating memory in every iteration. For the time being I will just use lu! and ldiv! and at some time in the future provide a special library function that either uses LAPACK functions directly (for Float64) and otherwise uses lu! and ldiv!.

DNF · March 20, 2020, 8:04am

This doesn’t change the fact that you should always use BenchmarkTools for microbenchmarks:

julia> @time ldiv!(w, b);
  0.000021 seconds (4 allocations: 160 bytes)

julia> @btime ldiv!($w, $b);
  1.530 μs (0 allocations: 0 bytes)

julia> 0.000021/1.53e-6
13.72549019607843

Or perhaps better, not sure what ldiv! does (edited this a couple of times):

for i=1:5 # Solve liner equation system 5-times
    for j=1:nx
        for k=1:nx
            A[j,k] = A[j,k] + i  # New values for A
        end
        b[j] = b[j] + i   # New values for b
    end
    @btime lu!(A_) setup=(A_=copy(A))
    @btime ldiv!(w, b_) setup=(w=lu!(copy(A)); b_=copy(b))
end

1.089 μs (2 allocations: 192 bytes)
356.595 ns (0 allocations: 0 bytes)
1.116 μs (2 allocations: 192 bytes)
356.665 ns (0 allocations: 0 bytes)
1.112 μs (2 allocations: 192 bytes)
356.832 ns (0 allocations: 0 bytes)
1.107 μs (2 allocations: 192 bytes)
356.900 ns (0 allocations: 0 bytes)
1.109 μs (2 allocations: 192 bytes)
356.928 ns (0 allocations: 0 bytes)

You have to do be a bit careful with mutating functions (not sure if I did this correctly, but to me it appears to be a 10-60x difference in measurement result.)

Tamas_Papp · March 20, 2020, 9:26am

I think it would make sense to have lu!(Alu, A) with pivoting. Consider making a PR to LinearAlgebra.

When using Julia, keep in mind that library functions do not cover every conceivable use case. Some are omitted by design, but some are waiting to be implemented. We are talking about a few lines of code anyway.

PetrKryslUCSD · March 20, 2020, 3:21pm

a = rand(6, 6); b = rand(6); x = rand(6)
a = lu!(a) 
ldiv!(x, a, b)

stevengj · March 20, 2020, 4:58pm

IIRC, the issue is that LAPACK’s factorization routines typically require some additional workspace vectors to be supplied, in addition to storage for the matrix itself, so these have to be allocated in lu!. To be completely allocation-free, the caller would need to pass an additional workspace parameter.

(Of course, you could always call the LAPACK routines directly from your Julia code and do whatever low-level hackery you want.)

Tamas_Papp · March 21, 2020, 4:31pm

I think that should be feasible. Alternatively, something like

Alu = lu(A)
lu!(Alu, A) # use previously allocated results if types are compatible

could also work and could provide a nice general interface for similar methods.

Tamas_Papp · March 21, 2020, 4:32pm

That said, I still think that worrying about allocations is not something one should consider unless they are willing to pay quite a bit in code complexity for very marginal improvements. Here is a function for those who want to experiment:

using LinearAlgebra, BenchmarkTools

function f(n, m)
    A = rand(n, n)
    b = rand(n)
    z = 0.0
    for _ in 1:m
        z += (A \ b)[1]
        A .+= 1
        b .+= 1
    end
    z
end

@benchmark f(20, 20) # try n = 10, 20, 30, 50

where the outer loop was included to make profiling easier. Around 1–2% of the time is spent on GC even for small matrices. LAPACK time dominates everything. This becomes much more prominent for larger matrices.

MartinOtter · April 7, 2020, 10:48am

Tamas_Papp:

stevengj:

To be completely allocation-free, the caller would need to pass an additional workspace parameter.

I think that should be feasible. Alternatively, something like
Alu = lu(A)
lu!(Alu, A) # use previously allocated results if types are compatible
could also work and could provide a nice general interface for similar methods.

Yes, this is a nice general interface (especially, since the caller does not need to know the type of the internally allocated memory).

MartinOtter · April 7, 2020, 11:08am

When you have many calls, as in simulation or optimization, permanently allocating memory (even small amounts), might be not acceptable:

Offline simulation: Assume every call allocates 1kbyte and this is done every milli-second, your main memory has 16Gbyte → Simulating more as 4.4 hours (16e6*0.001/60/60) will fill the complete main memory. Of course, garbage collection will “somehow” take care of it, but this will take time and this is hard to measure.
Reatime simulation: If hardware is involved, especially for a safety critical operation, it is completely forbidden to allocate memory during operation. Hard realtime simulation might not be possible currently with Julia, but I guess this can be achieved at some point in the future.

Tamas_Papp · April 7, 2020, 11:28am

Garbage collection will take care of it by (surprise) collecting unused memory. Yes, it takes time, but modern GC can be very efficient. Sometimes more efficient than manual memory management.

I am not saying that one should never worry about allocations — they can be a key optimization in hot loops. That said, writing code with nontrivial logic and control flow with absolutely 0 allocations is rarely worth it in Julia, especially if one wants to take advantage of facilities like AD. The effort expended on the last 0.1% of time spent on GC can be usually allocated () much better.

Hard realtime is something you design a language for from the very beginning. Allocations are only part of the story. It is a very constraining requirement for which specialized tools exist (and consequently it can be very expensive). I don’t think it is planned for Julia in any sense.