During simulation I have to solve a linear equation system A*x = b many times. What I would like to do is allocate memory for all arrays involved in the operation once, before the simulation starts, and during simulation only update the array elements and then compute a new solution. I have not yet found a way to implement this in Julia. Example:
module Test_lu

using LinearAlgebra

nx = 10
A = rand(nx, nx)
b = rand(nx)

for i = 1:5  # Solve the linear equation system 5 times
    for j = 1:nx
        for k = 1:nx
            A[j,k] = A[j,k] + i   # New values for A
        end
        b[j] = b[j] + i           # New values for b
    end
    @time w = lu!(A)
    @time ldiv!(w, b)
end

end
# Gives output:
0.000010 seconds (2 allocations: 192 bytes)
0.000006 seconds
0.000003 seconds (2 allocations: 192 bytes)
0.000001 seconds
0.000003 seconds (2 allocations: 192 bytes)
0.000002 seconds
0.000003 seconds (2 allocations: 192 bytes)
0.000004 seconds
0.000006 seconds (2 allocations: 192 bytes)
0.000004 seconds
It's clear that lu! always allocates memory, because it returns a new object (here: w) containing, among other things, the pivot vector. Is there a way to avoid this (i.e., to provide a pre-allocated w)?
Or is there another way to solve a linear equation system many times (with changed elements of A and b) without allocating memory in every iteration?
Probably I could use LAPACK functions directly, and then it should be possible to formulate it, but then I lose the possibility to change the element type of the matrices (e.g. to use Measurements.jl).
Julia is very nice to use, but the drawback is that it allocates memory in many places. Some time ago, I systematically went through some part of my code called during simulation to get rid of unnecessary memory allocations, in order to have a fair comparison with available (non-Julia) tools in my area. In the end, the speed-up (just from removing unnecessary memory allocations) was about a factor of 10. I am not currently asking whether one particular place really matters for simulation speed; I just want to get rid of memory allocations whenever possible. Note that all this would be uncritical if Julia had some O(1) caching mechanism, so that previously freed memory could be re-used very efficiently. But this seems not to be the case.
You should definitely use BenchmarkTools for this. You will never see zero allocations using the @time macro.
The timing estimates are probably also wildly inaccurate. Microbenchmarks are not a good use case for @time. Use BenchmarkTools, and remember to interpolate.
Not if you disable pivoting. You may be looking for
Alu = lu!(A, Val(false))
ldiv!(Alu, b)
Note, however, that you may be worried about something that is unlikely to be relevant: if A is large, then the computation itself will dominate the timing; if A is small, you should use StaticArrays, which again does not allocate.
Also, please do read the docstrings (?lu!, ?ldiv!) instead of programming by trial and error. Most of these things are documented.
Disabling pivoting is not an alternative, because then solving a regular system may fail.
I am contributing to a generic modeling and simulation system (Modia/Modia3D/ModiaMath) and depending on user models a few or many, small or large linear equation systems will appear.
My summary of the remarks is: in Julia there is currently no generic library function available to repeatedly solve a (small or large) linear equation system without allocating memory in every iteration. For the time being I will just use lu! and ldiv!, and at some point in the future provide a special library function that either uses LAPACK functions directly (for Float64) or otherwise uses lu! and ldiv!.
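For the lu!/ldiv! path, such a helper is only a few lines. A minimal sketch (the name solve_inplace! is purely illustrative, not an existing library function):

```julia
using LinearAlgebra

# Illustrative sketch: solve A*x = b in place, overwriting A with its
# LU factors and b with the solution. Only lu!'s small pivot vector
# is allocated per call.
function solve_inplace!(A::AbstractMatrix, b::AbstractVector)
    F = lu!(A)      # factorize A in place (allocates the pivot vector)
    ldiv!(F, b)     # overwrite b with the solution
    return b
end

A = [2.0 1.0; 1.0 3.0]
b = [3.0, 4.0]
x = solve_inplace!(A, b)   # x === b, now holding the solution [1.0, 1.0]
```

Since it only relies on the generic lu!/ldiv! interface, the same code works for other element types; for Float64 matrices, LAPACK is called under the hood.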
Or perhaps better (not sure what ldiv! does; I edited this a couple of times):
for i = 1:5  # Solve the linear equation system 5 times
    for j = 1:nx
        for k = 1:nx
            A[j,k] = A[j,k] + i   # New values for A
        end
        b[j] = b[j] + i           # New values for b
    end
    @btime lu!(A_) setup = (A_ = copy(A))
    @btime ldiv!(w, b_) setup = (w = lu!(copy(A)); b_ = copy(b))
end
1.089 μs (2 allocations: 192 bytes)
356.595 ns (0 allocations: 0 bytes)
1.116 μs (2 allocations: 192 bytes)
356.665 ns (0 allocations: 0 bytes)
1.112 μs (2 allocations: 192 bytes)
356.832 ns (0 allocations: 0 bytes)
1.107 μs (2 allocations: 192 bytes)
356.900 ns (0 allocations: 0 bytes)
1.109 μs (2 allocations: 192 bytes)
356.928 ns (0 allocations: 0 bytes)
You have to be a bit careful with mutating functions (not sure if I did this correctly, but to me it appears to be a 10-60x difference in the measured result).
I think it would make sense to have lu!(Alu, A) with pivoting. Consider making a PR to LinearAlgebra.
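As a side note for later readers: a method along these lines did eventually appear in LinearAlgebra. A sketch that probes for it with hasmethod, so it also runs on Julia versions where lu!(F::LU, A) is not available:

```julia
using LinearAlgebra

A = rand(4, 4) + 4I   # well-conditioned test matrix
F = lu(A)             # allocate the factorization object (incl. pivot vector) once

# On Julia versions that provide lu!(F::LU, A), the storage of F can be
# reused when the matrix values change:
if hasmethod(lu!, Tuple{typeof(F), typeof(A)})
    A .+= 1
    lu!(F, A)         # refactorize, reusing F's storage
end
```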
When using Julia, keep in mind that library functions do not cover every conceivable use case. Some are omitted by design, but some are waiting to be implemented. We are talking about a few lines of code anyway.
IIRC, the issue is that LAPACK’s factorization routines typically require some additional workspace vectors to be supplied, in addition to storage for the matrix itself, so these have to be allocated in lu!. To be completely allocation-free, the caller would need to pass an additional workspace parameter.
(Of course, you could always call the LAPACK routines directly from your Julia code and do whatever low-level hackery you want.)
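For Float64 arrays such a direct call is indeed short. A sketch using the getrf!/getrs! wrappers from LinearAlgebra.LAPACK, with preallocated buffers for the factors and the solution (getrf! still allocates its small pivot vector per call):

```julia
using LinearAlgebra
using LinearAlgebra: LAPACK

n = 4
A = rand(n, n) + n * I     # well-conditioned test matrix
b = rand(n)
Af = similar(A)            # preallocated buffer for the LU factors
x  = similar(b)            # preallocated solution vector

for i in 1:3
    copyto!(Af, A)                      # keep A intact; factorize a copy
    _, ipiv, info = LAPACK.getrf!(Af)   # dgetrf: LU factors stored in Af
    copyto!(x, b)
    LAPACK.getrs!('N', Af, ipiv, x)     # dgetrs: overwrite x with A \ b
end
```

If going fully allocation-free matters, preallocating the pivot vector as well requires calling the LAPACK routine via ccall (or a package that wraps it).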
That said, I still think that eliminating allocations is not something one should pursue unless they are willing to pay quite a bit in code complexity for very marginal improvements. Here is a function for those who want to experiment:
using LinearAlgebra, BenchmarkTools

function f(n, m)
    A = rand(n, n)
    b = rand(n)
    z = 0.0
    for _ in 1:m
        z += (A \ b)[1]
        A .+= 1
        b .+= 1
    end
    z
end
@benchmark f(20, 20) # try n = 10, 20, 30, 50
where the outer loop was included to make profiling easier. Around 1–2% of the time is spent on GC even for small matrices. LAPACK time dominates everything. This becomes much more prominent for larger matrices.
When you have many calls, as in simulation or optimization, permanently allocating memory (even small amounts) might not be acceptable:
Offline simulation: Assume every call allocates 1 kbyte, this is done every millisecond, and your main memory has 16 Gbyte → simulating for more than 4.4 hours (16e6*0.001/60/60) will fill the complete main memory. Of course, garbage collection will "somehow" take care of it, but this takes time and is hard to measure.
Realtime simulation: If hardware is involved, especially for a safety-critical operation, it is completely forbidden to allocate memory during operation. Hard realtime simulation might not be possible with Julia currently, but I guess this can be achieved at some point in the future.
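The 4.4 hours figure above can be checked with back-of-envelope arithmetic (assuming 1 kbyte = 10^3 bytes and 16 Gbyte = 16·10^9 bytes):

```julia
bytes_per_second = 10^3 * 1000     # 1 kbyte per millisecond = 1 Mbyte/s
capacity = 16 * 10^9               # 16 Gbyte of main memory
hours = capacity / bytes_per_second / 3600
# hours ≈ 4.44
```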
Garbage collection will take care of it by (surprise) collecting unused memory. Yes, it takes time, but modern GC can be very efficient. Sometimes more efficient than manual memory management.
I am not saying that one should never worry about allocations — they can be a key optimization in hot loops. That said, writing code with nontrivial logic and control flow with absolutely 0 allocations is rarely worth it in Julia, especially if one wants to take advantage of facilities like AD. The effort expended on the last 0.1% of time spent on GC can usually be allocated much better.
Hard realtime is something you design a language for from the very beginning. Allocations are only part of the story. It is a very constraining requirement for which specialized tools exist (and consequently it can be very expensive). I don’t think it is planned for Julia in any sense.