For some applications, using 32-bit floating-point arithmetic on a 64-bit system can improve performance, because it halves the memory traffic and may allow more values to be processed per SIMD operation.
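As a rough illustration of the bandwidth and SIMD argument, this Julia sketch compares the storage of a 100×100 matrix in both precisions and counts SIMD lanes per register. The 256-bit vector width is an assumption (it matches AVX2 on a Haswell CPU like the one listed below). The byte counts also roughly match the allocation figures that `@btime` reports below.

```julia
# Storage footprint of the same 100×100 matrix in both precisions
A64 = rand(100, 100)
A32 = Float32.(A64)

bytes64 = sizeof(A64)   # 100 * 100 elements * 8 bytes
bytes32 = sizeof(A32)   # 100 * 100 elements * 4 bytes
println("Float64: ", bytes64 / 1024, " KiB, Float32: ", bytes32 / 1024, " KiB")

# Lanes per SIMD register, assuming 256-bit (AVX2-width) vectors
simd_bits = 256
lanes64 = simd_bits ÷ (8 * sizeof(Float64))
lanes32 = simd_bits ÷ (8 * sizeof(Float32))
println("lanes: Float64 = ", lanes64, ", Float32 = ", lanes32)
```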
However, I noticed that some functions actually seem to suffer from using `Float32`, which is a bit surprising. For example, `lu` on a 100×100 matrix:
```julia
julia> using LinearAlgebra, BenchmarkTools

julia> const A_64 = rand(100,100);

julia> const A_32 = Float32.(A_64);

julia> @btime lu($A_64);
  1.241 ms (4 allocations: 79.11 KiB)

julia> @btime lu($A_32);
  3.248 ms (4 allocations: 40.05 KiB)
```
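One way to narrow this down is to check whether `Float32` is slow in general or only in the factorization. For `Float32` and `Float64` matrices, `lu` calls LAPACK's `getrf` (through `LinearAlgebra.LAPACK.getrf!`), while matrix multiplication goes through BLAS `gemm`. The sketch below assumes BenchmarkTools is installed. It times both operations in each precision and also checks that the `Float32` factorization is correct. The resulting ratios will depend on the machine and the BLAS build.

```julia
using LinearAlgebra, BenchmarkTools

A64 = rand(100, 100)
A32 = Float32.(A64)

# BLAS matrix multiply (gemm) in both precisions; @belapsed returns seconds
t_mul64 = @belapsed $A64 * $A64
t_mul32 = @belapsed $A32 * $A32

# LAPACK LU factorization (getrf) in both precisions
t_lu64 = @belapsed lu($A64)
t_lu32 = @belapsed lu($A32)

println("gemm  Float32/Float64 time ratio: ", t_mul32 / t_mul64)
println("getrf Float32/Float64 time ratio: ", t_lu32 / t_lu64)

# Sanity check: L*U reproduces the row-permuted input matrix
F = lu(A32)
println("Float32 LU valid: ", F.L * F.U ≈ A32[F.p, :])
```

If `gemm` is faster in `Float32` but `getrf` is not, the slowdown is specific to the single-precision LU routine rather than to `Float32` arithmetic as a whole.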
```julia
julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```
For `lu!` I changed the code slightly, because `lu!` overwrites its argument. This now adds the overhead of `rand` and uses a different matrix in each run, but I expect the impact to be negligible:
```julia
julia> @btime lu!(rand(100,100));
  1.546 ms (4 allocations: 79.11 KiB)

julia> @btime lu!(rand(Float32,100,100));
  2.847 ms (4 allocations: 40.05 KiB)
```
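BenchmarkTools can keep the cost of `rand` out of the measurement by generating a fresh input in a `setup` expression, which runs before each sample and is not timed. With `evals=1`, each sample calls `lu!` exactly once on its own freshly generated matrix, so the in-place overwrite cannot affect later evaluations. This is a minimal sketch that assumes BenchmarkTools is available.

```julia
using LinearAlgebra, BenchmarkTools

# `setup` builds a new matrix per sample and is excluded from the timing;
# `evals=1` ensures every lu! call sees an unfactorized matrix.
@btime lu!(A) setup=(A = rand(100, 100)) evals=1
@btime lu!(A) setup=(A = rand(Float32, 100, 100)) evals=1

# The same measurements as numbers (minimum time in seconds)
t64 = @belapsed lu!(A) setup=(A = rand(100, 100)) evals=1
t32 = @belapsed lu!(A) setup=(A = rand(Float32, 100, 100)) evals=1
println("Float32/Float64 time ratio: ", t32 / t64)
```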