Edits for improved clarify in italics
I realised last week just how slow the log() function really is, as an example, comparing log() to fft(), it turns out log() is about 2 times slower, even though it is working on individual elements, see below:
data = randn(Float32, 100, 10000) + 1im*randn(Float32, 100, 10000)
amplitude = abs.(data)
d = log.(amplitude);
f = fft(data,1);
using BenchmarkTools
julia> @benchmark fft!(data,1)
BenchmarkTools.Trial:
memory estimate: 3.25 KiB
allocs estimate: 51
--------------
minimum time: 3.204 ms (0.00% GC)
median time: 3.327 ms (0.00% GC)
mean time: 3.495 ms (0.00% GC)
maximum time: 6.788 ms (0.00% GC)
--------------
samples: 1427
evals/sample: 1
julia> @benchmark d .= log.(amplitude)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 3
--------------
minimum time: 8.664 ms (0.00% GC)
median time: 8.885 ms (0.00% GC)
mean time: 9.175 ms (0.00% GC)
maximum time: 15.413 ms (0.00% GC)
--------------
samples: 544
evals/sample: 1
After some digging I concluded that the problem is that julia is not using simd to compute the log().
Some web searches led me to: http://software-lisc.fbk.eu/avx_mathfun/
where I found "an AVX translation of the SSE2 implementation " (I’m sure there are many other implementations), with some hacking of that code, I managed to get the log(Array) calculation part going doing 8 elements in parallel at a time (by removing the sin and cos functions) to get about a 7 times speed-up, however I ran into 2 issues:
- to prevent unnecessary memory allocation, I rather created a log!(outputVector, inputVector)
- this only works if I can call it as vector, the dot-fusion operator actually gets in the way here.
Is there a way to address point 2 by overloading a special case of the log. function?
It feels doubtful or am I missing something?
I considered packing this as a normal julia package, however it is platform dependant and would force a non-Julia interface.
At the same time, I’m also wondering if the same problem occurs for sin and cos and whether spending further time on that code to get those working as well, although I’m using them less often.