I am working on a new package called SeparableFunctions.jl
with the aim to boost the performance for functions which are expensive to calculate on multidimensional grids, but can profit a lot from broadcasting.
To this aim, it turns out that the fastest implementation on CPU or GPU alike is to utilize Julia’s optimized broadcasting mechanisms. For this reason SeparableFunctions.jl
returns a tuple of oriented Vectors each oriented along a particular axis. An example looks like this:
julia> exp_sep = exp_ikx_sep((5,6), (3.3,2.2))
(ComplexF32[-0.4257795f0 + 0.90482694f0im; -0.5358267f0 - 0.844328f0im; … ; -0.5358267f0 + 0.844328f0im; -0.4257795f0 - 0.90482694f0im;;], ComplexF32[0.8090169f0 + 0.5877854f0im -0.10452834f0 - 0.9945219f0im … -0.6691307f0 - 0.7431448f0im -0.10452834f0 + 0.9945219f0im])
julia> res = .*(exp_sep...)
5×6 Matrix{ComplexF32}:
-0.876307+0.481753im 0.944376+0.328867im … 0.95732-0.289032im -0.855364-0.518027im
0.0627908-0.998027im -0.783694+0.621148im -0.26892+0.963163im 0.895712-0.444635im
0.809017+0.587785im -0.104528-0.994522im -0.669131-0.743145im -0.104528+0.994522im
-0.929777+0.368124im 0.895712+0.444635im 0.985996-0.166769im -0.783694-0.621148im
0.187381-0.982287im -0.855364+0.518027im -0.387515+0.921863im 0.944376-0.328867im
which works very well, but is admittedly ugly to use for newcomers. It would be nice, if there was a simple way to delay this expression into a lazy Array without any performance loss. Something that returns exp_sep
as a macro wich self-expands into the .*(exp_sep ...)
expression upon use. Yet any attempts using LazyArrays.jl
caused massive losses in performance. Is there any other ways to hide the .*(arr ...)
expression from the user and let this broadcasting construct be used like an ordinary array e.g. res = collect(exp_sep)
? Maybe the broadcasting mechanism itself has a half-way processed broadcasted expression type, which could be used for packaging such a lazy broadcast evaluation?