Hi!
I have a package that defines many functions which are the evaluation of complex expressions over a multidimensional grid. Sometimes I want the expressions to be allocating, and sometimes not, so there are a lot of tiny variations. I found that Tullio.jl is very useful to evaluate those expressions, since it provides an easy syntax and automatically handles parallelization. Here is an example:
function _free_propagation!(ψ₀,xs,ys,zs,qxs,qys,scaling,k)
@tullio direct_phases[i,j] := k * ( xs[j]^2 + ys[i]^2 ) / 2
@tullio ψ[i,j,l] := ψ₀[i,j] * cis( direct_phases[i,j] * ( 1  scaling[l] ) / zs[l] ) / scaling[l]
fft!(ψ,(1,2))
@tullio reciprocal_phases[i,j] :=  ( qxs[j]^2 + qys[i]^2 ) / 2k
@tullio ψ[i,j,l] *= cis( reciprocal_phases[i,j] * zs[l] / scaling[l] )
ifft!(ψ,(1,2))
@tullio ψ[i,j,l] *= cis(  direct_phases[i,j] * ( 1  scaling[l] ) * scaling[l] / zs[l])
end
Tullio also allows me to perform these computations on GPU, which is great! The thing is that I want to put this extra functionality on a package extension, in order for it not to burden the loading time of someone who doesn’t have a GPU. I was able to make it work, but I found a few drawbacks:

Tullio needs KernelAbstractions and CUDAKernels to perform calculations on the GPU. I could only make things work if those packages are direct dependencies of the original package.

I need to repeat all
@tullio
calls on the extension files. For example, in order to extend_free_propagation!
I need to completely rewrite it:
function StructuredLight._free_propagation!(ψ₀::CuArray,xs::CuArray,ys::CuArray,zs::CuArray,qxs::CuArray,qys::CuArray,scaling::CuArray,k)
@tullio direct_phases[i,j] := k * ( xs[j]^2 + ys[i]^2 ) / 2
@tullio ψ[i,j,l] := ψ₀[i,j] * cis( direct_phases[i,j] * ( 1  scaling[l] ) / zs[l] ) / scaling[l]
fft!(ψ,(1,2))
@tullio reciprocal_phases[i,j] :=  ( qxs[j]^2 + qys[i]^2 ) / 2k
@tullio ψ[i,j,l] *= cis( reciprocal_phases[i,j] * zs[l] / scaling[l] )
ifft!(ψ,(1,2))
@tullio ψ[i,j,l] *= cis(  direct_phases[i,j] * ( 1  scaling[l] ) * scaling[l] / zs[l])
end
The calculations are exactly the same as in the previous case, the only difference is that now the function is defined with explicit CuArray
argument types. This is a huge drawback, since, if i want to change the internals of this function, I need to perform the exact same change in two places, which makes to code harder to maintain.
I was wondering if there were any ways to circumvent these two problems. Thanks!