Lazy evaluation on a large scale: is it the right thing to do?

First, a quick description of my use case:

I have an 81×81×81 array of complex values which represents a wavefunction in real space. I have 200 values of the vector k, for which I need to calculate various things using these 81×81×81 arrays after computing the discrete Fourier transform of each of them. For one array, this is done by taking it, shifting it a certain number of indices (basically shifting it by -2:2 times the primary vector in all 3 dimensions), and taking the sum of these shifted values * exp(i dot(k,R)). In the end I arrive at an 81×81×81 array whose elements are some variation of:
c1 exp(i dot(k,-2R)) + c2 exp(i dot(k,-R)) + c3 + c4 exp(i dot(k,R)) + c5 exp(i dot(k,2R)) + …, where the c’s are the values coming from the shifted arrays and R is the shift vector.
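In code, the construction looks roughly like this (a heavily simplified sketch with invented names; `circshift` stands in for my actual index bookkeeping, which only keeps the overlapping points):

```julia
using LinearAlgebra  # for dot

# Simplified sketch of the current construction (names invented):
# `w` is one 81×81×81 array, `shifts` are the index offsets (tuples),
# `Rs` the corresponding real-space vectors, and `k` the current k-point.
function bloch_sum(w::AbstractArray{<:Complex,3}, shifts, Rs, k)
    out = zeros(eltype(w), size(w))
    for (s, R) in zip(shifts, Rs)
        # each shifted copy is weighted by the k-dependent phase factor
        out .+= cis(dot(k, R)) .* circshift(w, s)
    end
    return out
end
```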

Obviously this involves quite a lot of shuffling data around and looking up different parts of memory, and that is the main bottleneck of my program at the moment.

However, the only thing that varies is the k-vector, meaning that all the c’s and R’s are the same for every k-point. So I feel that instead of taking my basis 81×81×81 arrays and constructing the new ones from scratch for every k-point, there might be a better strategy.

Is this a possible use case for lazy evaluation, i.e. deferring the sum at each point of the array until I actually use it with some specific value of k? Would lazy evaluation on such a scale be performant and manageable memory-wise?
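Concretely, the kind of thing I have in mind is a callable object that stores the fixed pieces once and defers the k-dependent phase sum until called (just a sketch; all names invented):

```julia
# Sketch of a "lazy" Bloch sum: the fixed coefficient arrays are stored
# once, and only the cheap phase-weighted sum runs when a k-point is
# supplied, instead of storing per-point expressions.
struct LazyBlochSum{T}
    coeffs::Vector{Array{Complex{T},3}}  # the fixed c's: one array per shift
    Rs::Vector{NTuple{3,T}}              # the corresponding vectors R
end

function (b::LazyBlochSum)(k)
    out = zeros(eltype(first(b.coeffs)), size(first(b.coeffs)))
    for (C, R) in zip(b.coeffs, b.Rs)
        out .+= cis(k[1]*R[1] + k[2]*R[2] + k[3]*R[3]) .* C
    end
    return out
end
```

That way the memory cost would just be the stored arrays themselves, and each evaluation a handful of broadcasted multiply-adds.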
What I actually tried was metaprogramming: creating an 81×81×81 array of expressions representing the sums. But this results in arrays of over 6 GB of data each, which is not manageable since I have more than 8 of those arrays to work with, and it also causes significant overhead when I evaluate the expressions while looping through the points during various calculations.

To give an idea of the current speed: constructing the summed arrays takes about 0.7 s for each of the 8 arrays, and this has to be done at each of my 200 k values. So it’s not painfully slow by any means; I just feel there might be a better solution, and I’d like to learn more about cool features of the language, so if there are better ways of doing this in Julia, please let me know!

I feel that your use case could perhaps be optimized at the algorithmic level (i.e. which temporaries to compute, etc.), even before considering any optimizations at the Julia level. But it’s hard to say, because there are not enough details. Could you post the precise mathematical expression that you’re evaluating? For example, shifting the array should never be necessary, because the Fourier transform of the shifted array can be obtained by a pointwise multiplication in k-space.
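For instance, in 1D (a schematic check; variable names are made up):

```julia
using FFTW

# Shift theorem: circularly shifting `a` by `s` samples multiplies its DFT
# by a linear phase, so no data actually needs to be moved around.
n = 8
a = rand(ComplexF64, n)
s = 3
phase = [cis(-2π * s * (m - 1) / n) for m in 1:n]
maximum(abs.(fft(circshift(a, s)) .- phase .* fft(a)))  # ≈ 0
```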

Thanks for the reply!
OK, some more details:
The wavefunctions I’m talking about are so-called Wannier functions, which are centered around one unit cell of a crystal and defined, in my case, on a grid that spans 3 unit cells in each direction in real space. The way they are defined, together with the hopping Hamiltonian, allows me to transform back to momentum space by doing only a discrete Fourier transform. This is shown in Eq. (4) of https://arxiv.org/pdf/1112.5411.pdf, where |Rn> would represent the “shifted arrays” in my case. However, I’m not really shifting: I’m just calculating the indices of the points that would still overlap with the original array, centered around the first unit cell, and then doing the sum as in Eq. (4).
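Schematically, the sum I’m evaluating is a Bloch sum of the form (my rough reading of that equation; notation approximate):

$$
|\psi_{n\mathbf{k}}\rangle = \sum_{\mathbf{R}} e^{i\mathbf{k}\cdot\mathbf{R}}\,|\mathbf{R}n\rangle
$$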

If at all useful, the code I’m talking about is in the functions “constructBlochSum” and “findStart” in https://github.com/louisponet/DFTools.jl/blob/master/src/DFSupCalc.jl. I defined the points of the wavefunction in such a way that they also contain information about their real-space coordinates (I know that’s a bit redundant, since each one is defined on the same grid, but this all started as an experiment with Julia and I wanted to use the type system a bit more).

Thanks for the help!