Dear all,

In my attempts to play with CUDA in Julia, I’ve come across something I can’t really understand, hopefully because I’m doing something wrong. My calculations require Fourier transforms, which I compute with the fft() function. Sadly, I find that the result of fft() on the CPU differs from the result of fft() on the same array transferred to the GPU. My code is simply:

```
using CuArrays
using CUDAnative
using CUDAdrv
using FFTW
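N = 64;
A = rand(Float32,N,N,N);   # random single-precision input on the CPU
B = fft(A);                # CPU transform (FFTW)
Ad = cu(A);                # copy the same array to the GPU
Bd = fft(Ad);              # GPU transform
BB = Array(Bd);            # copy the GPU result back to the CPU
maximum(abs.(B-BB))        # largest absolute difference between the two results
# output in my run: 0.015625f0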
```

…and the difference worsens with increasing N. For instance, with N=256 I get a difference (in a single run) of 0.2275149f0. Of course the values of the transform also grow with N, so at this point the differences I see are small: neither negligible nor impossible to live with.
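
To put these numbers in context, it may be more informative to look at the error relative to the size of the transform rather than the absolute difference. Below is a minimal CPU-only sketch, assuming only the FFTW package: it compares a Float32 transform against a Float64 reference computed from the same data. The names `B32`, `B64`, `abs_err` and `rel_err` are mine, chosen for illustration. It does not involve the GPU, so it can only indicate how much discrepancy single precision alone might account for.

```
using FFTW

N = 64
A = rand(Float32, N, N, N)

B32 = fft(A)             # single-precision transform, Complex{Float32}
B64 = fft(Float64.(A))   # double-precision reference from the same data

# Largest absolute deviation of the Float32 result from the reference
abs_err = maximum(abs.(B32 .- B64))

# The same deviation relative to the largest element of the transform
rel_err = abs_err / maximum(abs.(B64))

println("max abs error: ", abs_err)
println("max rel error: ", rel_err)
println("eps(Float32):  ", eps(Float32))   # machine epsilon, for comparison
```

For N = 64 the zero-frequency element is the sum of 64^3 uniform random numbers, so roughly 131072 = 2^17, and Float32 values just above 2^17 are spaced 2^-6 = 0.015625 apart. That is the same size as the difference I reported, so it might be a single rounding step in that one element, although I have not verified this.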

So is this expected behaviour, or am I doing something weird?

Best regards,

Ferran.