Hello, I am trying to apply the fantastic ParallelStencil.jl library (GitHub: omlins/ParallelStencil.jl, a package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs) to 3D medical imaging. It seems ideal, yet I am having trouble applying it to my domain.

To begin with, I would like to achieve a very simple thing: for every voxel, calculate the mean and standard deviation of its neighbouring voxels. Roughly, the code would look like this:

```
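using Statistics  # mean, std

# The output gains a dimension: the input holds one intensity per voxel, while the
# output holds two values per voxel (mean and standard deviation of its neighbourhood).
# Plain CPU reference, not ParallelStencil.jl yet. Float32 output, since mean/std are not integers.
function getMeanAndStd(data::Array{Int32,3})::Array{Float32,4}
    out = zeros(Float32, size(data)..., 2)  # preallocate output; border voxels stay 0
    # von Neumann stencil: the centre voxel plus its 6 face neighbours
    offsets = CartesianIndex.([(0,0,0), (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)])
    inner = CartesianIndices(ntuple(d -> 2:size(data, d)-1, 3))  # interior voxels only
    for cartIndex in inner  # Cartesian index of the centre of the stencil
        flattened = [data[cartIndex + o] for o in offsets]
        out[cartIndex, 1] = mean(flattened)  # put the results in the proper place of the output
        out[cartIndex, 2] = std(flattened)
    end
    return out
end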
```
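
For reference, here is my best guess at how the function above might map onto ParallelStencil.jl. It is only a sketch under several assumptions of mine: the CPU `Threads` backend, `Float32` data, random test data from `@rand` instead of a real scan, and two separate 3D output arrays (`meanOut`, `stdOut`) instead of one 4D array. The kernel name `mean_std!` and the volume sizes are made up for illustration, and I am not sure this is the intended way to use the package.

```julia
using ParallelStencil
@init_parallel_stencil(Threads, Float32, 3)  # assumption: CPU threads backend, 3D arrays

# Hypothetical kernel: mean and sample standard deviation over the
# von Neumann stencil (centre voxel plus its 6 face neighbours).
@parallel_indices (ix, iy, iz) function mean_std!(meanOut, stdOut, data)
    c  = data[ix, iy, iz]
    xm = data[ix-1, iy, iz]; xp = data[ix+1, iy, iz]
    ym = data[ix, iy-1, iz]; yp = data[ix, iy+1, iz]
    zm = data[ix, iy, iz-1]; zp = data[ix, iy, iz+1]
    m  = (c + xm + xp + ym + yp + zm + zp) / 7
    ss = (c - m)^2 + (xm - m)^2 + (xp - m)^2 + (ym - m)^2 +
         (yp - m)^2 + (zm - m)^2 + (zp - m)^2
    meanOut[ix, iy, iz] = m
    stdOut[ix, iy, iz]  = sqrt(ss / 6)  # n - 1 = 6, same convention as Statistics.std
    return
end

nx, ny, nz = 16, 16, 16      # made-up volume size
data    = @rand(nx, ny, nz)  # stand-in for real intensities
meanOut = @zeros(nx, ny, nz)
stdOut  = @zeros(nx, ny, nz)

# Run on interior voxels only, so the stencil never reads outside the volume.
@parallel (2:nx-1, 2:ny-1, 2:nz-1) mean_std!(meanOut, stdOut, data)
```

I kept the two statistics in separate 3D arrays because the package is initialised here for 3 dimensions; whether a single 4D output is possible or preferable is part of what I am asking.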

Of course, for big images I would need to process the data in batches, since a whole volume might not fit in GPU memory. Also, as the output above shows, any further analysis would be in 4D.
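
To make the batching idea concrete, this is roughly what I have in mind, written in plain Julia without any GPU code. The helper `halo_chunks` is hypothetical (my own name, not part of any package): it splits the slices along one axis into chunks and extends each chunk by a one-slice halo, so that a stencil on the chunk border still sees its neighbours.

```julia
# Hypothetical helper: split slices 1:n into chunks of at most `chunk` slices.
# `load` is the range to copy to the device (chunk plus halo, clamped to the volume),
# `keep` is the range whose results are valid and should be copied back.
function halo_chunks(n::Int, chunk::Int; halo::Int = 1)
    chunks = NamedTuple{(:load, :keep), Tuple{UnitRange{Int}, UnitRange{Int}}}[]
    for start in 1:chunk:n
        stop = min(start + chunk - 1, n)
        load = max(start - halo, 1):min(stop + halo, n)
        push!(chunks, (load = load, keep = start:stop))
    end
    return chunks
end

# Example: a volume with 10 slices along z, processed 4 slices at a time.
for c in halo_chunks(10, 4)
    println("load slices ", c.load, ", keep results for slices ", c.keep)
end
```

Each batch would then be `data[:, :, c.load]`, with only the slices in `c.keep` written back to the full output.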

I just wanted to ask for some guidance on how to achieve what I have written above, so that I can hopefully continue on my own and later share the results of this work with the scientific community.

Thanks for the help!