# 3d medical app stencil

Hello I am trying to apply fantactic ParallelStencil.jl (GitHub - omlins/ParallelStencil.jl: Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs) library to 3d medical imaging, as it seems ideal yet I have problems with application of code to my domain.
For the beginning I would like to achieve very simple thing just calculate the mean and standard deviation of neighbouring pixels so pseudo code would look like.

``````# dimensionality increases as input volume holds just intensity , when output volume holds array with mean and standard deviation
function getMeanAndStd ( data ::Array{Int32, 3} ) :: Array{Int32,4}

out = Array{Int32,4} # preallocating output

# for example i would use Neumann style stencil and in this naive idea Cartesian index of the center of the stencil
for ((nsten, cartIndex) in data)

flattened = flaten(nsten)
out[cartIndex] = [ mean(flattened), std(flattened) ] # putting the result in proper place of output

end
return out
end

``````

Of course In case of big images i would need to batch the data as I could have a problem in fitting it in GPU memory also as seen from output here further analysis would be in 4d .

I just wanted to ask for some guide How to achieve what I had written above so hopefully I will be able to go on further on my own and later share effects of the work with scientific community.

Thanks for Help!

So the 4d object would have 3 layers in its 4th dimension. One would be the same as the 3d object, the next would be all the means, and last would be all the std. I wonder if you get much from making it 4d? Maybe make 3 of the 3d objects as a vector instead?

Thanks for feedback and idea !; yet I am as a starting point implementing ideas of semiautomatic segmentation from article which link I provide below. basic idea is that over a set of neighboring means and standard deviations will be needed to calculate an input for pdf of multivariate gaussian which mean and variance were calculated earlier - (mean, covariance matrix, its inverse and normalization constants are precalculated so basically only some simple linear algebra left)

(PDF) Symmetric Reconstruction of Functional Liver Segments and Cross-Individual Correspondence of Hepatectomy (researchgate.net)

Thanks @Jakub_Mitura for your interest in ParallelStencil.jl and for your enthusiasm about it. Regarding your specific application:

For the beginning I would like to achieve very simple thing just calculate the mean and standard deviation of neighbouring pixels

Here is a short 3D code that uses the `@parallel_indices` macro of ParallelStencil to compute the mean and the standard deviation of neighbouring pixels using, as you suggested, a 7-point â€śNeumanâ€ť stencil:

``````const USE_GPU = false  # Use GPU? If this is set false, then no GPU needs to be available
using ParallelStencil
using ParallelStencil.FiniteDifferences3D
@static if USE_GPU
@init_parallel_stencil(CUDA, Float64, 3)
else
end
using Plots

@parallel_indices (ix,iy,iz) function computeMeanStd!(Mean::Data.Array, Std::Data.Array, In::Data.Array)
# 7-point Neuman stencil
if (ix<=size(Mean,1) && iy<=size(Mean,2) && iz<=size(Mean,3))
ixi, iyi, izi = ix+1, iy+1, iz+1
Mean[ix,iy,iz] = (In[ixi-1,iyi  ,izi  ] +
In[ixi-1,iyi  ,izi  ] + In[ixi+1,iyi  ,izi  ] +
In[ixi  ,iyi-1,izi  ] + In[ixi  ,iyi+1,izi  ] +
In[ixi  ,iyi  ,izi-1] + In[ixi  ,iyi  ,izi+1])/7.0

Std[ix,iy,iz]  = ((In[ixi-1,iyi  ,izi  ] - Mean[ix,iy,iz])^2 +
(In[ixi-1,iyi  ,izi  ] - Mean[ix,iy,iz])^2 + (In[ixi+1,iyi  ,izi  ] - Mean[ix,iy,iz])^2 +
(In[ixi  ,iyi-1,izi  ] - Mean[ix,iy,iz])^2 + (In[ixi  ,iyi+1,izi  ] - Mean[ix,iy,iz])^2 +
(In[ixi  ,iyi  ,izi-1] - Mean[ix,iy,iz])^2 + (In[ixi  ,iyi  ,izi+1] - Mean[ix,iy,iz])^2)/7.0
end
return
end

@views function img_process()
# Numerics
nx, ny, nz = 64, 64, 64
# Array allocations
In    =  @rand(nx  ,ny  ,nz  )
Mean  = @zeros(nx-2,ny-2,nz-2)
Std   = @zeros(nx-2,ny-2,nz-2)
# Calculation
@parallel computeMeanStd!(Mean, Std, In)
# Visualisation
p1 = heatmap(1:nx  , 1:nz  , Array(In )[:,Int(round(ny/2)),:]'    , aspect_ratio=1, xlims=(1,nx)  , ylims=(1,nz)  , c=:viridis, title="Input data")
p2 = heatmap(1:nx-2, 1:nz-2, Array(Std)[:,Int(round((ny-2)/2)),:]', aspect_ratio=1, xlims=(1,nx-2), ylims=(1,nz-2), c=:viridis, title="Standard deviation")
display(plot(p1, p2))
return
end

@time img_process()
``````

If you change `USE_GPU = true` then the code would execute on a CUDA capable GPU. In case, later, your images do not fit in a single GPU, you could combine ImplicitGlobalGrid.jl to ParallelStencil.jl for multi-GPU processing.

2 Likes

Thank You very much! It works like a charm for calculating means and standard deviations.
I woiuld like to ask sth more - s it possible to apply any functions or macros inside @parallel_indices function?
I understand that there should not be dynamicly called functions but as far as I understand it macros are very simmilar to code generation so there should be in theory possibility to use them, If I understand it correctly.
I would like for example pass as the argument list of cartesian indicies and than use them inside to point which points I would like to use in a stencil, later use this list in unrolled for loop (as far as I understand unrolled loop should create a line by line instructions that would be nearly identical to implementation from above yet I can not make it work). Pseudo code just about mean to make it simpler below.

``````@parallel_indices (ix,iy,iz) function computeMeanStd!(Mean::Data.Array, Std::Data.Array, In::Data.Array, indicies::Data.Array)
ixi, iyi, izi = ix+1, iy+1, iz+1

\\ indicies would be n by 3 array where n is number of points we want to use in stencil
sum = 0
@unroll for ind in indicies
sum+=In[ixi +ind[1] ,iyi+ind[2] ,izi+ind[3] ]
end \\\\for
Mean[ix,iy,iz] =sum/ size(indicies )[1]
end \\\\computeMeanStd
``````

Also is it possible to use functions of gpu linear algebra if target is GPU like CUBLAS?

Thank You !

You can, a priori, use any functions, macros or constructs within a @parallel_indices function, as long as it is correctly defined and equivalent for both standard Julia and CUDA.jl (same function names and behaviour for given args etc.). If something exists only for CUDA.jl, e.g., some CUBLAS functionality, then you should be able to define some macros that call the CUBLAS functionality when initializing ParallelStencil with CUDA and that call the corresponding CPU functionality when initializing ParallelStencil with Threads.
Here you can see an example of using macros inside a `@parallel_indices` function.
Finally, note that ParallelStencil will by default create parallel indices for each cell of the biggest array passed as function argument. If your `indices` array is the biggest array, but you rather need just an index for each cell of array `In` (or `Mean`), then you can control this in the `@parallel` call of this function (see e.g. here where only a plane of a 3-D array is accessed).

1 Like

Thank you !