Radon and iradon transforms

Hi All,

I am new to Julia, coming from Matlab. I recently started working on an OPT (Optical Projection Tomography) project where one of the requirements is open-source software.

I tried to implement something similar to Matlab's radon/iradon pair, using as a base the C++ code previously developed by T. Knopp (https://github.com/tknopp/numcpp).
For debugging, I mirrored the Julia implementation of FBP in Matlab.

Quality-wise, the sinograms and reconstructions of the phantom are in good agreement with each other and with the original.
(There is one minor scale issue in iradon, as I didn't find a suitable "imresize", but that can be fixed later.)

The main drawback is that the "naïve" Julia functions are much slower than Matlab's optimized ones.

For a 1000x1000 phantom image, the processing-time ratios relative to Matlab were:
radon - 16.7
iradon - 20.5
iradon translated from Julia to Matlab - 23.5

That is not practical for the project.
The main issue looks like it is simply "slow brackets" (array indexing).

I paste the code and test program below and hope to hear any suggestions for speeding it up.
(It's amateurish, but it sort of does the job.)
I think the question of radon/iradon will come up in Julia sooner or later anyway.

# tomography.jl

########################################################
function radon(A,angles)

    angles = angles + pi/2;

    w,h = size(A);
    
    Nr_div2 = floor(sqrt(w*w+h*h)/2);

    Nr = Int64(Nr_div2*2+1);

    L = length(angles);

    sinogram = zeros(Nr,L);

    SIN = sin(angles);
    COS = cos(angles);

    R = linspace(-0.5,0.5,Nr)*Nr;

    RSIN = zeros(Nr,L);
    RCOS = zeros(Nr,L);

    w2 = w/2;
    h2 = h/2;

      for a=1:L
        for k=1:Nr
          RSIN[k,a]=R[k]*SIN[a];
          RCOS[k,a]=R[k]*COS[a];
        end
      end

    for a=1:L
        for k=1:Nr
          for l=1:Nr

              x = RCOS[k,a] - RSIN[l,a] + w2;
              y = RSIN[k,a] + RCOS[l,a] + h2;

              x1 = floor(x);
              x2 = ceil(x);
              y1 = floor(y);
              y2 = ceil(y);

              if !(x1 <= 0 || x2 > w || y1 <= 0 || y2 > h)

                # BILINEAR INTERPOLATION - STARTS
                # x1 y1 - btm left
                # x1 y2 - top left
                # x2 y1 - btm rght
                # x2 y2 - top rght
                f11 = A[Int64(x1),Int64(y1)];
                f21 = A[Int64(x1),Int64(y2)]; # that looks like, correct way
                f12 = A[Int64(x2),Int64(y1)];
                f22 = A[Int64(x2),Int64(y2)];
                  value = 0;
                if x2==x1 && y2==y1
                  value = f11;
                elseif x2==x1
                  value = 1/(y2-y1)*( f11*(y2-y) + f22*(y-y1) );
                elseif y2==y1
                  value = 1/(x2-x1)*( f11*(x2-x) + f22*(x-x1) );
                else
                  value = 1/(x2-x1)/(y2-y1)*( f11*(x2-x)*(y2-y) +
                                              f21*(x-x1)*(y2-y) +
                                              f12*(x2-x)*(y-y1) +
                                              f22*(x-x1)*(y-y1) );
                end
                # BILINEAR INTERPOLATION - ENDS

                sinogram[k,a] += value;

              end
          end
        end
    end
    return sinogram;
end

#########################################
function iradon(A,angles) # input angles [rad]

  angles = angles + pi/2;

  N, nAngles = size(A);

  I = zeros(N,N); # reconstruction

  x = linspace(-0.5,0.5,N);

  filter = abs(linspace(-1, 1, N));

  # FT domain filtering
  for t=1:length(angles)
      fhat = fftshift(fft(slice(A,:,t)));
      A[:,t] = real(ifft(ifftshift(fhat.*filter)));
  end

  XCOS = zeros(N,length(angles));
  XSIN = zeros(N,length(angles));
  for k=1:N
    for a=1:length(angles)
      XCOS[k,a]=x[k]*cos(angles[a]);
      XSIN[k,a]=x[k]*sin(angles[a]);
    end
  end

  for m=1:N
    for n=1:N
      for t=1:nAngles
        r = XCOS[m,t] + XSIN[n,t];
        index = Int64(round((r/sqrt(2.0) + 0.5)*N)); # sqrt(2) magnified
        # index = Int64(round((r + 0.5)*N)); # to keep pixel resolution
          if index>0 && index<=N
            I[m,n] += A[index,t];
          end
      end
    end
  end

  return I;
end

test program:

include("tomography.jl")

using Images, TestImages, ImageView

angles = (0:1:359)/360*2*pi; # [rad]

z = shepp_logan(1000; highContrast=true);
ImageView.imshow(z)

z = float(z);

tic();
sinogram = radon(z,angles);
display(toc());

v1 = minimum(sinogram);
v2 = maximum(sinogram);
todisplay = (sinogram-v1)/(v2-v1);
ImageView.imshow(todisplay);

tic();
reconstruction = iradon(sinogram,angles);
display(toc());

v1 = minimum(reconstruction);
v2 = maximum(reconstruction);
todisplay = (reconstruction-v1)/(v2-v1);
ImageView.imshow(todisplay);

From a cursory glance, you should at least change value = 0 to value = 0.0 to avoid a type instability. Or actually, you can just remove that line.

Yep, that gives about a 9% performance gain, thanks!

This looks nice :+1: It would be really good if this were published in a package!

I've come here with similar performance questions before. The two main tools I learned about were:

  • The ProfileView package, which is very useful for understanding bottlenecks

  • The @code_warntype macro, which is very good for identifying type instabilities

Type instabilities were, for me, the least obvious source of performance loss in Julia. Eliminating all of them will probably give big performance gains: check the output of @code_warntype radon(z, angles) for anything annotated ::Any or ::Union{...}, and adjust the types until everything is annotated with a single concrete type. Once you've done that, you've gotten most of what you can get out of Julia's compiler; the rest comes down to improving the logic of your code, or the performance of functions you didn't write. I guess you expect your logic to be good, so I would investigate type instability further. @GunnarFarneback's example shows how something tiny can have a big impact.
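For example, a minimal sketch of that check at the REPL (assuming the tomography.jl above has been loaded; the rand matrix is just a stand-in for the phantom):

include("tomography.jl")

z = rand(1000, 1000)            # stand-in for the phantom image
angles = (0:1:359)/360*2*pi

# look for variables annotated ::Any or ::Union{...} in the printed output
@code_warntype radon(z, angles)

In the original radon, value = 0 is one such spot: value starts out as an Int there but becomes a Float64 in the interpolation branches, which is exactly the instability that the value = 0.0 fix removes.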

P.S. you don’t need the semicolons :slight_smile:


The funny thing is that Julia was the reason I abandoned the numcpp project. At that time Jeff helped me solve a type-stability issue, and after that Julia was as fast as my hand-tuned C++ code.

I could not find anything too bad in the iradon function. Writing round(Int64, …) gave me a little speedup. It might also be that the caches are not perfectly utilized.
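In the posted iradon, that change would look roughly like this (just a sketch of what the suggestion means in context):

# original: round to a Float64 first, then convert
index = Int64(round((r/sqrt(2.0) + 0.5)*N));

# suggested: round directly to an Int64 in one step
index = round(Int64, (r/sqrt(2.0) + 0.5)*N);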

Seems like you're at a place where only multi-threading will improve things significantly.
I did a quite naive take at it (it will need more careful handling of the slicing) and got an almost 4x speedup (for 4 cores), and the results seem to agree!

function radon_inner(sinogram, A, RCOS, RSIN, w, h, w2, h2, L, Nr, nt, tl)
    for a = 1:L, tk = 1:nt:(Nr - nt)
        @simd for l = 1:Nr
            k = tk + tl
            value = zero(eltype(A))
            @inbounds begin
                x = RCOS[k, a] - RSIN[l, a] + w2
                y = RSIN[k, a] + RCOS[l, a] + h2
                x1 = floor(Int, x)
                x2 = ceil(Int, x)
                y1 = floor(Int, y)
                y2 = ceil(Int, y)
                if !(x1 <= 0 || x2 > w || y1 <= 0 || y2 > h)
                    f11 = A[x1, y1]
                    f21 = A[x1, y2] # that looks like, correct way
                    f12 = A[x2, y1]
                    f22 = A[x2, y2]
                    if x2 == x1 && y2 == y1
                        value = f11
                    elseif x2 == x1
                        value = 1.0 / (y2-y1)*( f11*(y2-y) + f22*(y-y1) )
                    elseif y2 == y1
                        value = 1.0 / (x2-x1)*( f11*(x2-x) + f22*(x-x1) )
                    else
                        value = 1.0 /
                            (x2-x1)/(y2-y1) * (
                                f11*(x2-x)*(y2-y) +
                                f21*(x-x1)*(y2-y) +
                                f12*(x2-x)*(y-y1) +
                                f22*(x-x1)*(y-y1)
                            )
                    end
                end
            end
            sinogram[k, a] += value
        end
    end
end

function radon(A, angles)
    angles = angles .+ pi/2
    w, h = size(A)
    Nr_div2 = floor(Int, sqrt(w * w + h * h) / 2)
    Nr = Nr_div2 * 2 + 1
    L = length(angles)
    sinogram = zeros(Nr, L)

    SIN = sin.(angles)
    COS = cos.(angles)
    R = linspace(-0.5, 0.5, Nr) * Nr

    RSIN = zeros(Nr, L)
    RCOS = zeros(Nr, L)

    w2 = w / 2
    h2 = h / 2
    @inbounds for a = 1:L
        for k = 1:Nr
            RSIN[k, a] = R[k] * SIN[a]
            RCOS[k, a] = R[k] * COS[a]
        end
    end
    nt = Base.Threads.nthreads()
    tmp = zeros(nt)
    Base.Threads.@threads for tl = 0:(nt - 1)
        radon_inner(sinogram, A, RCOS, RSIN, w, h, w2, h2, L, Nr, nt, tl)
    end
    return sinogram
end

Run it with JULIA_NUM_THREADS=4 julia5 -O3 --math-mode=fast.
I think the -O3 --math-mode=fast didn't actually help, though :wink:

I didn't try iradon, but I see that you have some FFTs in there, so it might help to execute: Base.FFTW.set_num_threads(8)


Looking at the Radon transform implementation, it seems that the bilinear interpolation can maybe be improved a little bit. If I understand correctly, you are interpolating the 2D function as piecewise linear, using the matrix A as a grid (by the way, the interpolation formula is of course f(x) = (f(x1)*(x2-x) + f(x2)*(x-x1))/(x2-x1), but here x2-x1 = 1, so you don't need to divide). This 2D function restricted to a line is piecewise linear, so ideally you should get an increase in accuracy if you compute the exact integral (which basically boils down to choosing the intersections of your line with the grid as the points at which to evaluate the integrand). I'm not sure how performance would change, but at least you would only need to do 1D and not 2D linear interpolation.
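For what it's worth, the division-free bilinear form with unit grid spacing could look roughly like this (a fragment, assuming the four neighboring grid points from A[ix, iy] to A[ix+1, iy+1] are in bounds):

ix, iy = floor(Int, x), floor(Int, y)   # lower grid corner
fx, fy = x - ix, y - iy                 # fractional offsets in [0, 1)
value = (1-fx)*((1-fy)*A[ix, iy]   + fy*A[ix, iy+1]) +
            fx*((1-fy)*A[ix+1, iy] + fy*A[ix+1, iy+1])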

Thanks, all, for your suggestions; I'll try my best to incorporate them.

Huge kudos to sdanisch for your multithreaded design.
I think if radon/iradon ever goes out as a package, optimizations like that will be crucial - but I'm afraid I don't have enough qualifications to implement them properly.
(Well, unless somebody invents faster brackets… :slight_smile:)

As to semicolons - they remind me of my sorrows.
I promise to drop them when the Julia/Matlab processing time ratio here is at least 2/1 :slight_smile:

Many thanks, Tobias, for your tomography code and for introducing me to Julia.

Y.
/Yuriy Alexandrov
…


Yes, this division isn't needed.
Thanks!
My intention was to make it look instructive and safe first, so errors would be easier to identify.
I think there are still a lot of stupidities in the code, but since it already produces outputs very similar to the expected ones, the performance issue is quite noticeable anyway.
…

It doesn't have to be an officially registered "package" – but if you put it up on your GitHub, it'll be accessible to anyone who faces the same problems as you later, and any improvements from others can be added more easily :slight_smile:

What’s this about fast and slow brackets? (I don’t know what bottleneck that could be)

Nice to see some interest in adding this! Yes, it’s possible to do quite a lot better, although how much of this is Julia and how much is just “cleanup” is debatable. (I do think that Julia’s “mindset” helps with this “cleanup.”)

  • First, you don’t need any of those temporary arrays—this is the “vectorize everything” mindset of Matlab
  • Once you notice that the values in the computation aren’t arbitrary entries extracted from some matrix, but instead change in predictable ways, you notice opportunities for speedup
    • You realize that you can pre-calculate the range of l that will be in-bounds
    • That allows you to get rid of all the branches in the inner loop

Here’s my code for radon (I didn’t tackle iradon):

function radon(A::AbstractMatrix, angles)
    w, h = size(A)
    d = floor(Int, sqrt(w*w+h*h)/2)  # half-length of the diagonal
    Nr = 2d+1
    R = linspace(-Nr/2, Nr/2, Nr)

    sinogram = zeros(Nr, length(angles))

    for j = 1:length(angles)
        a = angles[j]
        θ = a + pi/2
        sinθ, cosθ = sin(θ), cos(θ)
        for k = 1:length(R)
            rk = R[k]
            x0, y0 = rk*cosθ + w/2, rk*sinθ + h/2
            # for a line perpendicular to this displacement from the center,
            # determine the length in either direction within the boundary of the image
            lfirst, llast = interior(x0, y0, sinθ, cosθ, w, h, R)
            tmp = 0.0
            @inbounds for l in lfirst:llast
                rl = R[l]
                x, y = x0 - rl*sinθ, y0 + rl*cosθ
                ix, iy = trunc(Int, x), trunc(Int, y)
                fx, fy = x-ix, y-iy
                tmp += (1-fx)*((1-fy)*A[ix,iy] + fy*A[ix,iy+1]) +
                       fx*((1-fy)*A[ix+1,iy] + fy*A[ix+1,iy+1])
            end
            sinogram[k,j] = tmp
        end
    end
    return sinogram;
end

function interior(x0, y0, sinθ, cosθ, w, h, R::Range)
    rx1, rxw = (x0-1)/sinθ, (x0-w)/sinθ
    ry1, ryh = (1-y0)/cosθ, (h-y0)/cosθ
    rxlo, rxhi = minmax(rx1, rxw)
    rylo, ryhi = minmax(ry1, ryh)
    rfirst = max(minimum(R), ceil(Int,  max(rxlo, rylo)))
    rlast  = min(maximum(R), floor(Int, min(rxhi, ryhi)))
    Rstart, Rstep = first(R), step(R)
    lfirst, llast = ceil(Int, (rfirst-Rstart)/Rstep), floor(Int, (rlast-Rstart)/Rstep)
    if lfirst == 0
        lfirst = length(R)+1
    end
    # Because of roundoff error, we still can't trust that the forward
    # calculation will be correct. Make adjustments as needed.
    while lfirst <= llast
        rl = R[lfirst]
        x, y = x0 - rl*sinθ, y0 + rl*cosθ
        if (1 <= x < w) && (1 <= y < h)
            break
        end
        lfirst += 1
    end
    while lfirst <= llast
        rl = R[llast]
        x, y = x0 - rl*sinθ, y0 + rl*cosθ
        if (1 <= x < w) && (1 <= y < h)
            break
        end
        llast -= 1
    end
    lfirst, llast
end

That gives

julia> @time radon0(z, angles);
 10.549118 seconds (14 allocations: 11.665 MB)

julia> @time radon(z, angles);
  6.498923 seconds (6 allocations: 3.887 MB)

where radon0 is your original code (with the type-instability fixed).

As Simon noted, I can make it faster with threading. It should merely be a matter of starting julia with JULIA_NUM_THREADS=2 (or however many physical cores you have or want to use) and adding Threads.@threads in front of the j loop, except for the infamous "performance of captured variables in closures" issue (Issue #15276 · JuliaLang/julia · GitHub). You therefore have to use a "function barrier":

function radon(A::AbstractMatrix, angles)
    w, h = size(A)
    w2, h2 = w/2, h/2
    d = floor(Int, sqrt(w*w+h*h)/2)  # half-length of the diagonal
    Nr = 2d+1
    R = linspace(-Nr/2, Nr/2, Nr)

    sinogram = zeros(Nr, length(angles))

    radon_threads!(sinogram, A, angles, R, w, h)
end

function radon_threads!(sinogram, A, angles, R, w, h)
    Threads.@threads for j = 1:length(angles)
        # the rest is the same

With that, plus -O3 as a command-line option, I get

julia> @time radon(z, angles);
  3.349954 seconds (8 allocations: 3.887 MB)

which is approximately three-fold faster than your original. If you have more cores, of course it’s even better.

Finally, I think you have a bug in your interpolation formula: I think f21 and f12 are swapped. Also, do you really mean R = linspace(-Nr/2, Nr/2, Nr) rather than R = linspace(-(Nr-1)/2, (Nr-1)/2, Nr)? If I just use R = -d:d in my version, it’s even faster.
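In other words, with the formula used in the original code, the corner samples would presumably need to be assigned like this (just a sketch of the suggested swap):

f11 = A[Int64(x1), Int64(y1)]  # bottom-left  (x1, y1)
f21 = A[Int64(x2), Int64(y1)]  # bottom-right (x2, y1)
f12 = A[Int64(x1), Int64(y2)]  # top-left     (x1, y2)
f22 = A[Int64(x2), Int64(y2)]  # top-right    (x2, y2)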

Last but not least,

There might be if you do Pkg.update(). Good timing on your part: the tag was merged yesterday :slight_smile:.


Also, with regard to your "faster brackets": you didn't add @inbounds; that alone should make a noticeable difference.

By "slow brackets" I simply meant indexed access ([…]), used a lot in double or triple loops, like here with 3 indices:

 r = XCOS[m,t] + XSIN[n,t];

They aren’t slow in Julia. If they seem slow, it’s because of bounds-checking. Just add @inbounds around your code.
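For the triple loop in iradon quoted above, that would be something along these lines (a sketch; the index check still guards the A lookup):

@inbounds for m = 1:N, n = 1:N, t = 1:nAngles
    r = XCOS[m,t] + XSIN[n,t]
    index = Int64(round((r/sqrt(2.0) + 0.5)*N))
    if index > 0 && index <= N
        I[m,n] += A[index,t]
    end
end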


Also, if you don't want to add @threads (as you see, it's not hard…), then make sure you start Matlab with -singleCompThread so that the timing comparison is fair.

Excellent!
For me the timings are now: original ~10 s, my naive version ~2.5 s, @tim.holy's super-speed version ~1.0 s :slight_smile:
Nice numbers!


I tried your version of radon (without multithreading) - 3x faster!
However, the multi-threaded version by sdanisch turned out to be slower in my hands, for some reason.
As I said, I'm only a modest copy-paster. :slight_smile:
Also, it looks like my computer has only 2 cores:
Intel Core i7-3687U @ 2.10GHz, No. of cores: 2 (2 logical cores per physical core)

You are likely right about the bugs, too; the reconstruction looks smoother after the indices are swapped.
…


Another advantage of putting this code in a package is that it clarifies the license (assuming you have a LICENSE file). As is, I can’t use any of the code in this thread.


Isn't that too radical an expression? :slight_smile:
If a license file solved all the issues, then why would people pay for things like private GitHub options?

As I mentioned, the original idea was authored by tknopp, I translated his code to Julia (adding some semicolons and bugs), sdanisch suggested multi-threading, piever found some excessive computation, and then the fastest - and very likely bug-free - radon version was published by tim.holy today.
That is a brief history of the achievements.
So I don't think I should be the keeper; also, I'm not familiar enough with Julia to handle it.

On the other hand, there is no doubt that radon/iradon will eventually be coded in Julia, and hopefully it will be reasonably fast.
So maybe it's better to leave it to the discretion of the Julia developers, or of those most skilled in the field?
Just guessing.
…

I’ll probably add a version of this to ImageTransforms.jl
