Optimizing the calculation of many distances

Hey everyone, I’m translating some code from Julia to MATLAB and I have a seismic imaging program that now works in Julia and is a bit faster than MATLAB - but not a ton. The profiler says that the function pasted below takes up the bulk of the calculation - is there is an obvious Julia error I’m making? The three list comprehensions are the vast bulk of the time.

New to Julia so the answer may be simple. I edited this so that the code includes the constructors and some dummy constructors to make the code executable.

This code is a simple interpolation scheme for pasting values from the model (scattered points in x, y with a value v in three separate fields) onto ray paths defined by xy coodinates, ray.x and ray.y. Then the last line calculates the time of a seismic phase through a medium with the seismic velocity (u, which is 1/velocity) and two anisotropic parameters (A, the magnitude of anisotropy, t, fast direction).

I’ve tried some MATLAB tricks like reducing references to structures and substructures, or the Julia trick of rewriting the loop to use column-major matrices, but with little to no improvement.

For context, ray and model are immutable structures and I’m storing stuff in their fields that are defined as Float64 arrays with 1 dimension (hence the broadcasting when storing stuff in ray’s fields).

struct mod_sub
 n::Array{Float64,1}
 x::Array{Float64,1}
 y::Array{Float64,1}
 v::Array{Float64,1}
end

struct model_const
 u::mod_sub
 A::mod_sub
 t::mod_sub
 error::Array{Float64,1}
 BIC::Array{Float64,1}
 fit::Array{Float64,1}
 nfit::Array{Float64,1}
 llh::Array{Float64,1}
end

struct ray_const
 L::Array{Float64,1}
 u::Array{Float64,1}
 A::Array{Float64,1}
 theta::Array{Float64,1}
 phi::Array{Float64,1}
 t::Array{Float64,1}
 x::Array{Float64,1}
 y::Array{Float64,1}
 sta::Float64
 evt::Float64
 pick::Array{Float64,1}
end

function raytracer!( )

r = 1:20

model = model_const( mod_sub(8*ones(1), 100*rand(Float64, 8), 100*rand(Float64, 8), rand(Float64, 8)),
    mod_sub(8*ones(1), 100*rand(Float64, 8), 100*rand(Float64, 8), 0.1*rand(Float64, 8)),
    mod_sub(8*ones(1), 100*rand(Float64, 8), 100*rand(Float64, 8), pi*rand(Float64, 8) .+ -pi),
        0.01*ones(1), ones(1), ones(1), ones(1), ones(1))

ray = ray_const(ones(length(r) - 1), ones(length(r) - 1), ones(length(r) - 1), ones(length(r) - 1),
        ones(length(r) - 1), ones(1), range(0, 100, length=20), range(0, 100, length=20), 1, 1, ones(1))

np = length(ray.x)

u = zeros(np)
A = zeros(np)
t = zeros(np)

for k = 1:np

    ru = [ ((ray.x[k] - mx).^2) .+ ((ray.y[k] - my).^2) for mx in model.u.x, my in model.u.y]
    rA = [ ((ray.x[k] - mx).^2) .+ ((ray.y[k] - my).^2) for mx in model.A.x, my in model.A.y]
    rt = [ ((ray.x[k] - mx).^2) .+ ((ray.y[k] - my).^2) for mx in model.t.x, my in model.t.y]

    u[k] = sum((model.u.v ./ ru))./sum(1 ./ ru)
    A[k] = sum((model.A.v ./ rA))./sum(1 ./ rA)
    t[k] = sum((model.t.v ./ rt))./sum(1 ./ rt)

end

dx      = diff(ray.x);
dy      = diff(ray.y);

ray.L   .= sqrt.(dx.^2 .+ dy.^2)
ray.phi .= atan.(dy./dx)

ray.u     .= 0.5*(2*u[1:(length(u)-1)] .+ diff(u));
ray.A     .= 0.5*(2*A[1:(length(A)-1)] .+ diff(A));
ray.theta .= 0.5*(2*t[1:(length(t)-1)] .+ diff(t));

ray.t .= sum(ray.L.*(ray.u.*(1 .+ ray.A.*cos.(2*(ray.phi .- ray.theta)))))

return ray

end

If you can provide a minimal working example it will be much easier to help. There are many thing that can be improved apparently, but what really matters will depend on the sizes of your arrays, for example how large is np or the arrays you are allocating there as ru, rt, ...

1 Like

Yea, was doing that as you typed, sorry about that. Edited version should be executable.

1 Like

I think the ray.x[k] - mx, etc term changes on each iteration of the loop and each list comprehension since mx is defined by model fields on each of the three lines. But not sure if this is optimal. Tried vectorizing it, then found the part of the performance guide that says that probably won’t help in Julia and didn’t get much speed up.

you’re right, I mis-read your original code.

no worries, thanks for responding. Appreciate it.

1 Like
    for k = 1:np

        helper(x,y) = ((ray.x[k] - x).^2) .+ ((ray.y[k] - y).^2)
        ru = map(helper,model.u.x, model.u.y)
        rA = map(helper,model.A.x, model.A.y)
        rt = map(helper,model.t.x, model.t.y)

this seems to help a bit.

Also this saves some allocation:

sum(1 ./ ru) ->
sum(inv, ru)
1 Like

order of magnitude improvement with @btime on the full code - can’t thank you enough my friend.

(ray.x[k] - mx).^2 stays the same for each element of the my in ... iterations. Similarly, ((ray.y[k] - my).^2) stays the same for each value of mx. You can compute those separately and then combine them so you would calculate (r-m)^2 roughly n times instead of n^2, but you need to (re-)use pre-allocated storage so that extra allocations to store the sub-expressions don’t eat any gains.

This doesn’t reuse pre-allocated storage, but it demonstrates the idea:

rmx2 = [(ray.x[k] - mx).^2 for mx in model.u.x]
rmy2 = [(ray.y[k] - my).^2 for mx in model.u.y]
ru = [x+y for x in rmx2, y in rmy2]

By the way, this is unnecessary in Julia (unlike Matlab), so there’s no need to worry about it. In fact, adding more structure to your data can help improve clarity and even improve performance when it gives the compiler more information about what’s going on.

5 Likes

:heart: this about Julia

I am noticing a lot of calls to ones(1). If the size of these are fixed I would suggest either making the struct mutable and storing the mutable values directly in it, or use Base.RefValue instead of Array

MATLAB port showing through.

Yea just started using Julia. No one in my corner of geophysics has really caught on to the fact that there’s a high level language with speeds like C so everyone is stuck on MATLAB. Trying to fix that.

1 Like

oops I think I have a bug that map produces a vector instead of a matrix. The speedup from this map is just an illusion

1 Like

Didn’t have time to do more, but: if your vectors of modsub are actually of size 8, it is possible that you can use StaticArrays and improve the performance quite a bit.

Also, you can use Parameters and make the code much clearer. Here is an example of how it can look for your first two structures:

using Parameters
using StaticArrays

@with_kw struct ModSub{N}
 n::Int = N
 x::SVector{N,Float64} = 100*rand(SVector{N,Float64})
 y::SVector{N,Float64} = 100*rand(SVector{N,Float64})
 z::SVector{N,Float64} = rand(SVector{N,Float64})
end

@with_kw struct ModelConst{N}
 u::ModSub{N} = ModSub{N}()
 A::ModSub{N} = ModSub{N}()
 t::ModSub{N} = ModSub{N}()
 error::Float64 = 0.01
 BIC::Float64 = 1.0
 fit::Float64 = 1.0
 nfit::Float64 = 1.0
 llh::Float64 = 1.0
end

# A default ModSub of length 8 can be defined with:
m = ModSub{8}()

# This is how you will, then, initialize your model, changing some parameters
# of the other ModSubs
n=8
model = ModelConst{n}( u = ModSub{n}(),
                       A = ModSub{n}(z = 0.1*rand(SVector{n,Float64})),
                       t = ModSub{n}(z = π*rand(SVector{n,Float64}) - π) )

you can follow the same line with RayConst (structs are usually defined with uppercases like that in Julia).

Edit: the use of static arrays is optional. You can use Parameters to organize the code with normal arrays as well. Note that Array{Float64,1} is the same as Vector{Float64}

2 Likes

Also worth noting that having n::Int = N doesn’t provide any benefit. (and just slightly increases memory usage).

1 Like

Maybe if that is used for something else, it is cumbersome to recover that from the type signature, isn’t it?

I haven’t looked at this at all, but maybe you’ll find it useful SeismicJulia · GitHub

I hacked on it a bit here, leaving the evolution for clarity.

@btime comes from the BenchmarkTools package and otherwise I stayed within base and left all variable types alone / tried not to introduce too many assumptions (I think length(ModSub.x) == length(ModSub.v) might be required always anyway).

5x speedup and removed >99% of allocations, which will help you a lot if this is a function you are going to call frequently.

4 Likes