Speeding up Julia

Hi everyone,

Disclaimer: This is a noob question. I’ve written a program to do some numerics and have it working. Using the information found here and here, I’ve given my code a 2.5x speed boost.

Here are things that I’ve done:

  1. Devectorized my code. I’ve even converted array multiplications that used .* into explicitly written-out loops.
  2. Reduced temporary array creation in the middle of the loops.

I’m here to see how much more I can squeeze out of my code and have some questions.

  1. Coming from Python, my code reads arrays in a row-by-row fashion, i.e. in my for loops the indices to the right of an array loop faster than those on the left. Since Julia stores arrays column-wise, I was considering changing my program. Would it suffice to merely transpose the arrays that I’m using, perform the computation, and then transpose back, or would you recommend re-writing the code to be tailored to column-wise computation?

  2. My code estimates certain quantities by Markov sampling. Since each iteration is independent, I spread my jobs across several cores when I performed similar computations in Python. How would I go about doing the same here? I’m not sure the @parallel macro does what I want.

Comment:

  1. I’ve added @simd and @inbounds annotations but they don’t seem to have much of an effect (a sketch of how I apply them follows below). I hesitate to use @fastmath as I don’t want to lose precision.
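For reference, the annotations are applied roughly like this. This is just a minimal stand-in, not my actual code (the function sumsq and its input are made up for illustration):

function sumsq(x)
    s = zero(eltype(x))
    @inbounds @simd for i in 1:length(x)
        s += x[i] * x[i]   # @inbounds skips bounds checks; @simd allows vectorization
    end
    return s
end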

I could go ahead and post my code here, but it’s ~500 lines and I’m not sure it’d be appropriate to do so.

If you have any other recommendations (such as blog posts that explain how to speed up Julia for dummies), they’re more than welcome.

Thanks!


If you’re doing enough computations, the cost of the transposes shouldn’t matter, but in the long run if you stick with Julia it will be cleaner to just organize your data for Julia.
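For example, “organizing your data for Julia” mostly means putting the first (row) index in the innermost loop, so that you walk memory contiguously. A minimal sketch (the array A here is just for illustration):

A = rand(1000, 1000)
s = 0.0
for j in 1:size(A, 2)        # columns in the outer loop
    for i in 1:size(A, 1)    # rows in the inner loop: contiguous memory access
        s += A[i, j]
    end
end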

(I would suggest putting your code into a github gist and posting a link.)


Chris has a great blog post on some useful ways to speed up Julia: http://www.stochasticlifestyle.com/7-julia-gotchas-handle/ The post is based on Julia v0.5, but it should still be good advice for Julia v0.6.

One thing worth noting is that a lot of the advice from your first link, Fast Numeric Computation in Julia, is good but very out of date (it’s almost 4 years old at this point). Julia has come a long way since then, and some of the techniques it recommends are no longer as necessary. For example, that link recommends merging computations into a single loop. Let’s say we want to compute sin(2 * (x + 1)) for a vector x. That link suggests that you do the following:

for i in 1:length(x)
    y[i] = sin(2 * (x[i] + 1))   # assumes y is preallocated with the same length as x
end

instead of creating temporary vectors for the quantities (x + 1), (2 * (x + 1)), and (sin(2 * (x + 1))). That’s a perfectly fine way to do things, but we have something better now! In Julia v0.6, you can just write:

y .= sin.(2 .* (x .+ 1))

The “dot” operators and “dot” function calls like sin.() automatically perform loop fusion. That is, they automatically create a single loop over the data, just like the manually written-out loop above. For more on that, check out More Dots: Syntactic Loop Fusion in Julia.
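To make that concrete, here’s a self-contained sketch (x and y are made-up example arrays; @. is just shorthand that adds a dot to every call in the expression):

x = rand(1000)
y = similar(x)              # preallocate the output
y .= sin.(2 .* (x .+ 1))    # fuses into a single loop, no temporary arrays
@. y = sin(2 * (x + 1))     # equivalent shorthand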

Also, there’s probably no performance benefit to manually replacing .* with written-out loops anymore, so feel free to use .* if it makes your code clearer.

This means that, often, instead of manually de-vectorizing your code, you can just add a few dots and get all of the benefit with much less code.

Another small suggestion: you may need to run Julia with the -O3 flag from the command line to fully see the benefits of @simd. edit: never mind that.


I don’t think -O3 affects @simd. It enables vectorization of straight-line code but doesn’t make loop vectorization any better.

Ah, my mistake. Most of what I know about this is from random snippets of conversation.

A little typo there, it should be y = sin.(2 .* (x .+ 1))

No, that should be .= to do an in-place update of the pre-allocated y if we’re talking about speed.


Ah, right, assuming y has been pre-allocated with the correct size and type.


Hi everyone. Thanks for the input. I found the blog post ‘7 Julia gotchas’ helpful. Does anyone have any comments with regard to spreading my jobs over multiple cores?

I should have also mentioned that another reason I don’t want to post my code is that it’s part of my research, and I’d like to publish a paper before making the code publicly available.

See https://docs.julialang.org/en/latest/manual/parallel-computing/#Parallel-Computing-1 … e.g. you could use pmap or @parallel. (Or you could use some other communication mechanism, such as MPI via the MPI.jl package.)
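For instance, something along these lines (just a sketch; estimate_once is a hypothetical stand-in for one of your independent Markov-sampling runs):

@everywhere estimate_once(i) = rand()    # stand-in for one independent estimate
results = pmap(estimate_once, 1:1000)    # spreads the 1000 runs across the workers
println(sum(results) / length(results))

Start Julia with julia -p 4 (or call addprocs(4)) so the workers exist before pmap runs.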

Thank you!

The coin flip example is quite similar to the way I want to run things. I ran into some issues which I shall post here.

I’ve split my code into two files: one that contains all the functions I want to run, called 'code.jl', and another, called 'exec.jl', which is what I execute from the terminal.

@everywhere include("code.jl")
tic()
a = @spawn main()
b = @spawn main()
c = @spawn main()
d = @spawn main()

ans = fetch(a) + fetch(b) + fetch(c) + fetch(d);
toc()
print('ans', ans)

I run this from the terminal using 'julia -p 4 exec.jl'

However, Julia complains and says

WARNING: replacing module Distributions.
WARNING: replacing module Distributions.
WARNING: replacing module Distributions.
WARNING: replacing module Distributions.
elapsed time: 36.526136903 seconds
ERROR: LoadError: syntax: invalid character literal
in include_from_node1(::String) at /Applications/JuliaPro-0.5.0.5.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
in include_from_node1(::String) at /Applications/JuliaPro-0.5.0.5.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
in process_options(::Base.JLOptions) at /Applications/JuliaPro-0.5.0.5.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
in _start() at /Applications/JuliaPro-0.5.0.5.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
in _start() at /Applications/JuliaPro-0.5.0.5.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
while loading /Users/akrishna/Dropbox/Decoding2/exec.jl, in expression starting on line 10
WARNING: Forcibly interrupting busy workers
WARNING: rmprocs: process 1 not removed

The first 4 warnings probably have to do with the 'code.jl' file beginning with using Distributions. I’m not sure what’s going on with the rest.

EDIT: This seems to work when I call it from within Julia but not directly from the terminal.

Also, is there a better way of spreading my jobs across all available cores rather than listing spawns from a to d? I tried @everywhere main() and @everywhere fetch(main()), but neither returns anything.

That error is from print('ans', ans): single quotes in Julia make character literals, so strings need double quotes.

Thanks! Changing single quotes to double quotes gave me the right answer! 🙂

Now I’d like to know if there’s a more efficient way of distributing this across several cores than defining a “spawn” for each core by hand.

You could do something like:

futures = Vector{Future}(nworkers())     # one Future per worker
for w in workers()
    futures[w - 1] = @spawnat w main()   # worker ids start at 2, hence w - 1
end

ans = sum(fetch.(futures))

The futures[w - 1] part is a little awkward, but I haven’t been able to get this sort of syntax to work with OffsetArrays.jl or anything similar.
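One way to sidestep the index arithmetic entirely is a comprehension, which collects the futures directly:

futures = [@spawnat(w, main()) for w in workers()]
ans = sum(fetch.(futures))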

  1. Thanks, that works! It’s a few seconds slower than what I had, though (the time jumps from about 79s to 87s for some fixed parameters, a gap I suspect will grow with problem size). It also gives the following warnings on exit:

WARNING: Forcibly interrupting busy workers
WARNING: rmprocs: process 1 not removed

FIXED: Apparently, you have to add rmprocs(workers()) at the end of the script.

  2. I still haven’t managed to figure out why I am getting “WARNING: replacing module Distributions” 4 times when using 4 cores.

Hi everyone,

I found a nice blog post that explains how to do pretty much what I want. The only error I’ve hit so far is that Julia complains that it doesn’t have the require function.