Understanding Julia performance in simple finite difference code

For the last part, have you rebuilt your system image (or compiled Julia from source) and set the optimization level to -O3? Julia won’t use SIMD automatically without that, and that could be the final kicker.

My thinking at this point is largely influenced by Mathematica, where multiplying a list (1d array) by a scalar (e.g., Float64) simply applies the operation to each element in the list (multiplying by the scalar, presumably optimized for the Mathematica way of doing things). So, I’m having to retrain my thinking in several ways as I move into Julia (and away from Mathematica, C++, C, etc.).
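In Julia terms, a minimal sketch of that behavior (the variable names are made up, not from the posted code):

```julia
c = [1.0, 2.0, 3.0]
a = 2.0

b = a * c         # scalar * vector works directly, as in Mathematica: [2.0, 4.0, 6.0]
d = c .* c        # elementwise ops between arrays need the broadcast dot: [1.0, 4.0, 9.0]

# A dotted assignment fuses the whole expression into a single loop and
# writes the result in place, avoiding a temporary array:
b .= a .* c .+ d  # b is now [3.0, 8.0, 15.0]
```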

Thanks for the tips - I see you also modified another line to properly apply broadcasting so I’ll add that too, but I think I’m now making good progress. :+1:

That I’ve definitely not done! I’ve so far played with JuliaPro and the recently released 0.7, mainly on Mac OS X, though I’ve tried Linux too.

Is there a convenient online guide to doing the system image rebuild?

I always end up googling Chris’s blog when I need to remember how to rebuild the system image (see “gotcha #7”): 7 Julia Gotchas and How to Handle Them - Stochastic Lifestyle . Those instructions work for v0.6, but I haven’t tried them in v0.7.

Enabling -O3 is just a matter of starting Julia with the -O3 command-line flag. I’m not sure how to do that in JuliaPro.
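For a plain terminal install, that looks like this (the script name is hypothetical):

```shell
julia -O3 myscript.jl   # run a script at the highest optimization level
julia -O3               # or start the REPL itself with -O3
```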

1 Like

I launch Julia 0.7 with -O3 but I don’t get any difference in time.

If I understand rdeits’s comments, this should be equivalent to the system image rebuild, but let me know if I’m wrong!

Nope, -O3 is orthogonal to rebuilding the system image, and you need to do both to turn on every single fancy compiler optimization.

OK, got it - I’ll dig into rebuilding too

Oh, and one last point for the night (my time zone) - I turned off the file writing stuff in the code I posted, and that drops run time down to about 0.2 sec. It is a little hard to make an “apples to apples” comparison with the C code, because it includes some options for various kinds of I/O (it would be a little painful to go through and comment out undesired options). However, for test purposes, I’m running it set up to only print out a single file at the very end, so this is probably a reasonably fair comparison.

So, at this point, it is either a win for Julia or, at worst, a tie! :fireworks::tada:

Actually, it is a win for Julia either way, since the code is more readable and compact (at least as I start to grasp its nuances).

4 Likes

This is at least where it is in Juno

1 Like

Going to Chris’s blog is always nice, but this particular solution is in the documentation:

https://docs.julialang.org/en/latest/devdocs/sysimg/#Building-the-Julia-system-image-1

Forgive me if I’m wrong, but it seems like the purpose of the nNodes input argument is to hold the length of the other vectors. That seems like a risky way of implementing it.

Shouldn’t you rather remove that variable and instead have

nNodes = length(c)

in whatever function you need it (or use end in the indexing expressions)? Right now it looks really fragile.
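A minimal sketch of that suggestion (the function and stencil here are hypothetical, not from the posted code):

```julia
# Derive the length inside the function instead of threading nNodes
# through every call site:
function laplacian!(d2u, u, dx)
    n = length(u)
    @assert length(d2u) == n
    for i in 2:n-1
        d2u[i] = (u[i-1] - 2u[i] + u[i+1]) / dx^2
    end
    # `end` indexes the last element without ever naming n:
    d2u[1] = d2u[end] = 0.0
    return d2u
end
```

This way the array itself is the single source of truth for its size, and the function can’t silently go out of sync with a separately maintained count.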

1 Like

I see that lately a lot myself, comparing C to Julia.
I wonder if it’s time to change some of the things in the documentation that say “nearly as fast as C” :grinning:

You are certainly correct as the code stands right now. In the long run there does need to be a parameter nNodes that controls the setup of the vectors containing material properties. It needs to be set either directly by the user or, possibly, determined from the number of values in the input files.

However, once that value is set, there are probably places where it would make more sense to use length() - more things to address to finish porting from C!

For more experienced folks it may be obvious, but it took me a little time to find this settings panel. So, for other newcomers, you can get to it by selecting the following menu items:

Packages->Julia->Settings…

Don’t try the general Preferences option :grinning:

I’d highly recommend that blog post to anyone coming to Julia from a C/C++ like background (and Mathematica). The discussion of arrays in terms of pointers helps me to get a much better grasp of why changes in my code made such a difference.

Same thing with discussion of REPL and globals, as that was a key piece I’d missed before.

I’ve been playing around with gcc, but for the life of me I can’t get simple programs to run as fast as in Julia.
GCC seems relatively (extremely) hesitant to use 256-bit-wide instructions and prefers 128-bit. It’s thus often 50-100% slower than Julia on my computer.
And when coerced into using 256-bit vectors via -mprefer-vector-width=256, it suddenly gets even slower.
There’s probably something I’m missing, so I asked about it on the gcc help mailing list:
https://gcc.gnu.org/ml/gcc-help/2018-06/msg00018.html

(Disclaimer: I haven’t tried to reproduce this on different computers.)

Overall, my impression is that it’s way easier for non-computer scientists to get blisteringly fast Julia code than it is with C/C++/Fortran!
Although, I’d be willing to accept Julia being several times slower just for the sake of ease of use…

EDIT:
On the recent hackernews post by oxinabox, I saw someone say that Julia is probably much slower than advertised in practice, citing lots of StackOverflow posts and the infamous “Giving up on Julia”.
I think there are just far fewer people moving from Julia to other languages. Probably because of my inexperience, C/C++/Fortran are, all in all, failing to live up to the “as fast as Julia!” I’ve implicitly been promised.

2 Likes

Any interest in writing a blog post to that effect? Maybe with a click-bait title :troll:

5 Likes

Hmmm - I first tried the official directions at docs.julialang.org and hit errors that prevented completion. I then used “Chris’s blog”, and it worked like a charm. This was with JuliaPro-0.6.2.2. I did nothing to try to fix the issues (other than using the alternate form of the statements from the blog):

Here is the error output using the official instructions:

julia> include("/Applications/JuliaPro-0.6.2.2.app/Contents/Resources/julia/Contents/Resources/julia/share/julia/build_sysimg.jl")

julia> build_sysimg(sysimg_path=default_sysimg_path(), cpu_target="native", userimg_path=nothing; force=false)
ERROR: MethodError: no method matching build_sysimg(::Void, ::String, ::Void; sysimg_path="/Applications/JuliaPro-0.6.2.2.app/Contents/Resources/julia/Contents/Re
sources/julia/lib/julia/sys", cpu_target="native", userimg_path=nothing, force=false)
Closest candidates are:
  build_sysimg(::Any, ::Any, ::Any; force, debug) at /Applications/JuliaPro-0.6.2.2.app/Contents/Resources/julia/Contents/Resources/julia/share/julia/build_sysimg.jl:28 got unsupported keyword arguments "sysimg_path", "cpu_target", "userimg_path"
  build_sysimg(::Any, ::Any) at /Applications/JuliaPro-0.6.2.2.app/Contents/Resources/julia/Contents/Resources/julia/share/julia/build_sysimg.jl:28 got unsupported keyword arguments "sysimg_path", "cpu_target", "userimg_path", "force"
  build_sysimg(::Any) at /Applications/JuliaPro-0.6.2.2.app/Contents/Resources/julia/Contents/Resources/julia/share/julia/build_sysimg.jl:28 got unsupported keyword arguments "sysimg_path", "cpu_target", "userimg_path", "force"
Stacktrace:
 [1] (::#kw##build_sysimg)(::Array{Any,1}, ::#build_sysimg, ::Void, ::String, ::Void) at ./<missing>:0 (repeats 2 times)
 [2] eval(::Module, ::Any) at ./boot.jl:235
 [3] eval(::Any) at ./boot.jl:234
 [4] macro expansion at /Applications/JuliaPro-0.6.2.2.app/Contents/Resources/pkgs-0.6.2.2/v0.6/Atom/src/repl.jl:117 [inlined]
 [5] anonymous at ./<missing>:?

I think that the method invocation changed a bit between versions, and I linked the latest one. Try e.g. the 0.6.3 version. EDIT: nope, it did not change at all, but I think what’s in the documentation is just the argument signature, not an example invocation.
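Going by the “Closest candidates” in that stack trace, the v0.6 script takes the first three arguments positionally, with only force and debug as keywords. So the call would presumably look something like this (untested sketch, keeping the same values as in the failed invocation):

```julia
# Load the build script shipped with Julia v0.6 (path relative to the install prefix):
include(joinpath(dirname(JULIA_HOME), "share", "julia", "build_sysimg.jl"))

# sysimg_path, cpu_target, and userimg_path are positional, not keywords:
build_sysimg(default_sysimg_path(), "native", nothing; force=false)
```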