Julia sets in Julia, using Interact.jl - I'd like to get it faster

Hi, after I talked about Julia to collegues some weeks ago, I somehow wanted to plot Julia sets (Wikipedia), simply because I thougt it to be a funny example.

To play with it I used Interact.jl.
The current result is this 82 liner: Github gist

I tried to get rid of slow things I knew about:

  • use existing arrays instead of creating new ones
  • tried to avoid type changes

Then I attempted to utilize my 8 core Ryzen with 15 threads, by partitioning the image (its hight) into batches, and use @threads

This is faster, but much less than I hoped for:

With threads:

julia> @time juliaSet(-0.62+0.42im)
0.442297 seconds (19.41 M allocations: 321.882 MiB, 22.05% gc time)

Without:
julia> @time juliaSet(-0.62+0.42im)
1.208751 seconds (19.12 M allocations: 317.792 MiB, 4.61% gc time)

In addition the plot takes time of course.

Can I do better?

(This is rather a fun example, so I mainly do it for the fun of it, and to learn.)

Kind greetings, z.

1 Like

Check out the performance tips, particularly the very first one, which is “Avoid global variables”. Your code is entirely based on non-constant global variables, which have a huge negative performance impact. If you can rewrite it to avoid global variables entirely you should see a significant improvement.

6 Likes

Ah. Great resource, thank you!

Updated to local variables now. The line

@time juliaSet(batches, valueGrid, dataGrid, c)

is now called inside a function.

julia> j()
0.351686 seconds (19.11 M allocations: 317.267 MiB)

Slightly faster.

I should add:

julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: AMD Ryzen 7 3700X 8-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, znver1)
Environment:
JULIA_NUM_THREADS = 16

I will try on tomorrow …

Checkout the julia fractal code from @sdanisch here, but keep in mind that this was pre 1.0 and I don’t know the current state of GPUArrays, but same code should also work with CUArrays.

1 Like

Interesting. I got a AMD card, so I’ll see … but not today.
Thank you!

function juliaset(z0, c, maxiter)
    z = z0
    for i in 1:maxiter
        abs2(z) > 4f0 && return (i - 1) % UInt8
        z = z * z + c
    end
    return maxiter % UInt8 # % is used to convert without overflow check
end

function threaded(output, input, c)
    Threads.@threads for i in eachindex(input)
        @inbounds output[i] = juliaset(input[i], c, 255)
    end
    return output
end
# scale should change with picture size
scale = 200.0 # the greater the more details
width, height = 800, 700
x = -0.5; y = 1.2
c = x+y*im
xvalues = LinRange(-width/2, width/2, width) ./ scale
yvalues = LinRange(-height/2, height/2, height) ./ scale
inputmatrix = xvalues' .+ yvalues .* im 
result = fill(UInt(8), height, width)

@time threaded(result, inputmatrix, c)

That should be faster… You could make it even faster with an inner loop, that can be SIMD accelerated

5 Likes

There is also GPU implementation of Mandelbrot set, which can easily be adapted to your needs https://youtu.be/xVLxTk3SqsA

1 Like

Wrong: Using your version of juliaset(…) immediately was actually slower, but @code_warntype pointed out that in z = z0 and thus in abs2(z) the type was assumed to be Any.
I take this paragraph back. Tried to reproduce this with a fresh REPL, and it was fast now. Sorry.

Using your function as function juliaset(z0 :: Complex{Float64}, c, maxiter) is three times faster than mine. This is cool!

Thank you!

Only 3 times?
I get:

@btime juliaSet($batches, $valueGrid, $dataGrid, $c)
51ms
@btime threaded($result, $inputmatrix, $c)
729.403 μs

Which is a speed up of ~70x

To what number did you set JULIA_NUM_THREADS before running Julia?
And are you using julia -O3 ?

I’m not at that computer in the moment, but for Thread.nthreads() I got 16.

Shouldn’t that be enough? (I did it in Juno.)

(Until now I only played with your version of the inner loop, as I don’t have an interesting amount of threads on my current machine.)

Alright, the inner version shouldnt be that different performance wise!


Make sure you have 3 for optimization level :wink:
Might not change anything, but I remember that 03 pretty much only enabled better optimizations for reducing over a variable in a loop, which we are actually doing here…

1 Like

I don’t think -O3 does really anything right now. At least I haven’t seen a single example where it is faster since SLP got enabled with -O2.

Yeah I thought almost nothing changes with -O3 but this one thing… but my information is pretty old :wink:

I come to the conclusion that I don’t know yet what I’m doing; getting results all over the place.

So finally updated the gist again. Gist
j(…) is now a comparing test, calling (at)btimes for both our code versions.

simon:
12.790 ms (116 allocations: 13.72 KiB)
zenon:
34.274 ms (2085420 allocations: 57.48 MiB)

Well this is AFTER I changed my code:

  1. corrected UInt to a concrete UInt32 (change to 32 from 8, seems to do no harm) in my inner loop.
  2. replaced z^2 by z*z (from your code), inner loop
  3. replaced abs by abs2, which seems to be much faster for Complex{Float64} (from your code), inner loop
    4.used (at)inbounds (from your code), but that seemed not to change anything.

So. Many thanks.
I have to leave the large computer again, for 3 days. That was fun!

Kind greetings, z.

Having a quick look at the code, a couple of thoughts:

  • batchify could be replaced by Iterators.partition? But it also should be unnecessary because @threads already statically partitions the work into nthreads batches, I think.
  • In juliaSet you use for y = 1:width. This likely kills performance because width is a global variable. You can use size(dataGrid,2) instead.
1 Like

Ah, thank you!
I assumed there must be sth like batchify already, I just didn’t know where to find it :slight_smile:

The global vars are gone in rev. 2 and 3 of the gist.

In the latest version I can see, there’s still the use of global width in:

function juliaSet(batches, valueGrid, dataGrid, c)
    @threads for batch in batches
        for x in batch
            for y = 1:width
                @inbounds dataGrid[x,y] = julia(valueGrid[x,y], c, MAXITER)
            end
        end
    end
end
1 Like

Uuups. Thank you!, escaped me.

If you are doing Julia sets in Julia you might be tempted to replace z*z by z*z*z

2 Likes