Conditional array comprehension vs. simple for loop - 750k allocations vs. 1 allocation - why?

Hello!

Thanks for taking a moment to have a look at my question :slight_smile:

Full disclosure: I’m only a few weeks into learning Julia, brand new to Discourse and an interdisciplinary engineer (not a computer scientist) by training.

I wrote a simple function to calculate the Hamming distance between two strings for an exercise in an online learning course for Julia.

First I used a simple for loop, which compared each character in the strings sequentially and tallied up those that didn’t match. This function’s code is distance_loop() below, which worked nicely.

Just out of curiosity, and to learn more about Julia, I tried rewriting the for loop to do the identical work on a single line using Julia’s nice array comprehension syntax, which also worked nicely. This function’s code is distance_array_comp() below.

Being new to examining function performance, and to keep things simple, I tested their performance using the @time macro and two randomly generated strings of 1 million characters each.

Trying my best to be consistent, I restarted Julia, generated the two long strings and ran the @time test for each function once each to compile them, and then once more to time the compiled versions. The two functions produced identical answers, as expected.

This is the difference reported by the @time macro:

julia> @time distance_loop(long_rand_str_1, long_rand_str_2)
  0.011152 seconds (1 allocation: 16 bytes)
749296

julia> @time distance_array_comp(long_rand_str_1, long_rand_str_2)
  0.124214 seconds (748.81 k allocations: 20.426 MiB)
749296

My (first-ever) question(s) for the community are:

  1. Am I mis-using the array comprehension syntax?
  2. Why does changing this syntax make such a big difference?
  3. Should this change in syntax make such a big difference?

I would be very grateful for help in understanding what’s going on and look forward to your feedback!

P.S. I used @code_lowered on each function to see whether I could spot any differences between the functions → to me they were identical - happy to provide these on request!

P.P.S. I then used @code_typed on each function and there were many differences, though I was struggling to comprehend what they were at this stage, perhaps here is a good place to look? Also happy to provide these on request!


Version info

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2

Minimum working example code

function distance_array_comp(a::AbstractString, b::AbstractString)
    differences = 0
    [differences += 1 for i in eachindex(a) if a[i] != b[i]]
    return differences
end

function distance_loop(a::AbstractString, b::AbstractString)
    differences = 0

    for i in eachindex(a)
        if a[i] != b[i]
            differences += 1
        end
    end

    return differences
end

long_rand_str_1 = join(rand(['A', 'C', 'G', 'T'], 10^6))
long_rand_str_2 = join(rand(['A', 'C', 'G', 'T'], 10^6))

@time distance_loop(long_rand_str_1, long_rand_str_2)
@time distance_array_comp(long_rand_str_1, long_rand_str_2)

This will perform better because you aren’t changing the binding of differences inside the closure:

function distance_array_comp(a::AbstractString, b::AbstractString)
    differences = Ref(0)
    [differences[] += 1 for i in eachindex(a) if a[i] != b[i]]
    return differences[]
end

However, the comprehension is still allocating a return vector.

Compare the @code_warntype of this function with the original. You’ll notice the Core.Box and Anys that were highlighted in red have now disappeared.
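For anyone wanting to reproduce that comparison from a script rather than the REPL, something like this should work (the two definitions are repeated from the MWE above for convenience; in the REPL the problematic annotations are highlighted in red):

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# The comprehension version: the comprehension's closure rebinds
# `differences`, which forces it into a Core.Box.
function distance_array_comp(a::AbstractString, b::AbstractString)
    differences = 0
    [differences += 1 for i in eachindex(a) if a[i] != b[i]]
    return differences
end

# The plain-loop version: `differences` stays a well-inferred Int.
function distance_loop(a::AbstractString, b::AbstractString)
    differences = 0
    for i in eachindex(a)
        if a[i] != b[i]
            differences += 1
        end
    end
    return differences
end

a = join(rand(['A', 'C', 'G', 'T'], 1000))
b = join(rand(['A', 'C', 'G', 'T'], 1000))

@code_warntype distance_array_comp(a, b)  # look for Core.Box and Any
@code_warntype distance_loop(a, b)        # fully inferred
```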

If all you’re interested in is DNA base pairs, you might be better off with Vector{Char}s.

long_rand_1 = rand(['A', 'C', 'G', 'T'], 10^6);
long_rand_2 = rand(['A', 'C', 'G', 'T'], 10^6);
function distance_loop2(a, b)
    differences = 0

    @inbounds for i in eachindex(a)
        differences += a[i] != b[i]
    end

    return differences
end

will be much faster for this operation than joining.


Yes, in the sense that if you don’t want to allocate a return vector as big as the number of iterations, you shouldn’t be using a comprehension.

You could, however, do it as a one-liner:


julia> @btime distance_loop($long_rand_str_1, $long_rand_str_2)
  6.042 ms (0 allocations: 0 bytes)
749914

julia> function distance_each(a,b)
           count(a[i] != b[i] for i in eachindex(a))
       end
distance_each (generic function with 1 method)

julia> @btime distance_each($long_rand_str_1, $long_rand_str_2)
  7.800 ms (0 allocations: 0 bytes)
749914

If you like one-liners, you can use a generator instead of a comprehension as it doesn’t allocate and it has the same performance as a loop.

@btime sum($a[i] != $b[i] for i in eachindex($a))
  256.300 μs (0 allocations: 0 bytes)
749829

If you want the shortest version, you can use this (though it won’t be as fast as the generator because it allocates):

@btime sum($a .!= $b)
  362.800 μs (4 allocations: 126.42 KiB)
749829

Thank you for confirming that the array comprehension was generating a vector. Rookie mistake - the clue was in the name all along! That makes sense as to why it was doing so much allocation.

In this case, the task happened to require two input arguments of the type string, but otherwise I agree this would be a nicer and faster way to work with the base pairs :dna: :dna: :dna: :dna:.

I did not know it was possible to increment based on count += true, that’s really neat, thanks for the tip!
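For other beginners reading along, a minimal illustration of what’s happening: the character comparison yields a Bool, and Bools add as 0 or 1, so no explicit branch is needed.

```julia
differences = 0
differences += ('A' != 'C')   # true  → adds 1
differences += ('G' != 'G')   # false → adds 0
differences                   # 1
```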


Then it might be quicker to iterate over them rather than index into them:

distance_zip(as,bs) = count(a!=b for (a,b) in zip(as, bs))

Thank you for answering my question about whether I was mis-using the array comprehension syntax - I suspected so but glad to confirm this.

Writing an inline command for i in itr was the aim, but upon reflection I think I reached for a solution using an array comprehension because that was the form which I am most familiar with (at this stage).

Using the inline syntax but as part of a function call is definitely a much better idea!

I must remember to try using functions with names matching the things that I want to do more often, this usually seems to lead to a good answer in Julia for me (so far as a beginner anyway).


Thank you for your ideas and for showing how small changes to the syntax can sometimes make a big difference. I definitely would not have spotted by myself that the second version would allocate and the first wouldn’t :sweat_smile:

I looked at the solutions written by some of the other members of the community who had also completed this task.

FWIW I liked this version which strays away from the for i in itr syntax a little and seems to be just as speedy (or be just as well optimised by the compiler).

mapreduce(≠, +, a, b, init = 0)
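Wrapped in a function, that one-liner might look like this (the function name is just illustrative; ≠ is applied pairwise to the characters of the two strings and the resulting Bools are summed):

```julia
# mapreduce over two iterators maps `≠` across corresponding
# characters of `a` and `b`, then folds the Bools with `+`.
distance_mapreduce(a, b) = mapreduce(≠, +, a, b; init = 0)

distance_mapreduce("ACGT", "AGGT")  # 1 (only the second position differs)
```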


The whole trick is discussed here Multi-dimensional Arrays · The Julia Language

Comprehensions can also be written without the enclosing square brackets, producing an object known as a generator. This object can be iterated to produce values on demand, instead of allocating an array and storing them in advance…
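A minimal illustration of that distinction:

```julia
comp = [x^2 for x in 1:5]   # square brackets: allocates a Vector
gen  = (x^2 for x in 1:5)   # parentheses: lazy generator, no array

sum(comp)               # 55
sum(gen)                # 55, computed on demand
comp isa Vector         # true
gen isa Base.Generator  # true
```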


You’re absolutely right, thank you! For me it was about 5 times faster than the original version with the simple short for loop :partying_face:

From the REPL:

julia> @time distance(a, b)
  0.014991 seconds (1 allocation: 16 bytes)
749981

julia> @time distance_zip(a, b)
  0.003066 seconds (1 allocation: 16 bytes)
749981

julia> 0.014991/0.003066
4.889432485322897

EDIT:

I just tried out the version using mapreduce() and it looks like it’s even a little faster than the zip() version (on my laptop using the simplest of benchmarks, at least).

julia> @time distance_mapreduce(a, b)
  0.002518 seconds (1 allocation: 16 bytes)
749981

julia> @time distance_zip(a, b)
  0.003585 seconds (1 allocation: 16 bytes)
749981

julia> 0.003585/0.002518
1.4237490071485308

That passage states concretely and clearly that it will not allocate memory. Thank you for pointing me towards the relevant part of the documentation! :slightly_smiling_face:


Bool is a subtype of Integer in Julia, so true and false behave like 1 and 0:

julia> true == 1
true

julia> true isa Integer
true

julia> typetree(Bool)
Bool <: Integer <: Real <: Number <: Any

Which version of Julia enables this?


Sorry, I forgot it was my own function; here it is:

typetree(x) = typetree(typeof(x))
typetree(::Type{Any}) = println("Any")
typetree(T::Union{DataType, UnionAll}) = (print(T, " <: "); typetree(supertype(T)))

@jlawrie, FYI your mapreduce() and @mcabbott’s zip-iterating solutions have the same performance here, and both run faster on the long random strings above than the Hamming() function in StringDistances.jl, which is already very fast and has other options.

NB:
The Levenshtein distance, which takes much longer to compute, has some Julia 0.6 code posted on RosettaCode.org, whose iterative version seems buggy.
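For completeness, here is a minimal (not performance-tuned) two-row Wagner–Fischer implementation in current Julia syntax — a sketch written from the standard recurrence, not taken from the RosettaCode page:

```julia
# Two-row dynamic-programming Levenshtein distance.
# `collect` indexes by character rather than byte, so multi-byte
# (UTF-8) strings are handled correctly.
function levenshtein(a::AbstractString, b::AbstractString)
    ca, cb = collect(a), collect(b)
    prev = collect(0:length(cb))   # row for the empty prefix of `a`
    curr = similar(prev)
    for (i, x) in enumerate(ca)
        curr[1] = i
        for (j, y) in enumerate(cb)
            curr[j + 1] = min(prev[j + 1] + 1,              # deletion
                              curr[j] + 1,                  # insertion
                              prev[j] + (x == y ? 0 : 1))   # substitution
        end
        prev, curr = curr, prev
    end
    return prev[end]
end

levenshtein("kitten", "sitting")  # 3
```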
