Hello!
Thanks for taking a moment to have a look at my question.
Full disclosure: I’m only a few weeks into learning Julia, brand new to Discourse and an interdisciplinary engineer (not a computer scientist) by training.
I wrote a simple function to calculate the Hamming distance between two strings for an exercise in an online learning course for Julia.
First I used a simple for loop which compared each character in the strings sequentially and tallied up those that didn’t match. This function’s code → distance_loop(), which worked nicely.
Just out of curiosity, and to learn more about Julia, I tried rewriting the for loop on a single line using Julia’s array comprehension syntax while keeping the logic identical. This also worked nicely. This function’s code → distance_array_comp().
Being new to examining function performance, and to keep things simple, I tested their performance using the @time macro and two randomly generated strings of 1 million characters each.
Trying my best to be consistent, I restarted Julia, generated the two long strings, and ran @time on each function once to compile it and then once more to time the compiled version. The two functions produced identical answers, as expected.
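As a minimal sketch of that warm-up-then-measure routine (the helper `hamming_sketch` and the shorter 10^4-character strings are illustrative assumptions, not the functions or inputs timed in this post):

```julia
# Hypothetical helper, not one of the two functions from this post:
# counts the positions at which two strings differ.
function hamming_sketch(a::AbstractString, b::AbstractString)
    n = 0
    for (x, y) in zip(a, b)
        n += (x != y)   # adding a Bool to an Int increments n on a mismatch
    end
    return n
end

s1 = join(rand(['A', 'C', 'G', 'T'], 10^4))
s2 = join(rand(['A', 'C', 'G', 'T'], 10^4))

hamming_sketch(s1, s2)                 # first call: includes JIT compilation
t = @elapsed hamming_sketch(s1, s2)    # second call: runs the compiled code only
stats = @timed hamming_sketch(s1, s2)  # NamedTuple with value, time, bytes, ...
println("elapsed: ", t, " s; allocated: ", stats.bytes, " bytes")
```

`@elapsed` and `@timed` are the programmatic relatives of `@time`: they return the measurements instead of printing them, which makes it easier to compare two functions side by side.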
This is the difference reported by the @time macro:
julia> @time distance_loop(long_rand_str_1, long_rand_str_2)
0.011152 seconds (1 allocation: 16 bytes)
749296
julia> @time distance_array_comp(long_rand_str_1, long_rand_str_2)
0.124214 seconds (748.81 k allocations: 20.426 MiB)
749296
My (first-ever) question(s) for the community are:
- Am I mis-using the array comprehension syntax?
- Why does changing this syntax make such a big difference?
- Should this change in syntax make such a big difference?
I would be very grateful for help understanding what’s going on and look forward to your feedback!
P.S. I used @code_lowered on each function to see whether I could spot any differences between them. To me they looked identical. Happy to provide the output on request!
P.P.S. I then used @code_typed on each function and there were many differences, though I am struggling to comprehend them at this stage. Perhaps that is a good place to look? Also happy to provide these on request!
Version info
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2
Minimum working example code
function distance_array_comp(a::AbstractString, b::AbstractString)
    differences = 0
    [differences += 1 for i in eachindex(a) if a[i] != b[i]]
    return differences
end
function distance_loop(a::AbstractString, b::AbstractString)
    differences = 0
    for i in eachindex(a)
        if a[i] != b[i]
            differences += 1
        end
    end
    return differences
end
long_rand_str_1 = join(rand(['A', 'C', 'G', 'T'], 10^6))
long_rand_str_2 = join(rand(['A', 'C', 'G', 'T'], 10^6))
@time distance_loop(long_rand_str_1, long_rand_str_2)
@time distance_array_comp(long_rand_str_1, long_rand_str_2)
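One observable difference between the two versions is that the comprehension collects the value of `differences += 1` for every mismatch into a new array, which the plain loop never builds. The sketch below makes that array visible on two short strings and, as an assumption-labelled alternative that is not from the original post, shows a generator passed to `count`, which tallies matches without collecting them. The names `comp_result` and `distance_count` are hypothetical, and neither function has been timed here, so whether the extra array accounts for the whole gap reported by @time is not verified.

```julia
# Illustrative only: same comprehension as distance_array_comp, but the
# array it builds is returned so it can be inspected.
function comp_result(a::AbstractString, b::AbstractString)
    differences = 0
    collected = [differences += 1 for i in eachindex(a) if a[i] != b[i]]
    return differences, collected
end

# Hypothetical alternative (not in the original post): a generator passed to
# `count`, which counts the `true` values without storing them in an array.
distance_count(a::AbstractString, b::AbstractString) =
    count(a[i] != b[i] for i in eachindex(a))

total, collected = comp_result("GATTACA", "GACTATA")
println(total)       # number of mismatching positions
println(collected)   # the running tally stored by the comprehension
println(distance_count("GATTACA", "GACTATA"))
```

Like the two functions above, this sketch indexes both strings with `eachindex(a)`, so it assumes equal-length strings with matching indices (for example, ASCII-only input).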