Why do functions like similar and copy make 2 allocations for large arrays?

Hi, Julia Community!

I am trying to wrap my head around this one: Functions like similar and copy seem to make 2 allocations if the array is sufficiently large. Can you help me understand why?

julia> using BenchmarkTools

julia> a, b = rand(100), rand(10000);

julia> @btime similar($a)
  35.473 ns (1 allocation: 896 bytes)
100-element Vector{Float64}:
 2.251444318e-314
 2.251444334e-314
 2.25144435e-314
 ⋮
 0.0
 0.0
 0.0

julia> @btime similar($b)
  54.196 ns (2 allocations: 78.17 KiB)
10000-element Vector{Float64}:
 0.0
 0.0
 0.0
 ⋮
 0.0
 0.0
 0.0

I get the same number of allocations for copy, which I guess makes sense, because it is probably implemented in terms of similar.
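
To make that guess concrete: I imagine something like the sketch below, where every allocation comes from the similar call itself (this is just an illustration, not necessarily how Base actually defines copy).

# Illustrative sketch only: a copy expressed in terms of similar.
# copyto! fills the existing destination, so the allocation count of
# my_copy is exactly the allocation count of similar(a).
my_copy(a::AbstractArray) = copyto!(similar(a), a)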

Am I simply doing something wrong with the @btime macro?
If this is real, how could I copy an array with just one allocation?
Or is that not a good idea for reasons I am not aware of?

Cheers
Mike

Just a guess: some implementation detail that changes the underlying low-level algorithm (or the array implementation itself) depending on the object's size?

Which Julia version are you on? The Memory changes in 1.11 are known to cause this kind of behavior.

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

I am actually not worried about performance, just puzzled, because it seems counterintuitive to me.

PS: The turning point is 2^11, by the way.

Yeah, it’s well known, and it has ‘always’ been this way. I can’t explain exactly what happens, but you can find the exact size where it occurs by trial and error (I thought it was some length 2^n; edit: oops, n = 11, as you already found).
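
If you want to automate that trial and error, a loop along these lines works (the probe sizes below are arbitrary, and BenchmarkTools is the same package used above):

using BenchmarkTools

for n in (2^10, 2^11, 2^11 + 1, 2^12)
    v = rand(n)                        # Float64 vector of length n
    trial = @benchmark similar($v)     # run a full benchmark so we can read off the allocation count
    println(n, " elements => ", minimum(trial).allocs, " allocation(s)")
end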

I wouldn’t worry about it, unless you are simply curious.

This is curious. I see identical output from @code_llvm and @code_native (modulo some auto-generated function names, as far as I can tell). Is there a chance that this is a benchmarking artefact?
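
Concretely, the comparison I mean is just the following (InteractiveUtils ships with Julia and is already loaded in the REPL; @code_native can be swapped in the same way):

using InteractiveUtils   # only needed outside the REPL

a, b = rand(100), rand(10000)
@code_llvm similar(a)    # LLVM IR for the small-array case
@code_llvm similar(b)    # essentially identical apart from auto-generated names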

In the C code of the array implementation there is slightly different treatment of memory smaller and larger than 2048.
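
If that 2048 is really about the total byte size rather than the element count (I have not checked which), a smaller element type should switch over at a proportionally larger length. A quick way to probe that from the Julia side is Base.@allocations (available since Julia 1.9); the sizes below are just guesses around the suspected boundary, and I am not asserting what they print:

# Warm up first so compilation is not counted in the allocation totals.
similar(rand(1)); similar(rand(Float32, 1));

for (T, n) in ((Float64, 2^11), (Float64, 2^11 + 1),
               (Float32, 2^12), (Float32, 2^12 + 1))
    v = rand(T, n)
    println(T, " with ", n, " elements => ", Base.@allocations(similar(v)), " allocation(s)")
end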

Thanks everyone for your input! For me it was mostly about knowing that I’m not doing something obviously stupid.

The C code really helped with digging deeper. After searching a bit, I eventually found the corresponding documentation page, which explains this in quite some detail.
