I’ve written a function that loops over a large table and is running slower than I expected. The source of the problem seems to be a huge number of extra allocations. I’ve created the following minimal example that demonstrates the issue:
using DataFrames
using Tables
function f(xs)
    a = Tables.columntable(xs)
    for _ in 1:length(a.x)
    end
end
julia> d = DataFrame((x=rand(10^6),));
julia> @time f(d)
0.109198 seconds (3.01 M allocations: 61.923 MiB, 8.61% gc time, 24.60% compilation time)
julia> @time f(d)
0.130291 seconds (3.00 M allocations: 61.020 MiB, 13.04% gc time)
Is this a bug? Why would an empty loop over a range generate millions of allocations? (The number of allocations scales with the size of the DataFrame.)
It’s type unstable: a DataFrame does not encode its column names or element types in its own type, so the type of a (and hence of a.x and of the range 1:length(a.x)) cannot be inferred. All those allocations come from boxing and unboxing while iterating over the range.
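You can confirm this with @code_warntype (just a diagnostic sketch; the exact printout depends on your DataFrames and Tables versions):

julia> @code_warntype f(d)  # `a`, `a.x`, and the range should all show up as non-concrete (red) types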
To avoid this you can put the main body of the function into a second function that accepts a itself (a function barrier), as in the sketch below.
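A minimal sketch of that barrier (the helper name f_inner is mine, not from the question):

using DataFrames
using Tables

function f(xs)
    a = Tables.columntable(xs)  # the type of `a` cannot be inferred from `xs::DataFrame`
    f_inner(a)                  # function barrier: `a` is concretely typed inside f_inner
end

function f_inner(a)
    # this method is specialized on the concrete NamedTuple type of `a`,
    # so the range and the loop variable are no longer boxed
    for _ in 1:length(a.x)
    end
end

With the barrier in place, @time f(d) should report only the allocations made by columntable itself rather than millions of per-iteration ones.

The timings below show the allocations scaling with the number of loop iterations: dividing the loop length by 100 cuts the allocations by roughly a factor of 100, so the cost really is per-iteration boxing rather than the columntable call.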
julia> @time begin L = length(Tables.columntable(d).x); for i = 1:L end end;
0.054953 seconds (3.00 M allocations: 61.020 MiB, 13.82% gc time)
julia> @time begin L = length(Tables.columntable(d).x); for i = 1:L÷100 end end;
0.000486 seconds (28.99 k allocations: 609.469 KiB)
julia> @time begin L = Int64(length(Tables.columntable(d).x)); for i = 1:L end end;
0.050081 seconds (3.00 M allocations: 61.020 MiB, 9.13% gc time)
julia> @time begin L = Int64(length(Tables.columntable(d).x)); for i = 1:L÷100 end end;
0.001008 seconds (28.99 k allocations: 609.469 KiB)
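Note that these snippets run at global scope, so L is an untyped, non-constant global and the loop still boxes its counter on every iteration. For contrast, once the bound has a concrete type inside a function, the same empty loop allocates essentially nothing (a sketch; timings not pasted):

function g(n::Int)
    # with a concretely typed bound the empty loop compiles away and does not allocate
    for _ in 1:n
    end
end

julia> @time g(length(d.x));  # after the first (compiling) call this should report essentially no allocations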