Binary data reader 200x+ slower on 0.7


#1

I read binary data into a sparse matrix. Basically, I populate row and data vectors with append! and put them into a sparse matrix using SparseMatrixCSC. It was very fast on 0.6, but is 200x+ slower on 0.7. I haven’t changed the code, apart from fixing warnings related to array allocation. This is what I get:
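Roughly, the pattern looks like this (a hypothetical sketch with made-up names, not the actual code):

```julia
using SparseArrays  # SparseMatrixCSC lives here on 0.7+

# Hypothetical sketch of the pattern: grow index/value vectors with
# append!, then assemble them into a sparse matrix.
rows = Int32[]
vals = Float64[]
for _ in 1:3                    # stand-in for reading records from the file
    append!(rows, Int32[1, 2])  # a chunk of row indices from one record
    append!(vals, [1.0, 2.0])   # the matching values
end
cols = collect(Int32, 1:length(rows))  # hypothetical column indices
A = sparse(rows, cols, vals)           # builds a SparseMatrixCSC
```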

0.7:

@time main()
71.208014 seconds (4.35 M allocations: 126.643 MiB, 0.02% gc time)

0.6:

@time main()
0.358602 seconds (5.32 M allocations: 146.586 MiB, 4.59% gc time)

Fewer allocations, but roughly 200 times slower. Profiling 0.6 shows nothing of interest. However, for 0.7 I get this (snippet):

         56992 ./array.jl:873; append!
          1     ./abstractarray.jl:768; copyto!(::Array{Int32,1}, :...
          5     ./abstractarray.jl:770; copyto!(::Array{Int32,1}, :...
           5 ./deprecated.jl:1615; Type
            5 ./abstractarray.jl:83; axes
             5 ./tuple.jl:162; map
              5 ./range.jl:226; Type
               5 ./range.jl:217; Type
                5 ./promotion.jl:436; max
          56973 ./abstractarray.jl:771; copyto!(::Array{Int32,1}, :...
           2     ./int.jl:52; -
           720   ./operators.jl:950; in
            720 ./promotion.jl:425; ==
           47438 ./operators.jl:953; in
           8813  ./operators.jl:954; in

I’m not really sure what happens inside append!, but the code seems to spend most of its time in append! and copyto!. Is this a known issue I’m not aware of? Is there something I can do about it, or should I wait for a fix?


#2

Some of the sparse functionality was moved to the standard library package SparseArrays in v0.7, so if you don’t have using SparseArrays, you will get deprecation warnings in v0.7, which will make your code much, much slower.

To fix that, just add using SparseArrays to your code before you do any sparse operations in Julia v0.7.

In 1.0 and above, without using SparseArrays you’ll just get an error message rather than a deprecation warning that works but is slow.
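For example, a minimal sketch:

```julia
using SparseArrays  # must come before any sparse operations on v0.7+

A = spzeros(3, 3)   # without the `using`, this warns on 0.7 and errors on 1.0+
A[1, 1] = 1.0
```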

Does that improve things?

Otherwise, it’s hard to say without seeing your code.


#3

Forgot to mention that I fixed that warning too, i.e. I already have using SparseArrays in my module.

I thought so too; however, I can’t disclose the source for this package. I was mostly wondering whether there are known issues with append! and/or sparse matrices (I’m not doing any transposes, just populating a sparse matrix).

Maybe worth mentioning is that the issue persists on the current nightly (2018-07-14).


#4

I was mostly wondering if there are known issues with append!

Not that I know of. 0.6:

julia> using BenchmarkTools

julia> x = rand(3); y = rand(1);

julia> @benchmark append!($x, $y)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.754 ns (0.00% GC)
  median time:      12.917 ns (0.00% GC)
  mean time:        16.965 ns (0.00% GC)
  maximum time:     20.693 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

0.7:

julia> x = rand(3); y = rand(1);

julia> using BenchmarkTools

julia> @benchmark append!($x, $y)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.189 ns (0.00% GC)
  median time:      11.302 ns (0.00% GC)
  mean time:        15.552 ns (0.00% GC)
  maximum time:     20.734 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

#5

I might have an MWE.

MWE:

function test()
    x1 = Vector{Int64}() # x1 = Vector{Int64}(undef, 0) for 0.7 (no difference)
    for i = 1:1e5
        x2 = 1:3
        append!(x1, x2)
    end
end

0.6:

@time test() # second run
  0.005858 seconds (23 allocations: 5.001 MiB)

0.7:

@time test() # second run
  9.813174 seconds (23 allocations: 5.001 MiB)

x2 is inside the loop to emulate my code. The issue seems to be with UnitRange; collect does not help.


#6

Interesting. I can reproduce. You could file an issue.
As a workaround, you could try

function append2!(x, y)
    for yi ∈ y
        push!(x, yi)
    end
    return x  # return the vector, matching append!'s behavior
end
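A quick sanity check that it matches append! (self-contained, with a `return x` so the result can be used directly):

```julia
# push!-based replacement for append!; returns x like append! does.
function append2!(x, y)
    for yi in y
        push!(x, yi)
    end
    return x
end

a = append!([1, 2, 3], 4:6)
b = append2!([1, 2, 3], 4:6)
a == b  # both are [1, 2, 3, 4, 5, 6]
```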

That solves the problem for me, and it is fast again.
Also, if these vectors are long and you’re adding many pieces, it may be worth trying sizehint!.

julia> @time test()
 11.043984 seconds (686.55 k allocations: 40.692 MiB, 0.09% gc time)

julia> @time test()
 10.911597 seconds (23 allocations: 5.001 MiB)

julia> using BenchmarkTools

julia> function test2()
           x1 = Vector{Int64}()
           for i = 1:1e5
               x2 = 1:3
               append2!(x1, x2)
           end
       end
test2 (generic function with 1 method)

julia> @time test2()
  0.055867 seconds (306.47 k allocations: 21.756 MiB, 8.64% gc time)

julia> @time test2()
  0.004321 seconds (23 allocations: 5.001 MiB)

julia> @benchmark test2()
BenchmarkTools.Trial: 
  memory estimate:  5.00 MiB
  allocs estimate:  19
  --------------
  minimum time:     1.662 ms (0.00% GC)
  median time:      1.807 ms (0.00% GC)
  mean time:        1.848 ms (2.94% GC)
  maximum time:     2.479 ms (20.74% GC)
  --------------
  samples:          2628
  evals/sample:     1

julia> function test3()
           x1 = Vector{Int64}()
           sizehint!(x1, 3*10^5)
           for i = 1:10^5
               x2 = 1:3
               append2!(x1, x2)
           end
       end
test3 (generic function with 1 method)

julia> @benchmark test3()
BenchmarkTools.Trial: 
  memory estimate:  2.29 MiB
  allocs estimate:  2
  --------------
  minimum time:     1.193 ms (0.00% GC)
  median time:      1.373 ms (0.00% GC)
  mean time:        1.376 ms (0.00% GC)
  maximum time:     1.605 ms (0.00% GC)
  --------------
  samples:          3501
  evals/sample:     1

#7

I filed issue #28143. Thanks for the proposed solution; I’ll probably use it for now.


#8

Should be fixed by https://github.com/JuliaLang/julia/pull/28146. It would be interesting to see if it solves the original, non-MWE code.


#9

I think this fixed the non-MWE code as well. For now I just redefined the corresponding functions with Base.copyto!... in the REPL. Results from the original code below, reading a 340002 x 340002 sparse matrix.

Without fix:

@time main() # second run
  112.393334 seconds (4.35 M allocations: 126.643 MiB, 0.07% gc time)

With fix:

@time main() # second run
  0.718545 seconds (4.35 M allocations: 126.643 MiB, 7.83% gc time)

With append2! from above:

@time main() # second run
  0.815360 seconds (4.36 M allocations: 126.860 MiB, 7.74% gc time)