Should I specify a concrete eltype for a container

I’ve been writing a lot of julia code, legitimately. Yet still have an important question.

Is my following intuition correct?

Instead of writing

v = Point2d[]
push!(v, Point2d(1, 2))
push!(v, Point2d(3, 4))

, it’s better off writing

v = []
push!(v, Point2d(1, 2))
push!(v, Point2d(3, 4))

, provided that I implicitly require myself to not push! objects of other type. The latter style is not only more concise but also more performant

Assuming Point2d is a concrete type, your intuition is wrong. It is always better to have a concrete element type. And it will be more performant.

If Point2d is an abstract type, then, it depends. Your latter is equivalent to:

v = Any[]
3 Likes

How to explain these?


My intuition:

  1. If I just write v = [], then it just pushes.
  2. If I had specified v = Point2d[], then it has a push action followed by a type check. The additional action takes additional time.

@odow is probably right:

import JuMP
using BenchmarkTools
import Random

const CON = JuMP.@build_constraint(0 <= 1);
const N = 10000
const K = 5

function concrete(CON, N, K)
    Random.seed!(1)
    v = Vector{typeof(CON)}(undef, 0)
    for _ = 1:K
        for _ = 1:N
            push!(v, JuMP.@build_constraint(rand(Int) <= rand(Int)))
        end
        empty!(v)
    end
end

function abstract(CON, N, K)
    Random.seed!(1)
    v = Vector{Any}(undef, 0)
    for _ = 1:K
        for _ = 1:N
            push!(v, JuMP.@build_constraint(rand(Int) <= rand(Int)))
        end
        empty!(v)
    end
end

@btime abstract($CON, $N, $K)
@btime concrete($CON, $N, $K)

julia> @btime abstract($CON, $N, $K)
  459.900 μs (50022 allocations: 1.72 MiB)

julia> @btime concrete($CON, $N, $K)
  188.300 μs (23 allocations: 394.26 KiB)

But now I want a explanation, otherwise the doc is considered misleading.

What part of the doc is misleading?

Your intuition is good and would be correct if the code were fully interpreted. However Julia tries to compile and optimise code where it can, which means:

  1. [] and Point2d[] have different data layouts, so “just pushing” is different. In the latter case it really does just push the value directly. In the former case it needs to “box” the value first, meaning it stores the value alongside its type, which is relatively slow because it costs an allocation.
  2. The type check you mentioned does happen semantically, but in practice it is checked and elided at compile-time and so has no runtime cost.
  3. Unmentioned are the future costs. Any future code that accesses the elements of the [] does not know the type of the elements, which is why boxing was necessary. This leads to poorly inferred code and therefore lots of runtime dispatch. Whereas with Point2d[], future code will infer better and dispatch can be performed at compile time, again with zero runtime cost.
6 Likes

These are irrelevant. The first link is about generic ways to compute concrete types instead of hardcoding every path, and the second link is about preferring simpler abstract element types instead of complicated ones. Neither involves or contradicts concrete element types having better performance than abstract element types.

Not so, the type check comes first. If the type doesn’t match, then a convert step is attempted. If it didn’t outright error, then the type is checked again. If it matches the container eltype, only then is growing the vector and setting the last element possible.

If the expression for the element is inferred by the compiler to have a narrower type. Otherwise the compiler will leave that work until runtime.

1 Like

In the Performance Tips, we can find

If you cannot avoid containers with abstract value types, it is sometimes better to parametrize with Any to avoid runtime type checking. E.g. IdDict{Any, Any} performs better than IdDict{Type, Vector}
Performance Tips · The Julia Language

I don’t know about the Point2d type, but if it’s parametrized, e.g. with Point2d{Int}(1,2) or Point2d{Float64}(1.0, 2.0), the unparametrized Point2d is an abstract type. You may then consider something like:

v = Point2d{Int}[]
push!(v, Point2d(1, 2))
push!(v, Point2d(3, 4))

which will not end up with runtime type checking if the type of the coordinates is known at compile time.

Also, note that in your JuMP example, the empty!(v) for a bits typed vector v is constant time and very fast (it’s basically just v.size = (0,)), whereas empty!(v) for an abstract type v is slow (it loops through the entire vector to do an unsetindex! to avoid dangling references to the deleted objects). (see e.g. `resize!` is O(N) when downsizing. · Issue #56359 · JuliaLang/julia · GitHub).

Here you can see the difference:

julia> v = [1,2,3,4];
julia> empty!(v);
julia> v.ref.mem  # the backing memory for the vector, unchanged by empty!
4-element Memory{Int64}:
 1
 2
 3
 4

julia> v = Any[1,2,3,4];
julia> empty!(v);
julia> v.ref.mem
4-element Memory{Any}:
 #undef
 #undef
 #undef
 #undef
3 Likes

The others already answered the main question:

Julia is a fully dynamic language. This means that there are no static types in julia, and the type of x is always the same as typeof(x).

This is as opposed to e.g. java, where ArrayList<String> x = new ArrayList<String>(); still leaves you with x.getClass() being ArrayList<>; or, in other words, java types exist only on the source-code / compile-time level, while compiled code / runtime objects only have classes, and a typeof runtime function cannot exist (because the types have been erased).

Hence, Vector{Point2d} and Vector{Any} are genuinely different types, and can have genuinely different memory layout, and functions can dispatch on the type parameter.

Your options for the eltype of your Vector are:

  1. Concrete type, or small union of concrete types – this is fastest!
  2. Any – If you cannot make the type concrete or small union of concrete types, then this is normally the next fastest variant.
  3. Abstractly typed – stuff like Vector{Real} or Vector{Vector}. In very rare instances this can perform better than Vector{Any}, but consensus is that this is a trap. Nevertheless you may sometimes need this, especially if you then have to dispatch on the type parameter (i.e. you have foo(x::Vector{Vector}) and foo(x::Vector{T}) where T<:Real). This is a code smell, though.

The more interesting question is how to achieve this in your example. There are two reasonable ways. You can be explicit:

v = Point2d{Int}[]
push!(v, Point2d(a,b))
#equivalent:
v = Point2d{Int}[Point2d(a,b)]

Or you can let julia’s base vector concat figure it out:

v = [Point2d(a,b)]

This is equivalent to

point = Point2d(a,b)
v = typeof(point)[point]

The difference will become apparent if you ever want to support points with Int16 or Int128 coordinates, or run automatic differentiation.

However, writing code that seamlessly supports AD or different integer sizes or Float precisions is a little bit of an art, with lots of pitfalls.

Depending on your code, you can often double (or more, if you’re cache-size bound!) your performance by dropping to Int32 and Float32, but you need to be very careful about literals (because in julia, literals are typed. In other words, x = Int32(0); x += 1; typeof(x) is Int64. This is a very big pitfall compared to java or C, where rvalue literals try to adopt the type of the lvalue. The julia way is very elegant in terms of language simplicity, but it makes writing some kinds of code a total pain).

2 Likes