I have a situation where I’m processing rowwise data from a JuliaDB table in a reducer. I’m having trouble with type stability, because my function looks like:
function (row, col)
    value = row[col]
    ...do things with value...
end
This is not surprising, because Julia doesn’t know at compile time which column col will be, and therefore what type it is, so the call is type-unstable and causes allocations. But I know that row[col] is an Int64, so I tried annotating the type:
function (row, col)
    value::Int64 = row[col]::Int64
    ...do things with value...
end
but it is still unstable. How can I annotate row[col] to avoid allocations?
Here’s an MWE (if you enclose the whole thing in a function, the allocations go away for both functions, but only because Julia can infer the value and type of t at compile time, which doesn’t solve my overall problem):
t = (abc=1, tba=2)
unstable(x, i) = x[i]
annotated(x, i) = x[i]::Int64
# run once to compile
unstable(t, :abc)
annotated(t, :abc)
println("no type annotation, allocation expected")
@time unstable(t, :abc) # -> 0.000018 seconds (1 allocation: 32 bytes)
println("type annotation, no allocation expected")
@time annotated(t, :abc) # -> 0.000004 seconds (1 allocation: 32 bytes)
You’re right, that was a bad example. The critical thing is that the key to be extracted from the NamedTuple is not known at compile time, and that the NamedTuple’s values are not all of the same type. Here’s a better example, with the key read from the command-line arguments:
unstable(x, i) = x[i]
annotated(x, i) = x[i]::Int64
function main()
    t = (abc=1, tba=2.0, cd="jgds")
    # run once to compile
    unstable(t, :abc)
    annotated(t, :abc)
    ref = Symbol(ARGS[1])
    println("no type annotation, allocation expected")
    @time unstable(t, ref)
    println("type annotation, no allocation expected")
    @time annotated(t, ref)
end
main()
Running julia namedtuple_test.jl abc results in:
no type annotation, allocation expected
0.000001 seconds (1 allocation: 32 bytes)
type annotation, no allocation expected
0.000001 seconds (1 allocation: 32 bytes)
The first allocation is expected, because Julia has no way to know what will come out of the NamedTuple, and not all fields are bitstypes, so it is type-unstable. In the second one, I explicitly tell Julia that it’s going to be an Int64, so I would expect there to not be any allocations in that case, but there still is one.
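A quick way to see why the annotation can’t help here: the ::Int64 typeassert runs after getindex has already produced (and boxed) the value, so it can narrow inference for code downstream of the assertion, but it can’t remove the allocation inside the lookup itself. A minimal check (my sketch, not from the original post):

```julia
annotated(x, i) = x[i]::Int64

t = (abc = 1, tba = 2.0, cd = "jgds")

annotated(t, :abc)       # fine: the field really is an Int64
try
    annotated(t, :tba)   # the Float64 is fetched (and boxed) first,
catch e                  # then the typeassert rejects it at runtime
    e isa TypeError && println("typeassert failed at runtime")
end
```

The fact that the failure happens at runtime, not at compile time, is exactly why the boxing allocation survives the annotation.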
Let me jump in with a question to understand how these mechanisms work, rather than to suggest a solution.
If your situation allows it, would it help to invert the layout and work with col[row] instead of row[col]?
This does not seem to be a benchmarking artifact, since if I do something like

nt = (a = 1, b = 2.0, c = "x")  # (definition not shown in my original post; any NamedTuple with mixed field types reproduces this)
function test(nt, n)
    for i in 1:1_000_000
        nt[n]
    end
end
@time test(nt, :a)
# 0.030643 seconds (1.00 M allocations: 45.977 MiB, 5.38% gc time, 14.04% compilation time)

the problem seems to exist there too (if I replace nt[n] with nt[:a], it barely allocates).
You’re not doing anything wrong; in fact, your test neatly avoids global-variable artifacts on the first try. And yes, you’re on the right thread.
Julia generally specializes a call on the types of the called object and the arguments. For example, f(1, 1.5) specializes on (typeof(f), Int64, Float64). By this logic, getindex/getfield would be unstable for Tuple/NamedTuple/struct because the element types can be heterogeneous: (1, 1.5)[1] is an Int64, yet (1, 1.5)[2] is a Float64, a clear type instability.

So the compiler goes the extra mile to maintain performance for these important methods: if the index or field is a constant at compile time, the compiler will “specialize” on the index/field’s instance. The compiler also does constant propagation for other function calls, but usually to evaluate the call at compile time when all arguments are constant; here, instead, getindex/getfield still happens at runtime, but the fixed index/field restores type stability.
julia> let # local scope, all variables are local
           t = (abc=1, tba=2.0, cd="jgds")
           i = :tba # recognized as constant in this scope
           @time t[i]
       end
0.000000 seconds
2.0
If you only take the examples in this thread without global variable artifacts, you’ll notice that whenever the field :abc is a constant in a local scope, there’s no allocation. However, whenever the field is a function argument i or command line argument ARGS[1], the compiler has to treat it as a variable in the scope and may allocate to handle the type instability.
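Building on that, a common workaround (a sketch of the standard Val/function-barrier pattern, not code from this thread; the helper names getcol and hotsum are mine) is to lift the runtime field name into the type domain once. The call site pays one dynamic dispatch, and inside the barrier the field is a compile-time constant, so getindex specializes and the hot loop is type-stable:

```julia
# col becomes a type parameter of Val{col}, i.e. a compile-time constant
getcol(row, ::Val{col}) where {col} = row[col]

# Function barrier: one dynamic dispatch at the call site, then the body
# is compiled for the concrete Val{col} and is fully type-stable.
function hotsum(row, vcol, n)
    v = getcol(row, vcol)
    s = zero(v)
    for _ in 1:n
        s += v
    end
    return s
end

t = (abc = 1, tba = 2.0, cd = "jgds")
col = :abc              # imagine this came from ARGS, as in the example above
hotsum(t, Val(col), 1_000)
```

The cost of the dynamic dispatch on Val(col) is then paid once per reducer invocation instead of once per row, which is usually what matters.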