Is there a tutorial on advanced use of types in Julia? The manual isn’t enough for me to follow the advanced abstract-type usage in actual packages.
This one on type-dispatch designs might be helpful:
Thanks for the link. Until reading that blog, I did not appreciate that type-checking within a function would get compiled away. (The optimization isn’t visible until `@code_llvm`, and I’d mostly been looking at `@code_warntype`.)
So newbie question - given that it seems like both structures below will compile to the same point, which is considered the more Julian style?
```julia
function action(x::TypeA)
    # work on a TypeA
end

function action(x::TypeB)
    # work on a TypeB
end
```
or
```julia
function action(x)
    if x isa TypeA
        # work on a TypeA
    elseif x isa TypeB  # ignoring the wrong-type case to keep things short
        # work on a TypeB
    end
end
```
I’ve been using the former, in part because it is self-documenting. But I’m curious about the community’s opinion. (I also now suspect I have been over-typing - I need to take a second look at my code…)
Thanks.
The former: we generally use dispatch rather than explicit type checks.

(If you grep the Base code for `" isa "` you’ll find a few exceptions to this rule, mainly in processing syntax trees and other reflection code. Basically, you use `isa` in cases where almost all of the code is the same for different types, so that you only need to tweak the behavior for a particular type, and it doesn’t seem worth the trouble to refactor.)
The former also has the nice feature that, because the checking is done by dispatch, anyone can add new methods to further specialize the behavior in new ways. If you use `if` statements, the only way to add new branches is to change the function itself. This form of “extendability from outside” is often described as the multiple-dispatch solution to the expression problem.
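A minimal sketch of that extensibility (all type and function names here are made up for illustration):

```julia
abstract type Shape end
struct Circle <: Shape; r::Float64; end
struct Square <: Shape; s::Float64; end

area(c::Circle) = pi * c.r^2
area(sq::Square) = sq.s^2

# Later, in completely separate code (even another package),
# a new case can be added without touching the definitions above:
struct Triangle <: Shape; b::Float64; h::Float64; end
area(t::Triangle) = t.b * t.h / 2
```

With the `isa`-chain version of `area`, adding `Triangle` would require editing the original `if`/`elseif` block.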
In the case of handling ASTs (e.g., in Julia’s compiler internals), it’s less about “trouble to refactor” and more about performance. For code that is unavoidably non-inferrable (like most AST-handling code), the big cost is runtime method lookup. When types can’t be inferred, there are nevertheless two cases where the lookup can still be done at compile time:

- when a function has only one method
- inside the body of an `isa` conditional block, where the type is “locally” known

`isa` blocks contribute to both of these, since you can use them as a poor substitute for dispatch.

The obvious (and major) downside is that this is non-extensible without editing the original source code, so it’s best reserved for private interfaces.
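A sketch of the “locally known type” effect (hypothetical names; not from any real codebase):

```julia
process(x::Int) = x + 1
process(x::String) = length(x)

# The element type of `xs` is Any, so `x` is not inferrable in general.
# But inside each `isa` branch the compiler knows the concrete type,
# so the call to `process` can be resolved at compile time there.
function walk(xs::Vector{Any})
    total = 0
    for x in xs
        if x isa Int
            total += process(x)      # x::Int here: statically dispatched
        elseif x isa String
            total += process(x)      # x::String here: statically dispatched
        end
    end
    return total
end
```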
I have two versions of a function that deals with two different types of inputs, and they differ only in a few crucial lines. It would be very easy to write it with an `isa` conditional block, which would avoid repeating the code, but it seems you would recommend keeping them as two separate functions and repeating the code?
I don’t think that anyone was advocating repeating code explicitly.
It is hard to give specific advice without concrete code, but I would first factor out the common parts into small functions, mark them `@inline` if they don’t inline automatically, and then use `isa` branches only when absolutely necessary.
Again, this is probably unnecessary micro-optimization for 90% of real-life code. And of course the remaining 10% soaks up 90% of the developer time and runtime.
BTW, this has little to do with the advanced usage of the type system per se, it is more about working around limitations of inference.
Something to keep in mind when coming to Julia from a language like Python is that Julia has very little function call overhead as long as the compiler is able to figure out in advance what the types will be. It’s perfectly fine to write lots of small functions instead of one big one (which might require un-learning some Python habits.) You will frequently see code like this:
```julia
function f(x)
    while in_hot_loop
        g(y)
    end
end

g(y::TypeA) = something
g(y::TypeB) = something_else
```
So isa conditional blocks are like smart casts in Kotlin, and the compiler can actually use the type information from them?
In that case, as a micro-optimization strategy, does that mean it can be worth it to sprinkle something like

```julia
@regardlessof (x isa CommonType) foo(x, otherargs...)
```

instead of a plain call to `foo(x, otherargs...)`, where the convenience macro `@regardlessof(cond, expr)` desugars to `cond ? expr : expr`?

Would it speed up code in the case where one dispatch is overwhelmingly more common?
I tried to make a tutorial introduction here (given at JuliaCon 2019):
Actually, I am coming from a Python-heavy (for task management) and Haskell-lite (for number crunching) background. I am loving that Julia appears poised to handle both fronts, but I have things to unlearn on both sides.
The bit that escaped me until this thread was the fact that `typeof`, `eltype`, etc. are handleable at compile time. I now see how things like this (simple example)

```julia
mapperfunc(fn, itr) = mapperfunc(eltype(itr), fn, itr)

function mapperfunc(::Type{T}, fn, itr) where T
    ...
end
```

infer the correct output type for `mapperfunc(x -> 2x, [1, 2])`. (Previously, I would have read `eltype` as a purely runtime function.) Very cool. Thanks, all.
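For concreteness, here is one possible (simplistic, illustrative) body for that sketch; the `zero(T)` trick for picking the output type is an assumption on my part and only works when `zero(T)` exists and `fn` is type-stable:

```julia
mapperfunc(fn, itr) = mapperfunc(eltype(itr), fn, itr)

function mapperfunc(::Type{T}, fn, itr) where T
    # Probe the output type; note this actually calls fn once on zero(T).
    out = Vector{typeof(fn(zero(T)))}()
    for x in itr
        push!(out, fn(x))
    end
    return out
end
```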
That’s not necessary or even helpful when types can be inferred, because in that case the dispatch can be “hard wired” during compilation (reducing the cost of dispatch literally to zero). When types can’t be inferred, it still might not be helpful, because the compiler does some of this automatically (see union-splitting). However, to prevent compilation from taking unbounded time, in sufficiently complicated cases the compiler will decide not to do union-splitting; only in those cases can this kind of “manual dispatch” be helpful.
Example:
```julia
struct A end
struct B end
struct C end
struct D end
struct E end

f(::A) = 1
f(::B) = 2
f(::C) = 3
f(::D) = 4
f(::E) = 5

function dispatch(list)
    s = 0
    @inbounds for item in list  # @inbounds not necessary (just simplifies `@code_warntype dispatch(list)`)
        if item isa A
            s += f(item)        # this f gets compile-time dispatched (and inlined, since it's simple)
        else
            s += f(item)::Int   # compile-time dispatched if `item` is inferrable, runtime otherwise
        end
    end
    return s
end

lista_inf = [A() for i = 1:10]
lista_any = Any[A() for i = 1:10]
listc_inf = [C() for i = 1:10]
listc_any = Any[C() for i = 1:10]

using BenchmarkTools
@btime dispatch($lista_inf)
@btime dispatch($lista_any)
@btime dispatch($listc_inf)
@btime dispatch($listc_any)
```
Results:

```
julia> include("/tmp/dispatch.jl")
  2.514 ns (0 allocations: 0 bytes)
  16.515 ns (0 allocations: 0 bytes)
  2.234 ns (0 allocations: 0 bytes)
  101.940 ns (0 allocations: 0 bytes)
30
```
You can see that the performance on non-inferred `A` objects is considerably better than the performance on non-inferred `C` objects, but that when inference works, none of this matters.

You can convince yourself that there’s a threshold of 5 total (4 runtime-dispatched) methods for this bad performance to kick in. If you didn’t have `E`, then automatic union-splitting would make the second `f` call pretty much like the first one.
Hmm, now I see I am still a little off. `eltype` allows a compile-time inference on an iterable, but how is the proper output inference done for, say, `map`? The `eltype` of a function, even with an output type, appears to be `Any`, and `typeof(fn)` appears to generate only runtime variables. How does one (or even can one) infer the output type of a function at compile time? I assume, given that `map` does it, it is possible…
Each function has its own type, so if you call, for example, `map(sin, [1, 2, 3])`, compilation and type inference are performed for `map(::typeof(sin), ::Vector{Int})` (and not something like `map(::Function, ::Vector{Int})`, which would be too general).

A function doesn’t have an `eltype` per se, but the compiler will figure out that `sin` will be called with an `Int` each time, and that the return type is always `Float64` in that case.
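One way to observe this from the REPL, using the reflection helper `Base.return_types` (note that it is a reflection tool, not something to rely on inside performance-critical code):

```julia
# Each function is a singleton instance of its own type:
@show typeof(sin)        # prints typeof(sin)
@show sin isa Function   # true

# Ask inference what sin returns when called with an Int:
rts = Base.return_types(sin, (Int,))
@show rts                # [Float64]
```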
Thanks. When I look at `@code_warntype` using `typeof(fn)`, I don’t get inferred types, I get `var"#321#322"`-looking things (I think this is what you meant by “each function has its own type”), which I assume won’t infer as the output? The (dumb) test case below shows what I am targeting, but of course it won’t run:

```julia
julia> test(fn, x) = test(fn, x, typeof(fn))
test (generic function with 1 method)

julia> function test(fn, x, ::Type{T}) where T
           result = T[]
           push!(result, fn(x))
           return result
       end
test (generic function with 2 methods)
```

I guess to summarize the question: if I want to pre-allocate memory to store the output of a passed function (not quite what I’m doing above, but you get the idea), and have the outer function’s return type infer as that storage, how do I go about getting that storage type at compile time? Thanks.
Often something like `Vector{typeof(f(zero(eltype(x))))}(undef, N)` will work, but this may result in `f(0)` actually being called, which matters especially if `f` has side effects. Also, if `f(0)` returns a different type than `f(1)`, there are obviously going to be problems.

I usually rely on functions like `map` to do the allocation for me. If I want to save allocations, I can use `map!` the second time around (with the array returned by `map`).
The problem with your example is that you are creating an array that can store the function `fn`, not its output. What you probably want to do is simply this:

```julia
test(fn, x) = [fn(x)]
```

For this, the output type (`Array{Float64}`) is computed at compile time.
Like I said, a dumb example. A more complete one would have had the `push!` be a threaded loop, dumping the output in `result`.
Sounds like it is nontrivial to force what I’m looking for, but it is probably just a question of getting used to the inference model. Thanks for the help.
Total hack, but this seems to work…
```julia
julia> fn = x::Int -> x/2
#32 (generic function with 1 method)

julia> eltype(typeof(map(fn, [])))
Float64
```
Seems like there must be an easier way.
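For what it’s worth, Base has a helper that asks inference this exact question without the empty-`map` detour: `Base.promote_op`. Hedge accordingly: it is internal API (no leading underscore, but undocumented in the manual), so its behavior can change between Julia versions.

```julia
fn = x::Int -> x / 2

# Ask inference for the output type of fn(::Int) without calling fn:
T = Base.promote_op(fn, Int)        # Float64

# Pre-allocate storage with the inferred element type:
result = Vector{T}(undef, 3)
```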