It doesn’t constrain it. It generates inefficient code that looks up the value’s type at runtime. Something like turning f(di) = sin(di[:a][1]) called with a Dict{Symbol, Vector} into f(di) = if di[:a][1] isa Int; sin_of_int(di[:a][1]); elseif di[:a][1] isa Float64; sin_of_Float64(di[:a][1]); else ...
*reality is more complicated than that, but that’s the gist. It’s way, way less efficient code than if the Vector is concrete.
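Spelled out as runnable Julia, here’s a rough sketch of that effect (hypothetical function name; the real mechanism is a cached runtime method lookup, not a literal if/elseif chain):

# hypothetical sketch: what the runtime fallback amounts to in spirit
function f_sketch(di::Dict{Symbol, Vector})
    v = di[:a][1]             # inferred ::Any; concrete type only known at runtime
    if v isa Int
        sin(v)                # in this branch v::Int, so a fast specialized call
    elseif v isa Float64
        sin(v)                # likewise, the Float64-specialized call
    else
        sin(v)                # anything else: full generic dispatch
    end
end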
I do not work on the compiler in any capacity, but I can at least answer some of the questions you clarified and help you peek into the LLVM IR a bit. It’s still not very clear to me what you do or don’t know, so you’ll have to overlook any unexpected redundancies.
The simple answer is that it does not constrain Dict{Symbol, Vector} any further, because Dict{Symbol, Vector} is itself a concrete type with instances. That is also a red herring, because what you’re really concerned about is the abstract type parameter Vector and all the Any showing up in type inference.
Type inference is what the compiler can figure out in advance of any calls on any instances. Let’s take a simpler example, Some, the simplest non-empty parametric struct possible. Its definition is just struct Some{T} value::T end, and the API for accessing its value field is the function something. Some{Any} is a particular concrete type whose field can hold any instance: 2, 3im, "four", Vector[[5], []], you name it. Given that type, it’s clear that something can also return any instance, therefore the call signature something(::Some{Any}) can only have an inferred return type of Any. In more steps, that’s also why you see so many Any. If you index Dict{Symbol, Vector}, you get one of an infinite number of concrete subtypes of the abstract Vector, and indexing any of those subtypes can result in any instance.
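You can ask inference about this directly; a quick check with Base.return_types (the exact printed form may vary across Julia versions):

julia> Base.return_types(something, (Some{Int},))
1-element Vector{Any}:
 Int64

julia> Base.return_types(something, (Some{Any},))
1-element Vector{Any}:
 Any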
Okay, so the compiler sometimes can’t narrow things down to concrete types; that’s understandable. But how does that make things slow? People have been trying to answer this, but maybe it’ll help to look right at the code. First we look at something fast:
julia> plus1(x::Some) = something(x)+1
plus1 (generic function with 1 method)
julia> @code_llvm plus1(Some{Int}(2))
; Function Signature: plus1(Base.Some{Int64})
; @ REPL[1]:1 within `plus1`
; Function Attrs: uwtable
define i64 @julia_plus1_963(ptr nocapture noundef nonnull readonly align 8 dereferenceable(8) %"x::Some") #0 {
top:
; ┌ @ int.jl:87 within `+`
%"x::Some.value_ptr.unbox" = load i64, ptr %"x::Some", align 8
%0 = add i64 %"x::Some.value_ptr.unbox", 1
ret i64 %0
; └
}
Despite your (and my own) inexperience with LLVM, you can probably see that there are only 2 instructions before the return: load an i64 from the Some{Int} and add 1 to it. The compiler figured out all the types and methods from the concrete inputs, and we were rewarded with efficient code for a +(::Int64, ::Int64) call. Now let’s use Some{Any} for slow code:
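(Reconstructed below from the breakdown that follows, not verbatim compiler output; the jl_global names are placeholders and the exact instructions vary by Julia version.)

julia> @code_llvm plus1(Some{Any}(2))
; Function Signature: plus1(Base.Some{Any})
; @ REPL[1]:1 within `plus1`
define nonnull ptr @julia_plus1(ptr noundef nonnull readonly align 8 dereferenceable(8) %"x::Some") #0 {
top:
  %jlcallframe = alloca [2 x ptr], align 8
  %"x::Some.value" = load atomic ptr, ptr %"x::Some" unordered, align 8
  store ptr %"x::Some.value", ptr %jlcallframe, align 8
  %0 = getelementptr inbounds ptr, ptr %jlcallframe, i64 1
  store ptr @"jl_global#933.jit", ptr %0, align 8
  %1 = call nonnull ptr @ijl_apply_generic(ptr nonnull @"jl_global#935.jit", ptr nonnull %jlcallframe, i32 2)
  ret ptr %1
}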
Oof that is much more of an eyesore, and I chose Some specifically to simplify it as much as possible. Let’s break it down:
stack-allocates a “callframe” with a couple pointers (ptr)
something(x) loads the data from the x::Some{Any} object’s value field. We’re dealing with a pointer at that field because instances vary in size, so the actual data is stored elsewhere on the heap.
store the loaded data into the call frame
store "jl_global#933.jit" into the call frame. It’s normal not to know what that even is, it’s an internal detail.
call @ijl_apply_generic on that call frame and return the result
You’ll notice that the compiled code clearly computed the something(x) part, but the +1 part (the entire point of the function itself) is entirely missing. Instead we prepare a call frame and call @ijl_apply_generic. Previous commenters were entirely correct: the compiler is lazy and does not do the +1 part here. Of course it can’t; as you’ve pointed out, there are an infinite number of possible concrete types to add 1 to, and we can’t eagerly compile all the operations. Instead, the compiled code defers that work to another method dispatch at runtime. I’ll quote the docs’ own example:
Given the call f(x, y), the following steps are performed: first, the method table to use is accessed as typeof(f).name.mt. Second, an argument tuple type is formed, Tuple{typeof(f), typeof(x), typeof(y)}. Note that the type of the function itself is the first element. This is because the type might have parameters, and so needs to take part in dispatch. This tuple type is looked up in the method table.
This dispatch process is performed by jl_apply_generic, which takes two arguments: a pointer to an array of the values f, x, and y, and the number of values (in this case 3).
In our own plus1(Some{Any}(2)) example, the runtime dispatch finds Tuple{typeof(+), Int64, Int64}, which you’ll notice we’ve already dispatched to and compiled in the earlier plus1(Some{Int}(2)). In other calls, like plus1(Some{Any}(3im)), the same compiled code for plus1(::Some{Any}) has to find other + methods, potentially having to compile them as well. Method dispatch takes time*, which is why we try fairly hard to maintain type stability in hot code, letting the compiler do that work at compile-time instead. Another reason for slowdowns was touched on earlier: an instance of an unknown type has an unknown size, so the compiler can’t plan inline or stack allocation for it, and accessing the heap through a pointer takes extra work for a whole host of reasons.
*14-15ns is not bad once per μs, but pretty awful before each 2ns integer addition.
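You can measure that dispatch cost yourself; a minimal sketch using BenchmarkTools.jl (timings omitted since they’re machine-dependent; the $ interpolation keeps argument construction out of the measurement):

julia> using BenchmarkTools

julia> @btime plus1($(Some{Int}(2)));   # concretely typed: essentially just the add

julia> @btime plus1($(Some{Any}(2)));   # same arithmetic, plus a runtime dispatch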
This is the comparable process in CPython. While CPython implements single dispatch at runtime, Julia has multiple dispatch that may occur at compile-time or runtime depending on how well type inference goes. It’s worth mentioning that there are implementations of Python (or subsets and supersets of it) where a compiler is around to do method dispatch at compile-time, though there are often practical constraints from compliance with the Python language standard.
There is a simple explanation for this. If we think about what a Vector is in terms of its in-memory representation, I would guess it is something very similar to the following. (Obviously correct me if I am wrong.)
# hypothetical sketch, not Julia's actual implementation
struct RawVector{T}
    size::Int64
    capacity::Int64
    data::Ptr{T}
end
To explain: this is a templated (parametric) type with a pointer to the first element of the data, which is heap-allocated, and two fields holding the number of elements and the size of the allocated memory.
So if you think about what T would be for each of the Vectors in
Vector{Vector}
the second T is unspecified. The first T is Vector, which nonetheless has a fixed memory size.
The reason it has a fixed memory size is that data::Ptr{T} is a pointer, and all pointers are 64 bits long. (Usually.) So a Vector, whatever its T, occupies a fixed number of bytes.
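A quick REPL sanity check of that claim:

julia> sizeof(Ptr{Float64}), sizeof(Ptr{Vector})   # any Ptr is one machine word on a 64-bit system
(8, 8)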
This does raise a question though. And this is where my knowledge runs out.
When a get_element_by_index function is called on the first (outer) Vector (sorry, I’m not sure of the exact Julia function name for that), what does it return?
It would need to return a copy of that element, which would be of type
# as above, but T is unknown, so the pointer is untyped (Ptr{Cvoid} standing in for "pointer to unknown")
struct RawVectorAny
    size::Int64
    capacity::Int64
    data::Ptr{Cvoid}
end
The question then is: how do you call get_element_by_index on that return value, since we do not know what T is?
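(For the record, the function behind v[i] is getindex.) As the replies below explain, this works at runtime because each boxed element carries its own type tag; a quick demo:

julia> vv = Vector[[1, 2], [3.0]]
2-element Vector{Vector}:
 [1, 2]
 [3.0]

julia> typeof(vv[1]), typeof(vv[2])   # concrete types recovered at runtime from the type tags
(Vector{Int64}, Vector{Float64})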
If I wanted to learn more about the Julia version of LLVM, so that I can better understand the output you have shown here, how would I go about doing that? (I assume there are no books on this, for example? But maybe there are some online documentation pages or something?)
This is really for my own curiosity. It might be interesting to see it.
We are all expecting to see something which dispatches to an add intrinsic function, but I think I’m correct in saying that at this point the LLVM code hasn’t figured out the type of x::Some? So there should be some more work to do there.
I do not know LLVM IR well enough to know, though there are evidently people that do. I do have ideas that I hinted at before. LLVM IR documentation can be found at llvm.org, and that’s what I use to understand some of the instructions. Julia’s developer documentation explains some of the implementation (note that this contrasts with the main documentation, which mostly sticks to the language specification); you can start here. Other things from Julia’s front-end to LLVM (probably what you mean by “Julia version of LLVM”), like the various mangled jl-prefixed names in the LLVM IR, might require digging into the source code (GitHub has decent text search), if they’re not runtime-generated names to begin with.
Thankfully we don’t often need to dig into the LLVM IR to find performance issues; for example, an inferred ::Any is a really good hint that runtime dispatch soon follows, and packages like JET.jl, Cthulhu.jl, AllocCheck.jl, etc. report runtime dispatches, keeping users firmly on the Julia side. However, understanding LLVM does provide insight into compiler optimizations below the Julia level.
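For instance, on the earlier example, @code_warntype alone is enough to spot the problem; it prints the method body with inferred types, highlighting the unstable ones:

julia> @code_warntype plus1(Some{Any}(2))   # shows something(x) inferred as ::Any, and the + call (and return) as ::Any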
I guess, if T is an abstract type, then the layout is more like:
struct RawVector{T}   # the same hypothetical sketch as above
    size::Int64
    capacity::Int64
    data::Ptr{Box}
end
where Box is something like
struct Box
    type_tag::Int
    size::Int
    data::Ptr{Cvoid}
end
I.e., the data is an array of raw pointers, which reference the boxed data. On getindex, the compiler automatically inserts instructions to dereference the pointer, unbox the actual data, and then proceed. But the compiler does not know what the unboxed data will be, so all operations on that data in the same scope (i.e., without a function barrier) must be accompanied by runtime boxing/unboxing, and this is where the performance penalty comes from.
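A minimal sketch of that function-barrier pattern (total and _total are made-up names, reusing the Dict{Symbol, Vector} from the original question):

function total(dict::Dict{Symbol, Vector})
    data = dict[:data]    # inferred ::Vector (abstract): operating on it here means runtime dispatch everywhere
    return _total(data)   # the barrier: a single runtime dispatch on data's concrete type
end

_total(v) = sum(v)        # compiled separately for each concrete type, so the inner work is fully typed

The outer call still pays one dispatch, but everything inside _total runs as specialized code.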
EDIT: Changed Ptr{Ptr{Cvoid}} to Ptr{Box} in the simplified example.
Probably.
With the unsafe_* functions from C Interface · The Julia Language, it should be possible to inspect which data is actually stored and analyze the layout.
Exactly.
But please note that my explanation is only my mental model and glosses over the actual details. And that, as @mkitti noted, the actual details are version-dependent.
julia> x = Vector[[1.0]]
1-element Vector{Vector}:
[1.0]
julia> p = pointer(x)
Ptr{Vector} @0x00007f78e85f79b8 # where data actually starts
julia> p = pointer_from_objref(x)
Ptr{Nothing} @0x00007f78e85f7990 # where x starts as an object (including metadata)
julia> up = convert(Ptr{UInt}, p)
Ptr{UInt64} @0x00007f78e85f79b8
julia> unsafe_load(up)
0x00007f78e864d0d0
julia> pointer_from_objref(x[1])
Ptr{Nothing} @0x00007f78e864d0d0 # same as previous!
It looks like Vector{Vector} in this case holds pointers to heap-allocated Julia objects.
Something which is maybe a little confusing about types in Julia is that they appear to serve two purposes:
The types of data, meaning specifications for memory layouts; only concrete types are in this category.
Invisible (imaginary?) tags used by the compiler.
This is quite different from other languages.
It’s a bit tricky to explain what I mean by this. If we refer back to my original post
function demo(
    dict::Dict{Symbol, Vector},
    datetime_first::DateTime,
    datetime_last::DateTime,
)
    time = dict[:time]
    data = dict[:data]
    println(typeof(time))
    println(typeof(data))
end
The println statements here will print concrete types. typeof tells you about memory layouts.
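For example (a made-up call; Dates provides DateTime):

julia> using Dates

julia> demo(Dict{Symbol, Vector}(:time => [1.0, 2.0], :data => [1, 2]),
            DateTime(2024, 1, 1), DateTime(2024, 12, 31))
Vector{Float64}
Vector{Int64}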
However, running @code_warntype produces warnings such as
Locals
time::Vector
and
%3 = time::Vector
This is a warning that the compiler is producing code where time is only inferred as time::Vector, which is an abstract type.
So there are at least two purposes to a type. Or at least two meanings of the name of a type when printed by Julia.
It is different from C, which doesn’t have any abstract types, but it’s similar to most nominal type systems, aside from being significantly more expressive with parametric types.
Though beware that dump on an object will display the concrete types of the field values, even if a field is actually annotated with an abstract type, as dump on the type shows.
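To illustrate with a made-up struct:

julia> struct Holder
           x::Number
       end

julia> dump(Holder(1))   # instance: the field value's concrete type
Holder
  x: Int64 1

julia> dump(Holder)      # type: the declared, abstract field type
Holder <: Any
  x::Number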
Slightly off topic, but are there really no ready-made reflection tools for a struct’s internal layout in bytes? fieldoffset documents a derived function that’s almost there; I changed it to print Int instead of UInt because I’m not used to hexadecimal:
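Something along these lines, adapted from the derived function in fieldoffset's docstring (the Int(...) wrapper is the change mentioned above):

julia> structinfo(T) = [(Int(fieldoffset(T, i)), fieldname(T, i), fieldtype(T, i)) for i in 1:fieldcount(T)];

julia> structinfo(Some{Int})   # returns [(0, :value, Int64)]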