How does Any actually work?

It doesn’t constrain it. It generates inefficient code that looks up the value’s type at runtime. Something like turning f(di) = sin(di[:a][1]), called with a Dict{Symbol, Vector}, into f(di) = if di[:a][1] isa Int; sin_of_int(di[:a][1]); elseif di[:a][1] isa Float64; sin_of_Float64(di[:a][1]); else ...

*Reality is more complicated than that, but that’s the gist. It’s way, way less efficient code than if the Vector’s element type is concrete.


I do not work with the compiler in any capacity, but I can at least answer some of the questions you clarified and help you peek into the LLVM a bit. It’s still not very clear to me what you know or don’t know, so you’ll have to forgive any unexpected redundancies.

The simple answer is that it does not constrain Dict{Symbol, Vector} any further, because that is already a concrete type with instances. That is also a red herring: what you’re really concerned about is the abstract type parameter Vector and all the Any showing up in type inference.

Type inference is what the compiler can figure out in advance of any calls on any instances. Let’s take a simpler example, Some, the simplest non-empty parametric struct possible. Its definition is just struct Some{T} value::T end, and the API for accessing its value field is the function something. Some{Any} is a particular concrete type whose field can hold any instance: 2, 3im, "four", Vector[[5], []], you name it. Given that type, it’s clear that something can also return any instance, therefore the call signature something(::Some{Any}) can only have an inferred return type of Any. In more steps, that’s also why you see so many Any: if you index a Dict{Symbol, Vector}, you get one of an infinite number of concrete subtypes of the abstract Vector, and indexing any of those subtypes can result in any instance.
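You can poke at this inference directly without reading any LLVM; here’s a quick sketch using Base.return_types (an introspection tool meant for interactive exploration, so the exact printing varies by Julia version):

```julia
# What the compiler can promise from the call signature alone:
# a Some{Any} field can hold anything, so inference gives Any;
# a Some{Int} field is pinned down, so inference gives Int64.
abstract_result = Base.return_types(something, (Some{Any},))[1]   # Any
concrete_result = Base.return_types(something, (Some{Int},))[1]   # Int64
```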

Okay, the compiler can’t always narrow down to concrete types; that’s understandable. But how does that make things slow? People have been trying to answer this, but maybe it’ll help if we look right at the code. First we look at something fast:

julia> plus1(x::Some) = something(x)+1
plus1 (generic function with 1 method)

julia> @code_llvm plus1(Some{Int}(2))
; Function Signature: plus1(Base.Some{Int64})
;  @ REPL[1]:1 within `plus1`
; Function Attrs: uwtable
define i64 @julia_plus1_963(ptr nocapture noundef nonnull readonly align 8 dereferenceable(8) %"x::Some") #0 {
top:
; ┌ @ int.jl:87 within `+`
   %"x::Some.value_ptr.unbox" = load i64, ptr %"x::Some", align 8
   %0 = add i64 %"x::Some.value_ptr.unbox", 1
   ret i64 %0
; └
}

Despite your (and my own) inexperience with LLVM, you can probably see that there are only 2 instructions before returning: load i64 from the Some{Int} and add i64 to 1. The compiler figured out all the types and methods from the concrete inputs, and we were rewarded with efficient code for a +(::Int64, ::Int64) call. Now let’s use Some{Any} for slow code:

julia> @code_llvm plus1(Some{Any}(2))
; Function Signature: plus1(Base.Some{Any})
;  @ REPL[3]:1 within `plus1`
; Function Attrs: uwtable
define nonnull ptr @julia_plus1_928(ptr nocapture noundef nonnull readonly align 8 dereferenceable(8) %"x::Some") #0 {
top:
  %jlcallframe1 = alloca [2 x ptr], align 8
; ┌ @ some.jl:103 within `something`
; │┌ @ Base.jl:49 within `getproperty`
    %"x::Some.value" = load atomic ptr, ptr %"x::Some" unordered, align 8
; └└
  store ptr %"x::Some.value", ptr %jlcallframe1, align 8
  %0 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
  store ptr @"jl_global#933.jit", ptr %0, align 8
  %1 = call nonnull ptr @ijl_apply_generic(ptr nonnull @"jl_global#932.jit", ptr nonnull %jlcallframe1, i32 2)
  ret ptr %1
}

Oof that is much more of an eyesore, and I chose Some specifically to simplify it as much as possible. Let’s break it down:

  • stack-allocates a “callframe” with a couple pointers (ptr)
  • something(x) loads the data from the x::Some{Any} object’s value field. We’re dealing with a pointer at that field because instances vary in size, so the actual data is stored elsewhere on the heap.
  • store the loaded data into the call frame
  • store "jl_global#933.jit" into the call frame. It’s normal not to know what that even is, it’s an internal detail.
  • call @ijl_apply_generic on that call frame and return the result

You’ll notice that the compiled code clearly computed the something(x) part, but the +1 part (the entire point of the function) is entirely missing. Instead we prepare a call frame and call @ijl_apply_generic. Previous commenters were entirely correct: the compiler is lazy and does not do the +1 part here. Of course it can’t; as you’ve pointed out, there are an infinite number of possible concrete types to add 1 to, and we can’t eagerly compile all those operations. Instead, the compiled code defers that work to another method dispatch at runtime. I’ll quote the docs’ own example:

Given the call f(x, y), the following steps are performed: first, the method table to use is accessed as typeof(f).name.mt. Second, an argument tuple type is formed, Tuple{typeof(f), typeof(x), typeof(y)}. Note that the type of the function itself is the first element. This is because the type might have parameters, and so needs to take part in dispatch. This tuple type is looked up in the method table.
This dispatch process is performed by jl_apply_generic, which takes two arguments: a pointer to an array of the values f, x, and y, and the number of values (in this case 3).

In our own plus1(Some{Any}(2)) example, the runtime dispatch finds the method matching Tuple{typeof(+), Int64, Int64}, which you’ll notice we’ve already dispatched to and compiled in the earlier plus1(Some{Int}(2)). In other calls, like plus1(Some{Any}(3im)), the same compiled code for plus1(::Some{Any}) has to find other + methods, potentially having to compile them as well. Method dispatch takes time*, which is why we try fairly hard to maintain type stability in hot code so the compiler can do that work at compile time instead. Another reason for slowdowns was touched on earlier: an instance of unknown type has an unknown size, so the compiler can’t plan for inline or stack allocation, and accessing the heap through a pointer takes extra work for a whole host of reasons.

julia> using BenchmarkTools

julia> @btime plus1($(Some{Int}(2)))
  2.000 ns (0 allocations: 0 bytes)
3

julia> @btime plus1($(Some{Any}(2)))
  16.633 ns (0 allocations: 0 bytes)
3

*14-15ns is not bad once per μs, but pretty awful before each 2ns integer addition.
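As an aside (my own sketch, not something from the thread): if you happen to know the runtime type, a type assertion hands that knowledge back to the compiler and recovers the fast path, at the cost of a TypeError for anything else:

```julia
# The ::Int64 assertion narrows inference from Any back to Int64,
# so the +1 is compiled statically again; any non-Int64 value throws.
plus1_asserted(x::Some) = (something(x)::Int64) + 1

plus1_asserted(Some{Any}(2))   # 3, via the fast integer path
```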

This is the comparable process in CPython. While CPython implements single dispatch at runtime, Julia has multiple dispatch that may occur at compile-time or runtime depending on how well type inference goes. It’s worth mentioning that there are implementations of Python (or subsets and supersets) where a compiler is around to do method dispatch at compile-time, though there are often practical constraints from compliance to the Python language standard.


There is a simple explanation for this. Think about what a Vector is in terms of its in-memory representation. I would guess it is something very similar to the following. (Obviously correct me if I am wrong.)

struct{T}
    size::Int64
    capacity::Int64
    data::Ptr{T}
end

To explain, this is a templated type with a pointer to the first element of data, which is heap allocated, and two variables to hold the number of elements and size of allocated memory.

So if you think what T would be for each of the Vectors in

Vector{Vector}

the second T is unspecified. The first T is Vector, which has a fixed memory size.

The reason it has a fixed memory size is that data::Ptr{T} is a pointer. And all pointers are 64 bits long. (Usually.) So Vector has a fixed length.
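That claim is cheap to sanity-check in a REPL, assuming a 64-bit build:

```julia
# All Ptr types have the platform word size, regardless of the pointee type.
Sys.WORD_SIZE            # 64 on a 64-bit build
sizeof(Ptr{Cvoid})       # 8 bytes there
sizeof(Ptr{Vector})      # also 8: the target type doesn't change the pointer size
```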

This does raise a question though. And this is where my knowledge runs out.

When a get_element_by_index function is called on the first (outer) Vector (sorry not sure of the exact Julia function name for that) what does it return?

It would need to return a copy of that element. Which would be of type

struct
    size::Int64
    capacity::Int64
    data::Ptr
end

The question is then, how do you call get_element_by_index on that return value? Since we do not know what T is.

Surely that is an infinite list of possibilities?

This is a really good explanation, thank you.

I have a further question.

  • If I wanted to learn more of the Julia version of LLVM, so that I can better understand the output you have shown here, how would I go about doing that? (I assume there are no books on this, for example? But maybe there are some online documentation pages or something?)

This is not really that relevant at this point, but is there any way to “drill down” into

call nonnull ptr @ijl_apply_generic(ptr nonnull @"jl_global#2936.jit", ptr nonnull %jlcallframe1, i32 2)

and see what that does?

This is really for my own curiosity. It might be interesting to see it.

We are all expecting to see some stuff which dispatches to an add intrinsic function, but I think I’m correct in saying that at this point the LLVM code hasn’t figured out the type of x::Some? So there should be some more work to do there.

I do not know LLVM IR well enough to know, though there are evidently people that do. I do have ideas that I hinted at before. LLVM IR documentation can be found at llvm.org, and that’s what I use to understand some of the instructions. Julia’s developer documentation explains some of the implementation (note that this contrasts with the main documentation, which mostly sticks to the language specification instead); you can start here. Other things from Julia’s front-end to LLVM (probably what you mean by “Julia version of LLVM”), like the various mangled jl-prefixed names in the LLVM IR, might require digging into the source code (GitHub has decent text search), if they’re not runtime-generated names to begin with.

Thankfully we don’t often need to dig into the LLVM IR to find performance issues; for example, an inferred ::Any is a really good hint that runtime dispatch soon follows, and packages like JET.jl, Cthulhu.jl, AllocCheck.jl, etc. report runtime dispatches, keeping users firmly on the Julia side. However, understanding LLVM does provide insight into compiler optimizations below the Julia level.


I guess, if T is an abstract type, then the layout is more like:

struct{T}
    size::Int64
    capacity::Int64
    data::Ptr{Box}
end

where Box is something like

struct Box
    type_tag::Int
    size::Int
    data::Ptr{Cvoid}
end

I.e., the data is an array of raw pointers, which reference the boxed data. On getindex, the compiler automatically inserts instructions to dereference the pointer, unbox the actual data and then proceed. But the thing is, the compiler does not know what the unboxed data would be, so that all the operations with that data in the same scope (i.e., without a function barrier) must be accompanied by runtime boxing/unboxing, and this is where the performance penalty comes from.

EDIT: Changed Ptr{Ptr{Cvoid}} to Ptr{Box} in the simplified example.
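The inference side of that mental model can be observed directly; a small sketch (this probes what the compiler promises, not the actual memory layout):

```julia
v = Vector[[1.0], Int[2, 3]]   # a Vector{Vector}: element type is non-concrete
eltype(v)                      # Vector
# Each element has some concrete runtime type...
typeof(v[1])                   # Vector{Float64}
# ...but inference over the signature can only promise the abstract Vector:
Base.return_types(getindex, (Vector{Vector}, Int))[1]   # Vector
```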


You raise an interesting point. If I understand you correctly:

  • Vector is an abstract type
  • The memory representation should be
struct{T}
    size::Int64
    capacity::Int64
    data::Ptr{Ptr{Cvoid}}
end

Is there a way we can test this somehow?

See I wonder how this fits in with the function with a dictionary argument

function untitled(dict::Dict{Symbol, Vector})

Maybe we can try figure out what the memory representation of that is?

struct untitled
    size::Int64
    capacity::Int64
    keys::Vector{Symbol}
    values::Vector # Vector is abstract
end

So every time get_index/set_index etc is called on values that will be some dynamic dispatch, essentially.
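That is checkable the same way; a sketch probing what inference promises for the Dict lookup:

```julia
d = Dict{Symbol, Vector}(:time => [1.0, 2.0], :data => Int[3, 4])
typeof(d[:time])    # Vector{Float64}: the runtime type is concrete...
Base.return_types(getindex, (Dict{Symbol, Vector}, Symbol))[1]
                    # ...but inference only sees the abstract Vector, so any
                    # follow-up call on d[:time] becomes a runtime dispatch
```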

Probably.
With the unsafe_* functions from C Interface · The Julia Language it should be possible to inspect what data is actually stored and analyze the layout.

Could we use that to read a block of data from memory starting at some offset?

The problem is, how would we generate a Vector? (Abstract thing.)

The example below isn’t going to work: the types of x and y are both concrete, and it is not possible to create (instantiate) an abstract type.

x::Vector = Float32[1.0]
y = convert(Vector, x)

So we would have to use a function to get the behavior we want or something?

function untitled(x::Vector)

end

I was being stupid. What you suggest is more like this:

julia> x::Vector{Vector} = [[1.0]]

julia> typeof(x)
Vector{Vector}

julia> typeof(x[1])
Vector{Float64}

So then we would need a way to inspect the memory of x, and hope this gives useful hints.

[quote=“world-peace, post:31, topic:124135”]
Could we use that to read a block of data from memory starting at some offset?
[/quote]

Yes. Here is a demo.

julia> A = [1,2,3]
3-element Vector{Int64}:
 1
 2
 3

julia> ptr = pointer(A)
Ptr{Int64} @0x0000007b60d30400

julia> unsafe_load(ptr)
1

julia> unsafe_load(ptr, 1)
1

julia> unsafe_load(ptr, 2)
2

julia> unsafe_load(ptr, 3)
3

julia> unsafe_load(ptr, 0)
529753220656

julia> unsafe_load(ptr, -1)
3

julia> unsafe_load(ptr, -2)
3

A word of caution. These details have changed in the past.


Exactly.
But please note that my explanation is only my mental model and glosses over the actual details. And that, as @mkitti noted, the actual details are version-dependent.

To be clear, we do not need to reverse engineer this. The source code is available:


Hmm, trying the same thing on a Vector{Vector} does not seem to work.

julia> v=[[1,2,3],["1","2","3"]]
2-element Vector{Vector}:
 [1, 2, 3]
 ["1", "2", "3"]

julia> p=pointer(v)
Ptr{Vector} @0x00007f031ddc65d0

julia> unsafe_load(p)
ERROR: pointerref: invalid pointer type
Stacktrace:
 [1] unsafe_load
   @ ./pointer.jl:153 [inlined]
 [2] unsafe_load(p::Ptr{Vector})
   @ Base ./pointer.jl:153
 [3] top-level scope
   @ REPL[7]:1
julia> x = Vector[[1.0]]
1-element Vector{Vector}:
 [1.0]

julia> p = pointer(x)
Ptr{Vector} @0x00007f78e85f79b8 # where data actually starts

julia> p = pointer_from_objref(x)
Ptr{Nothing} @0x00007f78e85f7990 # where x starts as an object (including metadata)

julia> up = convert(Ptr{UInt}, p)
Ptr{UInt64} @0x00007f78e85f79b8

julia> unsafe_load(up)
0x00007f78e864d0d0

julia> pointer_from_objref(x[1])
Ptr{Nothing} @0x00007f78e864d0d0 # same as previous!

It looks like Vector{Vector} in this case holds pointers to heap-allocated Julia objects.

Something which is maybe a little confusing about types in Julia is that they appear to serve two purposes.

  • The types of data, meaning specifications for memory layouts. Only concrete types are in this category
  • Invisible (imaginary?) tags used by the compiler

This is quite different from other languages.

It’s a bit tricky to explain what I mean by this. If we refer back to my original post

function demo(
    dict::Dict{Symbol, Vector},
    datetime_first::DateTime,
    datetime_last::DateTime,
)
    time = dict[:time]
    data = dict[:data]

    println(typeof(time))
    println(typeof(data))
end

The println statements here will print concrete types. typeof tells you about memory layouts.

However, running @code_warntype produces warnings such as

Locals
  time::Vector

and

%3 = time::Vector

This is a warning that the compiler is producing code where time is deduced as time::Vector, which is an abstract type.

So there are at least two purposes to a type. Or at least two meanings of the name of a type when printed by Julia.
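Reflection can tell the two roles apart. One subtlety worth knowing (my addition, not from the thread): Vector with no parameter is technically a UnionAll, which Julia reports as non-concrete rather than declared-abstract. A sketch:

```julia
isconcretetype(Vector{Float64})   # true: fixed layout, can have instances
isconcretetype(Vector)            # false: no layout until T is known
isabstracttype(AbstractVector)    # true: declared abstract, never instantiated
Vector isa UnionAll               # true: an incomplete parametric type
```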

In what way is this different from other languages?

Or, perhaps another way, what languages is this different from?


It is different from C which doesn’t have any abstract types, but it’s similar to most nominal type systems aside from being significantly more expressive with parametric types.

That looks a lot like:

julia> dump(Dict{Symbol, Vector})
Dict{Symbol, Vector} <: AbstractDict{Symbol, Vector}
  slots::Memory{UInt8}
  keys::Memory{Symbol}
  vals::Memory{Vector}
  ndel::Int64
  count::Int64
  age::UInt64
  idxfloor::Int64
  maxprobe::Int64

Though beware that dump for objects will display concrete types of the field values even if the field is actually annotated with an abstract type, as seen in dump for the type.

julia> dump(Some{Any})
Some{Any} <: Any
  value::Any

julia> dump(Some{Any}(2))
Some{Any}
  value: Int64 2

Slightly off topic: are there really no ready-made reflection tools for a struct’s internal layout in bytes? fieldoffset documents a derived function that’s almost there; I changed it to print Int instead of UInt because I’m not used to hexadecimal:

julia> structinfo(T) = [(Int(fieldoffset(T,i)), fieldname(T,i), fieldtype(T,i)) for i = 1:fieldcount(T)];

julia> structinfo(Dict{Symbol, Vector})
8-element Vector{Tuple{Int64, Symbol, DataType}}:
 (0, :slots, Memory{UInt8})
 (8, :keys, Memory{Symbol})
 (16, :vals, Memory{Vector})
 (24, :ndel, Int64)
 (32, :count, Int64)
 (40, :age, UInt64)
 (48, :idxfloor, Int64)
 (56, :maxprobe, Int64)

Just would be nice if it also highlighted pointers/references (the 8-byte fields with heap-allocated or abstract types).
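One possible extension, hedged: Base.allocatedinline is internal API and may change between versions, and structinfo_refs is my own hypothetical helper, not a standard function. It flags which fields are stored as references rather than inline bytes:

```julia
# Hypothetical helper extending structinfo above: mark each field as
# :inline (bytes stored in place) or :reference (an 8-byte pointer to
# a heap object). Base.allocatedinline is an internal function.
structinfo_refs(T) = [(Int(fieldoffset(T, i)), fieldname(T, i), fieldtype(T, i),
                       Base.allocatedinline(fieldtype(T, i)) ? :inline : :reference)
                      for i in 1:fieldcount(T)]

structinfo_refs(Some{Int})   # value stored inline
structinfo_refs(Some{Any})   # value stored as a reference
```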