Performance of type-stable fields in structs


#1

Hi,
just a quick question about a point that is not very clear to me: the performance of type-stable versus type-unstable fields in structs. For instance, consider

mutable struct croco
  x :: Int64
  y :: Float64
  croco() = new()
end

mutable struct krok
  x
  y
  krok() = new()
end

and I want to use both structs to store one Int64 and one Float64. To my eyes, croco is the more specific one and should perform best, but I seem to remember reading somewhere that this is not necessarily the case, because when I initialize an instance of krok it will be specialized to the types employed, such that

c1 = croco()
c1.x = 1
c1.y = 3.1416

and

k1 = krok()
k1.x = 1
k1.y = 3.1416

should perform the same in terms of memory, speed in data access, etc.
Does that make sense or not? Probably not as I tend to bypass many of the fine details of the language… and that’s why I ask in the first place :slight_smile:

Thanks for your help,

Ferran.


#2

For both speed and generality you want to use parametric types:

mutable struct krok{X,Y}
  x::X
  y::Y
end

Your krok example will have fields of type Any, and will be slow. This parametric version will behave identically to your croco example if you pass an Int and a Float64; it should compile to exactly the same code.
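As a minimal sketch of the difference (the names KrokAny and KrokParam are hypothetical, chosen only to avoid clashing with the definitions above), the declared field types can be inspected directly:

```julia
# Hypothetical names for illustration; not the structs from the question.
mutable struct KrokAny
    x          # no annotation: the field type is Any
    y
end

mutable struct KrokParam{X,Y}
    x::X       # the field type is fixed by the type parameter
    y::Y
end

ka = KrokAny(1, 3.1416)
kp = KrokParam(1, 3.1416)

println(fieldtype(KrokAny, :x))     # declared type of an unannotated field
println(typeof(kp))                 # parameters are taken from the arguments
println(fieldtype(typeof(kp), :y))  # concrete field type of this instance
```

The unannotated struct reports Any for its fields, while the parametric instance carries the concrete types of the values it was built from.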


#3

krok will be a lot more expensive because nearly all uses of the fields will invoke Julia’s multiple dispatch mechanism at runtime. For croco the compiler will usually be able to figure out which function to call on x and y at compile time. This avoids dynamic dispatch, allows inlining and a bunch of further optimizations.
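One rough way to see this is to ask the compiler what return type it infers for a small function. The sketch below assumes the unexported helper Base.return_types and uses hypothetical names (TypedPair, UntypedPair, addfields); the exact inference results may vary between Julia versions.

```julia
# Hypothetical structs mirroring croco and krok, with default constructors.
mutable struct TypedPair
    x::Int64
    y::Float64
end

mutable struct UntypedPair
    x
    y
end

# The same generic function is used for both structs.
addfields(p) = p.x + p.y

# Ask inference for the return type given each argument type.
typed_ret = Base.return_types(addfields, (TypedPair,))
untyped_ret = Base.return_types(addfields, (UntypedPair,))

println(typed_ret)    # concrete result: the + method is known at compile time
println(untyped_ret)  # abstract result: the + method must be looked up at runtime
```

When the inferred type is concrete, the call to + can be resolved and inlined ahead of time; when it is Any, each call goes through dynamic dispatch.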


#4

You may be thinking of statements that putting concrete types on function arguments does not help. That is true. On the other hand, using concrete types on locations like field types and array element types is absolutely crucial to performance.
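A small sketch of both halves of that statement, using hypothetical function names and the unexported helper Base.return_types (an assumption; inference output can differ across versions):

```julia
# Function arguments: the annotation restricts which calls are accepted,
# but the compiler specializes on the actual argument type either way.
addone(x) = x + 1
addone_typed(x::Int64) = x + 1

same_inference = Base.return_types(addone, (Int64,)) ==
                 Base.return_types(addone_typed, (Int64,))
println(same_inference)

# Storage locations: the element type of an array is part of its type,
# so an abstract element type stays abstract for every access.
ints = [1, 2, 3]        # Vector{Int}
anys = Any[1, 2, 3]     # Vector{Any}

println(isconcretetype(eltype(ints)))
println(isconcretetype(eltype(anys)))
```

The two functions are inferred identically for an Int64 argument, whereas the two arrays differ in whether the compiler knows what each element is.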


#5

Writing field types when defining a struct can help to avoid “conversion mistakes”.

Notice that c1.x = 1. and k1.x = 1. will be interpreted differently.
If you don’t specify the type of a field foo.x, then foo.x will take the type of whatever value is assigned to it. If you do specify the type, Julia is smart enough to convert the value to the declared type (provided there is a sensible conversion).

julia> c1 = croco()
croco(0, 0.0)

julia> c1.x = 1.
1.0

julia> typeof(c1.x)
Int64

julia> k1 = krok()
krok(#undef, #undef)

julia> k1.x = 1.
1.0

julia> typeof(k1.x)
Float64

#6

I sometimes wonder if making parametric types the default would make sense. E.g.

struct Foo{T}
    a
    b
    c::T
end

would be equivalent to (mock code)

let P1 = gensym(:a),
    P2 = gensym(:b)
    quote
        struct Foo{T,$P1,$P2}
            a::$P1
            b::$P2
            c::T
        end
    end
end

since this is the most common use case.


#7

I was thinking that too. a::Any should have to be explicit.


#8

It would be nice if in this example

c1.x = 1.

would return

1

Not sure if that is possible or easy to implement.


#9

Jeff and I did discuss automatically putting hidden type parameters on immutable structs a long time ago. I think it just feels a little too complex and unnecessary since the code works as expected already; when you want the performance you can do that explicitly without changing the usage at all.


#10

Assignments always evaluate to the RHS by design. Otherwise a = b = 1 could lead to a having an arbitrary value based on the type of the variable b. The same applies to a = x.f = 1 but depending on the type of the field f which is even worse since it’s such a very bad case of spooky action at a distance.
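A minimal sketch of this rule with a typed local variable (the function name chained is hypothetical):

```julia
function chained()
    local b::Int      # b converts whatever is assigned to it to Int
    a = b = 1.0       # the inner assignment still evaluates to the RHS, 1.0
    return a, b
end

a, b = chained()
println(typeof(a))    # type of the RHS value
println(typeof(b))    # declared type of b
```

Here a receives the original right-hand side, while only b holds the converted value.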


#11

This is not a good idea to have by default, because sometimes the extra type information is just not needed and also increases compilation time because Julia will automatically treat each combination of type parameters as a new instance to compile functions for. However, I suppose it would still be possible to use Any as a type declaration to prevent this from happening if it were the default.
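To illustrate the trade-off with current syntax (the struct names below are hypothetical), every distinct combination of parameters is a distinct concrete type, while a field explicitly declared as Any does not contribute a parameter:

```julia
# Every field is parameterized: each combination of field types is its own type.
struct AllParams{A,T}
    a::A
    c::T
end

# One field is deliberately left as Any: it does not affect the concrete type.
struct OneAny{T}
    a::Any
    c::T
end

println(typeof(AllParams(1, 2.0)))
println(typeof(AllParams("one", 2.0)))
println(typeof(OneAny(1, 2.0)))
println(typeof(OneAny("one", 2.0)))
```

Functions taking AllParams are compiled separately for each of the two types printed first, whereas both OneAny values share a single type and therefore a single specialization.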


#12

I guess both use cases have their place; I just happen to use parametric types more often, and I usually optimize for runtime (not compile time). Also, most Julia code I read by others has type parameters, but this may not be a representative sample. YMMV.


#13

True, I also like to use them more often. I suppose it could be made the default, but that would also increase complexity by default, which might confuse newcomers who would then need to learn parametric types immediately. A newcomer might want a mutable struct with an Any field, to be able to place a value of any type into it. With automatic parametric types, the newcomer would have to figure out that the field must be explicitly declared as Any, which might be confusing, since Any is implicit elsewhere in the language when no type is declared. The current default is consistent with that implicit use of Any.


#14

OK, thanks for the tips. Yes, I was thinking about this because of what happens with function parameters, as mentioned by @StefanKarpinski. In any case, the question arises because I have generic data types that I use in several different programs, and sometimes I need to fill only some of the fields. It is as if, in my croco type, I need only x in one code, while in another I need only y. But it is useful for me to share the same struct between the two codes. The question then comes from the fact that when I only fill x in croco, y gets a random value in Jupyter:

drile = croco()
> croco(139812476115760, 2.0e-323)
drile.x = 1
drile
> croco(1, 2.0e-323)

I would rather have drile.y undefined, which is what I get with the krok struct (though I understand I should avoid that)… can this be done?

Thanks again,

Ferran.


#15

Just for my education, is there a (simple) example where the value, once converted to b's type, would no longer be acceptable for a, even though the whole line a = b = 1 is accepted?


#16

It can literally be anything because of user-defined types and convert methods, but you can already get in trouble just with built-in types. Suppose b = 1 evaluated to the value of b after the assignment. We can simulate this by writing b = 1; a = b instead of a = b = 1, since that’s what it would mean. Here’s a simple example of how this would cause subtle programming traps:

function f(n)
    # some long setup code

    # later...
    # a = b = 1.0
    b = 1.0; a = b

    # here we expect a::Float64...
    for i = 1:n
        a *= i
    end

    return a
end

This works as expected:

julia> f(10)
3.6288e6

julia> f(100)
9.33262154439441e157

Later, someone comes along and decides that b should be an Int instead of a Float64:

function f(n)
    local b::Int # seems innocuous
    # some long setup code

    # later...
    # a = b = 1.0
    b = 1.0; a = b

    # here we expect a::Float64...
    for i = 1:n
        a *= i
    end

    return a
end

julia> f(10)
3628800

julia> f(100)
0

Oops.

Here’s an even worse example using a typed field:

mutable struct X
    f::Float64
    s::String
end

# in some other file...

function f(n)
    x = X(0, "Hello")
    # a = x.f = 1.0
    x.f = 1.0; a = x.f
    for i = 1:n
        a *= i
    end
    return a
end

In this version f works as expected like the first version above. However, suppose that later, some unsuspecting person realizes that the X type only needs to store integers and so they change the definition of X to this:

mutable struct X
    f::Int
    s::String
end

Seems harmless, right? But now, f would break again with no change to the definition of f even though the change to X by itself is correct.
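For contrast, here is a sketch of the same function under Julia's actual semantics, written with the chained assignment and an Int field (the names XInt and f_chained are hypothetical, used to avoid redefining X and f):

```julia
mutable struct XInt
    f::Int
    s::String
end

function f_chained(n)
    x = XInt(0, "Hello")
    a = x.f = 1.0        # x.f stores the converted Int; a gets the RHS, 1.0
    for i = 1:n
        a *= i
    end
    return a, x.f
end

println(f_chained(10))
```

Because the assignment evaluates to the right-hand side, a stays a Float64 regardless of how the field is declared, so changing the field type does not silently change the arithmetic.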


#17

Thank you very much, Stefan, for putting this example together. It is clear that having assignments always evaluate to the RHS is far more fundamental (and consistent) than having the REPL warn that the value actually assigned in the last operation was converted from a float to an int!

(just a ‘blast from the past’ when back in 1993 there was an interpretive version of Java (at that time still called Oak) and a JIT compiled version …)