Field types

lancejnelson · May 18, 2022, 6:50pm

I know that the following:

struct test{T<:Real}
    a::T
    b::T
end

is better than:

struct test
    a::Real
    b::Real
end

but why can’t I fully specify the field types like this:

struct test{T::Float64}
    a::T
    b::T
end

Wouldn’t this allow the compiler to optimize better?

johnmyleswhite · May 18, 2022, 6:57pm

I think you’re confusing optimization with how much typing you have to do. With your last example, you’ve effectively written:

struct test
    a::Float64
    b::Float64
end

That’s more typing, but it’s identical to what I think you want your example to mean, so both would have identical efficiency.
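As a side note, a subtype bound on a concrete type is legal syntax, it just doesn’t buy you anything over annotating the fields directly. A minimal sketch of that, where the names Direct and Bounded are made up for illustration:

```julia
# Hypothetical names, for illustration only.
struct Direct
    a::Float64
    b::Float64
end

# Legal, but Float64 is concrete, so Float64 is the only T
# you can actually construct a value with.
struct Bounded{T<:Float64}
    a::T
    b::T
end

x = Direct(1.0, 2.0)
y = Bounded(1.0, 2.0)

# Compare the field type and the size of the two struct types.
same_field = fieldtype(Direct, :a) === fieldtype(typeof(y), :a)
same_size = sizeof(Direct) == sizeof(typeof(y))
println(typeof(y), " ", same_field, " ", same_size)
```

If both comparisons come back true, the parameter added nothing except extra syntax.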

lancejnelson · May 18, 2022, 6:59pm

Well now I’m confused about why my first example allows for better optimization over the second.

goerz · May 18, 2022, 7:05pm

In the first example, a and b have to be of the same concrete type, whereas in the second example, a and b can be different concrete types (as long as they’re both subtypes of Real)

johnmyleswhite · May 18, 2022, 7:06pm

Your first example creates an infinite family of types – one for each value of T. For example, when T === Float64, your first example creates the equivalent of:

struct test
    a::Float64
    b::Float64
end

Your second example instead is exactly:

struct test
    a::Real
    b::Real
end

So one type has concretely typed fields (of type Float64) and the other has non-concretely typed fields (of type Real).
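You can check this yourself with fieldtype and isconcretetype. A small sketch, using the made-up names ParamPair and LoosePair so the two definitions can coexist in one session:

```julia
# Hypothetical names standing in for the two versions of `test`.
struct ParamPair{T<:Real}
    a::T
    b::T
end

struct LoosePair
    a::Real
    b::Real
end

# Ask Julia what type the field `a` is declared to have in each case.
param_field = fieldtype(ParamPair{Float64}, :a)
loose_field = fieldtype(LoosePair, :a)

println(param_field, " concrete? ", isconcretetype(param_field))
println(loose_field, " concrete? ", isconcretetype(loose_field))
```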

lancejnelson · May 18, 2022, 7:13pm

So the compiler doesn’t need to know the exact concrete type a priori, just which fields share the same concrete type?

Mason · May 18, 2022, 7:14pm

No, the fact that there are two fields in this struct is a total distraction. The same problem is present in

struct test
    a::Real
end

which is different from

struct test{T <: Real}
    a::T
end

johnmyleswhite · May 18, 2022, 7:15pm

That depends on what “a priori” means, since Julia isn’t statically typed. In the code below, the exact types are known as soon as an object is created:

julia> struct test{T <: Real}
       a::T
       b::T
       end

julia> test(1.0, 2.0)
test{Float64}(1.0, 2.0)
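As a rough illustration of how T gets pinned down by the arguments, here is the same idea as a script. SamePair is a made-up name, and the try/catch is only there so the failing call doesn’t stop the script:

```julia
# Hypothetical name; same shape as the struct above.
struct SamePair{T<:Real}
    a::T
    b::T
end

p_float = SamePair(1.0, 2.0)
p_int = SamePair(1, 2)

# Mixed argument types do not match the default constructor
# SamePair(a::T, b::T), so this call throws instead of picking a T.
mixed = try
    SamePair(1, 2.0)
catch err
    err
end

println(typeof(p_float))
println(typeof(p_int))
println(mixed isa MethodError)
```

This is also the sense in which both fields have to agree on a single T.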

Mason · May 18, 2022, 7:17pm

I strongly recommend reading the Types chapter of the manual: Types · The Julia Language

lancejnelson · May 18, 2022, 7:17pm

I have. I’ll re-read it though.

lancejnelson · May 18, 2022, 7:18pm

So the following struct:

struct NS
    nParticles::Int64
    setSize::Int64
    l::Int64
    boxSize::Float64
    energies::Array{Float64,1}
    activeSet::Array{SVector{2,Float64},2}
end

should be rewritten as:

struct NS{T <: Real,S <: Real}
    nParticles::T
    setSize::T
    l::T
    boxSize::S
    energies::Array{S,1}
    activeSet::Array{SVector{2,S},2}
end

johnmyleswhite · May 18, 2022, 7:20pm

There is no efficiency gain from that change; you’ve just increased the number of valid types that can be bound to T and S. The first example is also somewhat broken: you have T and S parameters, but they’re never used.
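Here is a sketch of why, using cut-down stand-ins for NS. FixedNS and ParamNS are made-up names, and I have left out the SVector field so the snippet doesn’t need any package:

```julia
# Simplified, hypothetical stand-ins for the two NS definitions.
struct FixedNS
    nParticles::Int64
    boxSize::Float64
    energies::Vector{Float64}
end

struct ParamNS{T<:Real,S<:Real}
    nParticles::T
    boxSize::S
    energies::Vector{S}
end

# Every field is already concrete in the fixed version...
fixed_ok = all(isconcretetype, fieldtypes(FixedNS))
# ...and the parametric version with T = Int64, S = Float64 has the same fields.
param_ok = all(isconcretetype, fieldtypes(ParamNS{Int64,Float64}))
same_fields = fieldtypes(FixedNS) == fieldtypes(ParamNS{Int64,Float64})
println(fixed_ok, " ", param_ok, " ", same_fields)
```

The parametric version only adds the option of other choices of T and S.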

Mason · May 18, 2022, 7:21pm

So basically what’s going on here is that when you have

struct test
    a::Real
end

then any time Julia wants to look inside a test object, it has no idea what it’s going to get out. All it knows is that the data will be a subtype of Real, but subtypes of Real could have any memory layout imaginable and any set of methods defined for them. So there are essentially no optimizations that can be performed until Julia actually unpacks the struct at run time and looks at the concrete type.

On the other hand, when you write

struct Test{T <: Real}
    a::T
end

then a Test(1) has a different type from a Test(1.0) (that is, Test{Int} vs Test{Float64}), and that information can be used to do concrete optimizations, because the memory layout is thus fixed forever and the methods on Int and Float64 are fixed in a given world age.
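One way to see what the compiler can conclude from the argument type alone is to ask inference for the return type of a field access. This is only a sketch: Loose, Tight and geta are made-up names, and Base.return_types is an unexported reflection helper, so treat it as a debugging aid rather than a stable API.

```julia
# Hypothetical names for the two struct styles discussed above.
struct Loose
    a::Real
end

struct Tight{T<:Real}
    a::T
end

# A trivial accessor whose inferred return type we want to inspect.
geta(x) = x.a

# Base.return_types is internal (unexported); it is used here only to peek
# at what inference concludes from the argument type alone.
loose_ret = Base.return_types(geta, (Loose,))[1]
tight_int = Base.return_types(geta, (Tight{Int},))[1]
tight_f64 = Base.return_types(geta, (Tight{Float64},))[1]

println(loose_ret, " ", tight_int, " ", tight_f64)
println(typeof(Tight(1)), " vs ", typeof(Tight(1.0)))
```

In the REPL, @code_warntype geta(Loose(1.0)) shows the same information in more detail.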

Mason · May 18, 2022, 7:25pm

Writing

struct Test{T <: Real}
    a::T
end

can basically just be thought of as convenient syntax for writing

struct TestInt
    a::Int
end
struct TestFloat64
    a::Float64
end
struct TestReal
    a::Real
end
...

for every single possible subtype of Real. The parameters basically just let us easily define a group of implicitly defined types. Test{Int} and TestInt have all the same properties, Test{Int} is just easier to work with.
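A quick sketch of the “family of types” picture, with the made-up names Wrap and WrapInt so nothing clashes with the definitions above:

```julia
# Hypothetical names: Wrap is the parametric family, WrapInt the hand-written member.
struct Wrap{T<:Real}
    a::T
end

struct WrapInt
    a::Int
end

# Each choice of T gives a distinct type...
distinct = Wrap{Int} !== Wrap{Float64}
# ...all of which belong to the unparameterized family Wrap.
in_family = Wrap{Int} <: Wrap && Wrap{Float64} <: Wrap
# The family itself is not a concrete type; only its members can be.
family_concrete = isconcretetype(Wrap)
# The hand-written version has the same field types and size as Wrap{Int}.
same_layout = fieldtypes(Wrap{Int}) == fieldtypes(WrapInt) &&
              sizeof(Wrap{Int}) == sizeof(WrapInt)

println(distinct, " ", in_family, " ", family_concrete, " ", same_layout)
```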

lancejnelson · May 18, 2022, 7:46pm

So what is the proper way to build this struct?

Mason · May 18, 2022, 8:02pm

That depends entirely on what you want it to hold. Do you only want to store Float64 data? Then your first example is fine (once you remove the unnecessary type parameters). Do you want to be able to efficiently store any types T <: Real and S <: Real? Then use the second.

lancejnelson · May 20, 2022, 4:21pm

But in the second case the types of the fields can be inferred from the type of the wrapper object (from performance tips in the manual).

lancejnelson · May 20, 2022, 4:22pm

So even

struct test{T}
    a::T
end

would be better than:

struct test
    a::Float64
end

right?

johnmyleswhite · May 20, 2022, 4:30pm

No. There are two orthogonal concepts here:

1. Using a parametric type instead of manually creating multiple similar types.
2. Minimizing the use of abstract types for efficiency.

Let’s consider three code options:

Option A: Manually write out multiple types

struct TestInt64
    a::Int64
end

struct TestFloat64
    a::Float64
end

Option B: Use a parametric type

struct Test{T <: Real}
    a::T
end

Option C: Use a single type with abstract fields

struct Test
    a::Real
end

There is no difference in efficiency between A and B – the question is whether you create a lot of redundant types or a single parametric type.

There is an efficiency improvement between B and C – for any given type under Real, B creates a new “customized” struct type that has a concrete field, whereas C reuses the same inefficient struct type every time.
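A rough way to see the A/B versus C split without benchmarking is isbitstype, which tells you whether a type is plain inline data with no references. This is a sketch with made-up names (OptAFloat64, OptB, OptC) so all three options can be defined side by side:

```julia
# Hypothetical names, one per option.
struct OptAFloat64   # Option A: hand-written concrete type
    a::Float64
end

struct OptB{T<:Real} # Option B: parametric type
    a::T
end

struct OptC          # Option C: abstract field
    a::Real
end

# isbitstype is true for immutable types that contain no references to other values.
for T in (OptAFloat64, OptB{Float64}, OptC)
    println(T, ": isbitstype = ", isbitstype(T))
end

# B specializes the element type; C uses one type whatever is stored inside.
xs_b = [OptB(1.0), OptB(2.0)]
xs_c = [OptC(1.0), OptC(2)]
println(eltype(xs_b), " ", eltype(xs_c))
```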

Mason · May 20, 2022, 4:30pm

It’s ‘better’ in that it can store any type. But if you only want to store Float64, then it’s exactly the same.

That is, if you have

struct Test1{T}
    a::T
end

struct Test2
    a::Float64
end

then Test1(1.0) is basically the exact same thing as Test2(1.0).
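A small sketch to back that up, reusing Test1 and Test2 as defined above. The last few lines show the one place where the two behave differently, which is what they accept, not how fast they are:

```julia
struct Test1{T}
    a::T
end

struct Test2
    a::Float64
end

x1 = Test1(1.0)
x2 = Test2(1.0)

# Same field type and same size once T is Float64.
same_field = fieldtype(typeof(x1), :a) === fieldtype(typeof(x2), :a)
same_size = sizeof(typeof(x1)) == sizeof(typeof(x2))

# Where they differ: Test1 keeps whatever type it is given,
# while Test2 converts its argument to Float64.
kept = Test1(1).a
converted = Test2(1).a

println(same_field, " ", same_size, " ", typeof(kept), " ", typeof(converted))
```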