Computed parametric field types using assertions in `getproperty`

mhauru · July 29, 2020, 10:16am

Quite often I have parametric types where the field types depend in some simple but non-trivial way on the type parameters. Something like

struct Dada{N}
    a::Array{Float64, N}
    b::Array{Float64, N+1}
end

The above isn’t valid Julia, since a field type cannot be computed from the type parameters like this. As I understand it, the recommended way to deal with such cases, assuming one wants to maintain type stability, is

struct DadaImp1{N, Ta, Tb}
    a::Ta
    b::Tb

    function DadaImp1{N}(a::Array{Float64}, b::Array{Float64}) where {N}
        Ta = Array{Float64, N}
        Tb = Array{Float64, N+1}
        a::Ta
        b::Tb
        return new{N, Ta, Tb}(a, b)
    end
end

Now I’ve recently been using this pattern when building a library, which in turn uses types from another library that makes extensive use of this pattern, and I’m starting to end up with pretty ridiculous types such as

ModifiedBinaryLayer{TensorMap{ℤ₂Space,2,2,ℤ₂,TensorKit.SortedVectorDict{ℤ₂,Array{Complex{Float64},2}},FusionTree{ℤ₂,2,0,1,Nothing},FusionTree{ℤ₂,2,0,1,Nothing}},TensorMap{ℤ₂Space,2,1,ℤ₂,TensorKit.SortedVectorDict{ℤ₂,Array{Complex{Float64},2}},FusionTree{ℤ₂,2,0,1,Nothing},FusionTree{ℤ₂,1,0,0,Nothing}},TensorMap{ℤ₂Space,2,1,ℤ₂,TensorKit.SortedVectorDict{ℤ₂,Array{Complex{Float64},2}},FusionTree{ℤ₂,2,0,1,Nothing},FusionTree{ℤ₂,1,0,0,Nothing}}}

Most of that information is entirely redundant; the non-trivial part would simply be ModifiedBinaryLayer{ℤ₂Space, Array{Complex{Float64}}}.

I don’t know if these ballooning type parametrizations affect inference (please enlighten me if you can), but they definitely affect my sanity when reading error messages, @code_warntype, and various other things. Hence, I’ve been thinking of doing instead something like this:

struct DadaImp2{N}
    a::Array{Float64}
    b::Array{Float64}

    function DadaImp2{N}(a::Array{Float64}, b::Array{Float64}) where {N}
        Ta = Array{Float64, N}
        Tb = Array{Float64, N+1}
        a::Ta
        b::Tb
        return new{N}(a, b)
    end
end

function Base.getproperty(d2::DadaImp2{N}, s::Symbol) where {N}
    if s === :a
        T = Array{Float64, N}
    elseif s === :b
        T = Array{Float64, N+1}
    else
        T = Any
    end
    return getfield(d2, s)::T
end

(For most purposes?) type stability is still maintained, since the compiler can figure out what type d2.b should be.

Assuming my fields aren’t bits types, but pointers, are there any downsides to this way of doing things?
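One way to look at the claim about d2.b is to ask inference directly. The following is a minimal sketch: the struct and the getproperty method are copied from the post above, while get_b is a hypothetical helper introduced only for this check. What gets printed depends on the Julia version and on constant propagation succeeding, so no particular output is promised here.

```julia
# Definitions copied from the post above.
struct DadaImp2{N}
    a::Array{Float64}
    b::Array{Float64}

    function DadaImp2{N}(a::Array{Float64}, b::Array{Float64}) where {N}
        Ta = Array{Float64, N}
        Tb = Array{Float64, N+1}
        a::Ta
        b::Tb
        return new{N}(a, b)
    end
end

function Base.getproperty(d2::DadaImp2{N}, s::Symbol) where {N}
    if s === :a
        T = Array{Float64, N}
    elseif s === :b
        T = Array{Float64, N+1}
    else
        T = Any
    end
    return getfield(d2, s)::T
end

d2 = DadaImp2{2}(zeros(2, 2), zeros(2, 2, 2))

# Hypothetical helper: the field name is a literal, so it can be constant-propagated.
get_b(d) = d.b

# What inference concludes for a DadaImp2{2} argument.
println(Base.return_types(get_b, (DadaImp2{2},)))

# The declared field type itself is not concrete, regardless of what is inferred.
println(fieldtype(DadaImp2{2}, :b))
println(isconcretetype(fieldtype(DadaImp2{2}, :b)))
```

If the first line printed is the concrete three-dimensional array type, the assertion in getproperty is doing its job for this access pattern; the last line shows that the stored field is still declared with a non-concrete type.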

PS. The optimal solution would be for this to get implemented: https://github.com/JuliaLang/julia/issues/18466 No idea if that’s in the cards though.

Tamas_Papp · July 29, 2020, 11:06am

I am not sure what that means in this context, but I admit that I derive most of my amusement from books, cartoons, and webcomics, not Julia types.

The size of the type parametrization is pretty orthogonal to inference and execution speed: you need concrete types in most cases for the latter, while (seemingly) redundant type parametrizations are required to enforce constraints on types other than subtype relations.

If it bothers you, you may want to define a custom show method.

The field types in DadaImp2 are not concrete, so performance may suffer if constant folding fails in the getproperty. Also, I think there are much simpler and cleaner solutions.

In case you don’t need the array dimensions in the type and want to allow other array types, you could do your original example as

struct DadaImp3{TA<:AbstractArray,TB<:AbstractArray}
    a::TA
    b::TB
    function DadaImp3(a::AbstractArray{T,N}, b::AbstractArray{S,M}) where {T,S,N,M}
        @assert M == N + 1
        return new{typeof(a),typeof(b)}(a, b)
    end
end

If you do want to restrict to Array{Float64}, you can use something like

struct DadaImp4{N,M}
    a::Array{Float64,N}
    b::Array{Float64,M}
    function DadaImp4(a::Array{Float64,N}, b::Array{Float64,M}) where {N,M}
        @assert M == N + 1
        return new{N,M}(a, b)
    end
end

Or do an interim case for a common element type with DadaImp{T,N,M} etc. It all depends on your use case.
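As a rough illustration of what the DadaImp4 variant gives you, the sketch below copies that definition, constructs one instance, and checks that the field types are concrete and that the dimension constraint is enforced by the constructor. The variable names d4 and rejected are only for this example.

```julia
# Definition copied from the post above.
struct DadaImp4{N,M}
    a::Array{Float64,N}
    b::Array{Float64,M}
    function DadaImp4(a::Array{Float64,N}, b::Array{Float64,M}) where {N,M}
        @assert M == N + 1
        return new{N,M}(a, b)
    end
end

d4 = DadaImp4(zeros(2, 2), zeros(2, 2, 2))
println(typeof(d4))

# Both parameters are part of the type, so the field types are concrete.
println(isconcretetype(fieldtype(typeof(d4), :a)))
println(isconcretetype(fieldtype(typeof(d4), :b)))

# A mismatched pair is rejected by the constructor, not by the type system.
rejected = try
    DadaImp4(zeros(2, 2), zeros(2, 2))
    false
catch err
    err isa AssertionError
end
println(rejected)
```

Note that the type DadaImp4{2,2} can still be written down; it is only the constructor that refuses to create an instance of it.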

mhauru · July 29, 2020, 12:21pm

The Array{Float64, N+1} is just a mock-up, the actual use case would be more like

struct Dada{N,T,G}
    a::f(N, T, G)
    b::g(N, T, G)
end

where f and g would be some type stable (and @pure?) functions.

My point is that if the return type of getproperty can be inferred and the field is stored as a pointer anyway, as far as I can see non-concreteness shouldn’t matter. What you are saying about constant folding seems key. If I do @code_warntype (x -> x.b)(d2) to confirm that the type of d2.b is correctly inferred, are there still other situations where constant folding would fail, and thus abstract types would pop up in inference? I know nothing about how constant folding works.
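One situation that seems worth checking is a field name that is only known at run time, since the branches in getproperty can then not be resolved during inference. The sketch below is an assumption-laden illustration, not a statement about what the compiler guarantees: DadaImp2 and its getproperty are copied from the opening post, and dynamic_access is a hypothetical helper. The printed result is left for the reader to inspect on their own Julia version.

```julia
# Definitions copied from the opening post.
struct DadaImp2{N}
    a::Array{Float64}
    b::Array{Float64}

    function DadaImp2{N}(a::Array{Float64}, b::Array{Float64}) where {N}
        Ta = Array{Float64, N}
        Tb = Array{Float64, N+1}
        a::Ta
        b::Tb
        return new{N}(a, b)
    end
end

function Base.getproperty(d2::DadaImp2{N}, s::Symbol) where {N}
    if s === :a
        T = Array{Float64, N}
    elseif s === :b
        T = Array{Float64, N+1}
    else
        T = Any
    end
    return getfield(d2, s)::T
end

d2 = DadaImp2{2}(zeros(2, 2), zeros(2, 2, 2))

# Hypothetical helper: `name` is an ordinary run-time value here, not a literal.
dynamic_access(d, name::Symbol) = getproperty(d, name)

# What inference concludes when only the type `Symbol` is known for the name.
println(Base.return_types(dynamic_access, (DadaImp2{2}, Symbol)))

# At run time the assertion still holds for each field.
println(typeof(dynamic_access(d2, :a)))
println(typeof(dynamic_access(d2, :b)))
```

If the inferred type printed first is abstract, code that picks the field programmatically (for example by looping over field names) would not get the benefit of the type assertions.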

Oscar_Smith · July 29, 2020, 12:54pm

Don’t use @pure. It doesn’t mean what you think it means and will make you sad.

Tamas_Papp · July 29, 2020, 1:26pm

That’s an implementation detail you can never be sure about — it is up to the compiler.

It has some limitations but I think that in your case it would work. But why rely on it when it is not needed?

mhauru · July 29, 2020, 2:06pm

I would prefer having type parametrisation match the way a human naturally thinks about the code. For instance, when writing a library, types are often a very user-facing part, and a notion of “object of type X parametrised by Y” may be very intuitive. Preferably how Y relates to the types of fields would be an implementation detail that a user wouldn’t have to care about, or be confronted with.

Taking the example from my opening post, ModifiedBinaryLayer{ℤ₂Space, Array{Complex{Float64}}} is a very natural type to have for my library: For anyone who knows what the code is about, it makes intuitive sense and its meaning is immediately clear. Having that expanded into something like

ModifiedBinaryLayer{TensorMap{ℤ₂Space,2,2,ℤ₂,TensorKit.SortedVectorDict{ℤ₂,Array{Complex{Float64},2}},FusionTree{ℤ₂,2,0,1,Nothing},FusionTree{ℤ₂,2,0,1,Nothing}},TensorMap{ℤ₂Space,2,1,ℤ₂,TensorKit.SortedVectorDict{ℤ₂,Array{Complex{Float64},2}},FusionTree{ℤ₂,2,0,1,Nothing},FusionTree{ℤ₂,1,0,0,Nothing}},TensorMap{ℤ₂Space,2,1,ℤ₂,TensorKit.SortedVectorDict{ℤ₂,Array{Complex{Float64},2}},FusionTree{ℤ₂,2,0,1,Nothing},FusionTree{ℤ₂,1,0,0,Nothing}}}

makes the code less readable and less self-documenting. It also makes it more annoying to write anything that explicitly refers to this type, such as method signatures and cases where this would be the field type for some other type.

Tamas_Papp · July 29, 2020, 3:30pm

This is a very common request, but that’s not how Julia’s type parameters work. Again, you can use some subtype & triangular restrictions, but that’s pretty much it. The rest is enforced by the constructor, not the type system.

Just change the user facing API then — provide constructors that calculate implied fields, change methods for show, etc. You can hide pretty much everything from the “user”.

mhauru · July 29, 2020, 4:08pm

Well, surely no one would object to greater intuitiveness if it comes with no cost, and it seems to me that I can make the Julia type system work the way I want with some type assertions in getproperty. You clearly think that my proposal has downsides to it, but could you help me understand what they are? You mention “limitations” of constant folding, but I don’t know what you mean. Do you think that even if constant folding saves type stability in my example case, it might fail to do so in some more complicated situation?

Tamas_Papp · July 30, 2020, 9:10am

The semantics of constant propagation are not clearly defined in Julia; in theory the compiler can change its heuristics any time. That said, a simple . access that goes to getproperty should keep working.

My aversion to solutions like this is mainly a matter of taste: repeating all those types violates DRY, and it may be brittle if you want to extend this to cases which would otherwise be nice bits types (eg if you switch to static arrays). But if it works for you, that should be fine.

mhauru · August 12, 2020, 11:15am

I’ve been thinking about and trying various approaches to the question I was asking in this thread, including using what Tamas was proposing above, i.e. defining methods for show to hide some of the complexity of my parametric types. I just ran into some issues on GitHub that discuss problems that may arise from this, see

github.com/JuliaLang/julia: Segfault when defining show for types (opened by quinnj on 14 Jun 2017, closed on 16 Mar 2021)

Here's the minimum repro, on Julia Version 0.7.0-DEV.511 (2017-06-08 21:22 UTC), x86_64-apple-darwin16.6.0:

julia> struct Null end

julia> Base.show(io::IO, ::Type{Union{T, Null}}) where {T} = print(io, "?$T")

julia> struct WeakRefString{T} <: AbstractString
           ptr::Ptr{T}
           len::Int # of code units
           ind::Int # used to keep track of a string data index
       end

julia> Base.show(io::IO, ::Type{WeakRefString{T}}) where {T} = print(io, "WeakRefString{$T}")

julia> temp = []
Segmentation fault: 11

Note the last thing that happens is it's trying to display `0-element Array{Any,1}`.

both Jeff’s comment and the issue and its referencing issues more generally.

chakravala · August 14, 2020, 3:52am

Have you tried this?

https://github.com/vtjnash/ComputedFieldTypes.jl

mhauru · August 14, 2020, 8:09am

I have not, thanks for the tip. So does it just turn a thing like

@computed struct A{V <: AbstractVector}
    a::eltype(V)
end

into

struct A{V <: AbstractVector, E}
    a::E
    function A{V}(a) where {V}
        E = eltype(V)
        new{V, E}(a)
    end
end

? Or maybe something slightly different in terms of which constructors are defined, but you get the idea.

mhauru · August 14, 2020, 8:14am

Should have looked into this a bit more: The docs for show explicitly tell us how to do this right:

To customize human-readable text output for objects of type T, define show(io::IO, ::MIME"text/plain", ::T) instead.
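Applied to the example from the opening post, that advice might look like the sketch below. DadaImp1 is copied from above; the wording of the printed summary is made up for illustration. The method is defined for instances only, so how the type itself is printed is left alone.

```julia
# Definition copied from the opening post.
struct DadaImp1{N, Ta, Tb}
    a::Ta
    b::Tb

    function DadaImp1{N}(a::Array{Float64}, b::Array{Float64}) where {N}
        Ta = Array{Float64, N}
        Tb = Array{Float64, N+1}
        a::Ta
        b::Tb
        return new{N, Ta, Tb}(a, b)
    end
end

# Human-readable display of instances; the summary text is an arbitrary choice.
function Base.show(io::IO, ::MIME"text/plain", d::DadaImp1{N}) where {N}
    print(io, "DadaImp1{", N, "} with a of size ", size(d.a),
          " and b of size ", size(d.b))
end

d1 = DadaImp1{2}(zeros(2, 2), zeros(2, 2, 2))

# This is what the REPL would use when displaying d1.
println(sprint(show, MIME("text/plain"), d1))

# The full parametrization is still there when the type is printed.
println(typeof(d1))
```

This only shortens what is shown for objects at the REPL. Error messages and @code_warntype output print types rather than instances, so they would presumably still show the full parameter list.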
