Multiple tests and error checking at the time of creating a struct with many properties

I am new to Julia and object-oriented programming. I am used to the following practice

function compare_sizes(x,y)
    length(x) == length(y) ? "good to go" : error(DimensionMismatch)
end

Now, in object-oriented programming, I want to achieve similar behavior. I naively tried the following

struct Pair
    x
    y
    length(x) == length(y) ? new(x, y) : error(DimensionMismatch)
end
# ERROR: UndefVarError: x not defined

Okay, I think I understand why this happened. After searching for a couple of hours, I found these threads.

  1. /t/doing-trivial-transformations-and-assertions-inside-a-struct-definition/61054
  2. https://stackoverflow.com/questions/67430487/julia-how-to-use-the-assert-macro-with-a-struct-with-kw-vs-base-kwdef
  3. /t/struct-with-fixed-array-sizes/15396

This led me to read the documentation on inner constructors.

Now I understand that I can achieve what I want by doing the following

struct Pair
    x
    y
    Pair(x, y) = length(x) == length(y) ? new(x, y) : error(DimensionMismatch)
end

A fancier version would be

using StaticArrays

struct SPair{L}
    x::SVector{L, Number}
    y::SVector{L, Number}
end

SPair(x, y) = length(x) == length(y) ? SPair{length(x)}(x, y) : error(DimensionMismatch)

Also, the documentation says

It is good practice to provide as few inner constructor methods as possible: only those taking all arguments explicitly and enforcing essential error checking and transformation.

However, my question is, how do I make this scale robustly with more properties and checks? Writing inner-constructors with all the arguments multiple times is not a good way; it feels like copy-pasting.

My urge was to go towards metaprogramming, to create error checking inner-constructors. I am aware of this /t/how-to-warn-new-users-away-from-metaprogramming/35022/3.

Thank you!
P.S. the strings starting with /t/ are links to julialang discourse threads. I didn’t know how to post more than 2 links.

A) I would refrain from calling a type Pair, because that shadows the name of a built-in type.
B) Your SPair type will not work, because Number is an abstract type. Consider doing something like this instead:

struct SPair{L, T<:Number}
    x::SVector{L, T}
    y::SVector{L, T}
end

C) By placing the L type parameter in SPair and sharing it with x and y, they will automatically be forced to have the same length without additional error checks.
D) If you want lots of error checks, place them all in a single inner constructor. Then, for all the other “convenience” constructor methods, implement them as outer constructors that call that inner constructor.
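
To make point D) concrete, here is a minimal sketch. The `Rect` type, its fields, and its checks are made up for illustration; only the pattern matters: one inner constructor holds every check, and the outer constructors just forward to it.

```julia
# Hypothetical type for illustration; the names and checks are assumptions.
struct Rect
    width::Float64
    height::Float64
    label::String

    # The only inner constructor: every check lives here.
    function Rect(width::Real, height::Real, label::AbstractString)
        width > 0 || throw(ArgumentError("width must be positive, got $width"))
        height > 0 || throw(ArgumentError("height must be positive, got $height"))
        isempty(label) && throw(ArgumentError("label must not be empty"))
        return new(width, height, label)
    end
end

# Outer "convenience" constructors forward to the inner one,
# so they cannot bypass the checks.
Rect(side::Real) = Rect(side, side, "square")
Rect(; width, height, label = "rect") = Rect(width, height, label)
```

Since outer constructors have no access to `new`, every `Rect` has to pass through the checked inner constructor, no matter how many convenience methods get added later.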

For a more complex inner constructor you can do

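struct Cat
    height
    length
    name
    color
    Cat(height, length, name, color) = begin
        # The argument named `length` shadows the `length` function inside
        # this constructor, so the function is called as `Base.length`.
        meets_requirements = (
            height < length &&
            Base.length(name) < 100 &&
            color in (:black, :white, :tabby)
        )
        @assert meets_requirements "reqs not met"
        return new(height, length, name, color)
    end
end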

Edit: you should make better errors and such, but just wanted to show an example of the syntax for a multi-line function so you can do more stuff.
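
As a rough idea of what "better errors" could look like: give each check its own `throw`, so the message says which requirement failed. This is only a sketch; `CheckedCat` and the field name `body_length` are hypothetical (renamed from `length` so it doesn't clash with the `length` function). Explicit throws also avoid relying on `@assert`, which the Julia docs note may be disabled at some optimization levels.

```julia
# Hypothetical variant of the Cat example; type and field names are assumptions.
struct CheckedCat
    height::Float64
    body_length::Float64
    name::String
    color::Symbol

    function CheckedCat(height, body_length, name, color)
        # One check per line, each with its own message.
        height < body_length ||
            throw(ArgumentError("height ($height) must be less than body_length ($body_length)"))
        length(name) < 100 ||
            throw(ArgumentError("name must be shorter than 100 characters"))
        color in (:black, :white, :tabby) ||
            throw(ArgumentError("unsupported color: $color"))
        return new(height, body_length, name, color)
    end
end
```

Adding another requirement is then one more line in the same constructor.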

Edit2: something that sometimes surprises newcomers is that the protections are only checked at construction. So if you assert that length(x)==length(y), but then later you push! an element onto x, they will no longer be the same length.
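
A small sketch of that surprise, using a hypothetical `Samples` type (the name and fields are assumptions):

```julia
# Hypothetical type: the length check only runs inside the constructor.
struct Samples
    x::Vector{Float64}
    y::Vector{Float64}
    function Samples(x, y)
        length(x) == length(y) ||
            throw(DimensionMismatch("x and y must have the same length"))
        return new(x, y)
    end
end

s = Samples([1.0, 2.0], [3.0, 4.0])

# The struct is immutable, but the vectors it holds are not,
# so this goes through without any check being re-run.
push!(s.x, 5.0)

length(s.x) == length(s.y)   # false
```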

A) I would refrain from calling a type Pair, because that shadows the name of a built-in type.

Aha! Thanks for pointing this out.

B) Number is an abstract type

Got it! I have to use T<:Real for abstract types.

C) By placing the L type parameter in SPair and sharing it with x and y, they will automatically be forced to have the same length without additional error checks.

I missed this redundancy :sweat_smile:

D) If you want lots of error checks, place them all in a single inner constructor. Then, for all the other “convenience” constructor methods, implement them as outer constructors that call that inner constructor.

I think this is what I was looking for, so I haven’t missed something; I have to be smart, for instance, by packing all the checks in one inner constructor.

Thank you for your response.

Edit: you should make better errors and such, but just wanted to show an example of the syntax for a multi-line function so you can do more stuff.

Yes, I see the idea, and it is in agreement with what @uniment pointed out in D)

Edit2: something that sometimes surprises newcomers is that the protections are only checked at construction. So if you assert that length(x)==length(y), but then later you push! an element onto x, they will no longer be the same length.

I would have definitely spent more hours on this. Thank you for saving me some headaches. Does this essentially mean that the SVector approach is a better one?

I’m not 100% sure what you’re trying to achieve, so here are two possible answers:

If you know that your vectors will be the same length for their whole life (you aren’t going to push!() to them), then using the parameter in the type definition, as @uniment suggested, guarantees they are the same length at construction. And because an SVector is static (it can’t change size), they will always stay the same length.

If you want to be able to add to the vectors in SPair, then SVector won’t work because… well they’re static length! So in that case, you will need to define a function for adding to your type. Here is a small example:

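struct MyPairs{T<:Number}
    x::Vector{T}
    y::Vector{T}
    # In a parametric struct, `new` needs its type parameter spelled out
    function MyPairs(x::Vector{T}, y::Vector{T}) where {T<:Number}
        length(x) == length(y) ||
            throw(DimensionMismatch("x and y must have the same length"))
        return new{T}(x, y)
    end
end

# Push to both vectors together so their lengths stay equal
function Base.push!(a::MyPairs, x, y)
    push!(a.x, x)
    push!(a.y, y)
    return a
end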

And then only push to the object using your new function.
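
For a quick feel of how this is used, here is a self-contained sketch. `PairedVectors` is a hypothetical stand-in with the same shape as the type above, re-declared only so the snippet runs on its own.

```julia
# Hypothetical stand-in type; the name is an assumption.
struct PairedVectors{T<:Number}
    x::Vector{T}
    y::Vector{T}
    function PairedVectors(x::Vector{T}, y::Vector{T}) where {T<:Number}
        length(x) == length(y) ||
            throw(DimensionMismatch("x and y must have the same length"))
        return new{T}(x, y)
    end
end

# Two-value push! that keeps both vectors in step.
function Base.push!(p::PairedVectors, x, y)
    push!(p.x, x)
    push!(p.y, y)
    return p
end

p = PairedVectors([1, 2], [3, 4])
push!(p, 5, 6)                # goes through the paired method
length(p.x) == length(p.y)    # true
```

Nothing stops someone from calling `push!(p.x, 7)` directly, so this is a convention rather than a hard guarantee.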

One other point of clarification:
Julia has abstract types and concrete types. Concrete types are like Int, String, MyPairs{Int}. They are specific types that can hold data. Abstract types are parents of concrete types, and they cannot hold data. Examples: Real, Number, AbstractString.

When you defined SPair.x in the OP, you said that it should be an SVector with elements of type Number… but that forces Julia to create a vector that can hold ANY kind of number, Float64, Int8, etc. This is bad because Julia can optimize memory and functions much better if it knows exactly which concrete type to expect in the vector.

So @uniment is suggesting that you define the type as SVector{L, T<:Number} which means to Julia “a vector of elements of some specific type that is a Number,” so it could be a vector of integers OR a vector of Floats, but not both.
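
If you want to poke at the abstract/concrete distinction in the REPL, a few built-in queries help. This sketch uses plain `Vector` instead of `SVector` so it needs no packages; treat that as a stand-in for the thread's types.

```julia
isabstracttype(Number)            # true
isabstracttype(Real)              # true
isconcretetype(Float64)           # true
isconcretetype(Vector{Float64})   # true

# A vector whose element type is abstract can mix kinds of numbers,
# so each element carries its own concrete type.
mixed = Number[1, 2.0]
eltype(mixed)                     # Number
typeof(mixed[1]) == typeof(mixed[2])   # false

# With a concrete element type, every element has the same type.
same = Float64[1, 2.0]
eltype(same)                      # Float64
```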

You can also protect the values of fields in a mutable struct by writing your own setproperty! function

# Same Cat type as above, but defined as a mutable struct

function Base.setproperty!(c::Cat, prop::Symbol, val)
    if prop == :name
        @assert length(val) < 100 "name's too long"
    end
    setfield!(c, prop, val)
end
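
A runnable version of the same idea, with a hypothetical `NamedCat` type so the snippet stands on its own. The `convert` call is there because a custom `setproperty!` replaces the default one, which would otherwise do that conversion.

```julia
# Hypothetical mutable type; the name and fields are assumptions.
mutable struct NamedCat
    name::String
    color::Symbol
end

function Base.setproperty!(c::NamedCat, prop::Symbol, val)
    if prop === :name
        length(val) < 100 || throw(ArgumentError("name's too long"))
    end
    # Convert to the declared field type, as the default setproperty! does.
    return setfield!(c, prop, convert(fieldtype(NamedCat, prop), val))
end

c = NamedCat("Tom", :tabby)
c.name = "Felix"          # passes the check
# c.name = "x"^200        # would throw ArgumentError
```

This only guards `c.name = ...` style assignments. The constructor and direct `setfield!` calls do not go through `setproperty!`, so the constructor still needs its own checks.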

I’m not 100% sure what you’re trying to achieve, so here are two possible answers:

Apologies for being vague :sweat_smile:
I intend to figure out a way to pack a lot of error-checking, not just the same length, for a struct with many properties. I think I got my answer which was to pack everything in one inner constructor.

One other point of clarification:
Julia has Abstract types and concrete types…

The OP was a simplified case of my problem, put together from what I came across in other threads; my generalization to abstract types was not necessary. I know the exact type I will be dealing with, and I understand that declaring it makes the code more optimized. (I come from a dynamically typed language and am still learning to be more vigilant about declaring types and avoiding type instabilities :smile:)

Thank you for your answers :smiley:

I need to select the answer, and this is hard :sweat_smile:
I mean no disrespect to the first answer; I am selecting the second answer because it has an explicit example of an inner constructor with multiple checks. In the future, if I or someone else visits the thread, I think it is more convenient to jump to that example. I hope this is okay :smile:

Not quite—if the type was SVector{L, <:Number} then this would be true. However, the OP wrote SVector{L, Number}. Because Julia types are invariant in their parameters (except for Tuples and Unions), it’s impossible to instantiate an object of this type; the code is simply non-functioning.

Writing SVector{L, <:Number} would work, and would result in sub-optimal performance for the reasons you specified.
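
The invariance rule is easy to check with the subtype operator. The sketch below uses `Vector` rather than `SVector` so it runs without StaticArrays; that substitution is mine, but the subtyping rule is the same for any parametric type.

```julia
Int <: Number                       # true
Vector{Int} <: Vector{Number}       # false: type parameters are invariant
Vector{Int} <: Vector{<:Number}     # true: `<:Number` matches any subtype
Tuple{Int} <: Tuple{Number}         # true: tuples are the covariant exception
```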

Declaring types doesn’t necessarily result in performance improvement; the compiler just needs some way of tracking what the concrete types are. Declaring your struct with fields of concrete types is one way to do this; another is type parametrization.

The basic litmus test is: if, looking at typeof(myobj) I can tell what the types of its fields are, then accessing those fields will be type-stable. For example, this results in type instability:

struct Foo
    x::Number
end
foo1 = Foo(1)
foo2 = Foo(2.0)

Accessing foo1.x fetches a value of a different type than accessing foo2.x, but you wouldn’t be able to tell this just by looking at typeof(foo1) and typeof(foo2) (both simply return Foo). Annotating x::Number does perform a type check at construction time, which can be useful for ensuring proper functionality (so that whatever x is, it acts like a number), but it doesn’t give the compiler anything to track the type afterward so any functions called on these objects will experience type instability when the x field is accessed.

By comparison, this will not result in type instability:

struct Bar{T<:Number}
    x::T
end
bar1 = Bar(1)
bar2 = Bar(2.0)

The type of field x becomes a type parameter for Bar, so there is no type instability when we access it. Try calling typeof(bar1) and typeof(bar2), and you will see that now the type of field x is being tracked. Thus, if I call a function on these objects, the compiler can create specialized performant methods because it can tell their types apart.
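
The litmus test can be run directly. The snippet re-declares `Foo` and `Bar` exactly as above so it is self-contained, then asks Julia what it knows about the field from the object's type alone.

```julia
struct Foo
    x::Number
end

struct Bar{T<:Number}
    x::T
end

# Both Foo objects have the same type, so the field's type is not tracked.
typeof(Foo(1)) == typeof(Foo(2.0))              # true
isconcretetype(fieldtype(Foo, :x))              # false (it is Number)

# Each Bar object records the field's type in its own type.
typeof(Bar(1)) == typeof(Bar(2.0))              # false
fieldtype(typeof(Bar(2.0)), :x)                 # Float64
isconcretetype(fieldtype(typeof(Bar(1)), :x))   # true
```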

I usually default to type parameterization, because it allows more flexibility while maintaining type stability. But if there’s no reason to allow different concrete types, then just parameterizing with the specific type offers a more compact representation.

Thanks for the catch!
