Defining a new type with constraints on values

Hi!

I understand from the Julia documentation that functions may be defined in full generality without specifying the types of their arguments, but that specifying argument types may help catch programmer or user errors.

I’d like to know whether it is possible to define a new type that specifies not only a particular concrete type but also restricts the range of values allowed. For instance, I have a series of functions that rely on

  • variable ‘a’ which must take on integer values strictly confined in the range [1, 233]: these could thus be stored as 8-bit unsigned integers, but with constraints on the allowed values.

  • variable ‘b’ which must be an integer equal to or greater than 995, but can be larger than 100,000, so it should be stored as a 32-bit unsigned integer, but again with an additional constraint on the value.

  • variable ‘c’ is a (calendar) date, which must fall after a given initial date: in this case, the value could be a string (e.g., 2022-06-16) or a Julian day number, but either way it should point to a date after January 1, 2000, say.

Is it possible to include those constraints in the type definition so that the REPL automatically catches forbidden values of function arguments? If so, how is this implemented?

Thanks for your advice. Michel.

Yes for all three.

You just define a new type with a field whose type can store the range of valid values, and define an inner constructor that rejects the invalid values. In other words, basically the same as in any other language I know. For example, for your first type:

julia> struct A
           value :: UInt8
           function A(value)
               if value < 1 || value > 233
                   throw(DomainError("$value: not in [1, 233]"))
               end
               return new(value)
           end
       end

julia> A(5)
A(0x05)

julia> A(0)
ERROR: DomainError with 0: not in [1, 233]:

Stacktrace:
 [1] A(::Int64) at ./REPL[3]:5
 [2] top-level scope at REPL[5]:1

julia> A(234)
ERROR: DomainError with 234: not in [1, 233]:

Stacktrace:
 [1] A(::Int64) at ./REPL[3]:5
 [2] top-level scope at REPL[6]:1
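The example above covers only the first of the three types. The other two could follow the same pattern; here is a sketch in which the type names `B` and `C` and the constant `FIRST_DATE` are hypothetical. It uses the `Dates` standard library for `c`, accepts only ISO `yyyy-mm-dd` strings, and does not handle Julian day numbers.

```julia
using Dates

# Hypothetical type for `b`: an integer >= 995, stored as a UInt32.
struct B
    value::UInt32
    function B(value::Integer)
        value >= 995 || throw(DomainError(value, "must be >= 995"))
        # new() converts to UInt32; values above typemax(UInt32) throw an InexactError
        return new(value)
    end
end

# Hypothetical lower bound for `c` (dates must be strictly later than this).
const FIRST_DATE = Date(2000, 1, 1)

# Hypothetical type for `c`: a calendar date strictly after FIRST_DATE.
struct C
    value::Date
    function C(d::Date)
        d > FIRST_DATE || throw(DomainError(d, "must be after $FIRST_DATE"))
        return new(d)
    end
end

# Outer constructor: parse an ISO "yyyy-mm-dd" string, then reuse the check above.
C(s::AbstractString) = C(Date(s))
```

As with `A`, every value must pass through the inner constructor, so an existing `B` or `C` is always valid.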

Here’s one way (and I’m sure this could be improved somehow).

struct MyInt{T<:Integer} <: Integer
    x::T
    function MyInt(x::Number)
        # ensure exact integer
        x = Int(x)
        # check range
        if (x < 1) || (x > 233)
            throw(ArgumentError("x is $x but must be in the range [1, 233]"))
        end
        return new{typeof(x)}(x)
    end
end

function (::Type{T})(x::MyInt) where {T<:Number}
    return convert(T, x.x)
end

function Base.promote_rule(::Type{T}, ::Type{MyInt{S}}) where {T<:Number, S}
    return promote_type(T, S)
end

function foo(x::MyInt)
    @show x
    if (x < 1) || (x > 233)
        @warn "we messed up the implementation"
    end
    return sin(x)
end
foo(x::Number) = foo(MyInt(x))

using Test
@test foo(2) == sin(2)
@test foo(2.0) == sin(2)
@test_throws InexactError foo(2.3)
@test_throws ArgumentError foo(0)

The type definition of MyInt has an inner constructor that enforces the range constraint. The next function lets a MyInt be converted to other number types as needed. The promote_rule method tells Julia that when arithmetic or comparisons mix a MyInt{S} with another number type T, both arguments can be converted to promote_type(T, S).

Now we can define our function foo that dispatches on MyInt and does whatever logic we want with that input. Lastly, we can add a method that attempts to convert a generic number to the constrained type when we call foo with it.


I don’t think the Int(x) conversion should be in the inner constructor, as it will always force the type parameter to be Int64:

julia> MyInt(UInt8(3))
MyInt{Int64}(3)

Instead, let the inner constructor accept any Integer, and add an outer constructor MyInt(x::Number) = MyInt(Int(x)).

Ah, good point! So then the updated type definition would look like

struct MyInt{T<:Integer} <: Integer
    x::T
    function MyInt(x::Integer)
        if (x < 1) || (x > 233)
            throw(DomainError("x is $x but must be in the range [1, 233]"))
        end
        return new{typeof(x)}(x)
    end
end
MyInt(x::Number) = MyInt(Int(x))

Thanks a lot @Henrique_Becker, @awasserman and @DNF for this interesting discussion. Here is a further twist to my Julia explorations:

  • It turns out that valid values of the variable ‘a’ above could be either an integer in the range [1, 233] or a string representation of that number (depending on the context). In the latter case, two formats would be acceptable: either “xxx”, where each x stands for a digit (zero-padded if necessary), or “Pxxx”, where “P” is an optional constant prefix.

  • I would thus like to define the new type (“MyInt” in the examples above) in such a way that, if the value of a variable of this new type is provided as a number, the corresponding string values are automatically accessible (through whatever syntax is appropriate, such as “a.s” and “a.ps”), with the integer value zero-padded as appropriate.

  • Similarly, if the value is specified in one of the valid string formats, its integer value should also be automatically accessible (e.g., as “a.i”).

Examples:
Valid values: 92, “092”, “P092”
Invalid values: -1, 234, “92” (though this could be converted to “092”), “P 092”, “Q92”, etc.

If this can be achieved, once a new variable “v” is declared to be of this new type, I could use “v.i” during processing and “v.ps” (or “v.s”) when constructing the name of the file containing the results of the computation, for instance. This would presumably involve the “convert” and/or “parse” functions, but is this allowed inside the type definition? And how should I extend “parse” to allow for the initial character “P” in front of the number?

In summary, once a variable has been initialised to this new type, I should be absolutely confident that its numerical value is within the allowed range and also have the opportunity of using either of the three representations of that variable (in appropriate contexts).

Thanks again for the time you take in answering these questions.

  1. You can have the type store all three representations, but this is probably not what you want. This is only a good idea if the values are rarely changed or combined after creation but are often queried in their three different forms. Otherwise you will spend a lot of extra memory and effort on spurious conversions.
  2. You can compute the different representations on the fly when using the obj.field syntax. This is done by extending Base.getproperty; see its documentation for an example.
  3. I suggest you always (convert if necessary and) store the value as a UInt8. This is the representation that takes the least memory and is the easiest to work with.
  4. Yes, you should extend Base.parse for your type. You can look at my implementation of Base.parse(Rational{...}, ...) for an example.
  5. You are using the term “variable” when you mean object/instance/value.
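Points 2 and 3 together might look like the following sketch: only a UInt8 is stored, and the integer and zero-padded string forms are computed on demand by extending Base.getproperty. The type name ACode and the property names i, s and ps (taken from the question) are hypothetical.

```julia
# ACode is a hypothetical name for the type of `a`. Only a UInt8 is stored;
# the other representations are computed on demand in getproperty.
struct ACode
    v::UInt8
    function ACode(x::Integer)
        (1 <= x <= 233) || throw(DomainError(x, "must be in the range [1, 233]"))
        return new(UInt8(x))
    end
end

# Extending Base.getproperty makes a.i, a.s and a.ps work like fields.
function Base.getproperty(a::ACode, name::Symbol)
    v = getfield(a, :v)   # getfield bypasses getproperty, so there is no recursion
    if name === :i
        return Int(v)                     # integer form, e.g. 92
    elseif name === :s
        return string(v; pad = 3)         # zero-padded, e.g. "092"
    elseif name === :ps
        return "P" * string(v; pad = 3)   # prefixed, e.g. "P092"
    else
        return getfield(a, name)          # the real field :v; any other name errors
    end
end
```

For example, with a = ACode(92), a.i is 92, a.s is "092" and a.ps is "P092". Nothing but the single byte is kept in memory, so there is no risk of the representations getting out of sync.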

I would probably just make an additional outer constructor for your type, for example:


function MyInt(x::AbstractString)
    m = match(r"^P(\d+)$", x)
    i = if isnothing(m)
        try
            parse(Int, x)
        catch
            throw(ArgumentError("failed to parse $x as an integer"))
        end
    else
        parse(Int, first(m.captures))
    end
    return MyInt(i)
end

This method would run for any string-like input. It first uses a regular expression (created by the r prefix on the string literal) to check whether the string consists of “P” followed by one or more digits and nothing else. If so, we parse the digits as an integer. Otherwise we try to parse the full string as an integer (parse(Int, x) ignores leading zeros). Either way, we pass the resulting integer to the inner constructor, which does the original range checking.
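The constructor above also accepts “92” and “P92”, which the question lists as invalid, since only “xxx” and “Pxxx” with exactly three digits are allowed. If that strictness matters, the regular expression can require exactly three ASCII digits. A sketch, where the function name parse_a is hypothetical and the validated value is returned as a UInt8 (it could then be passed to MyInt, or the same body could be wrapped in a Base.parse method for the type):

```julia
# Hypothetical strict parser for the question's formats: "xxx" or "Pxxx",
# where xxx is exactly three ASCII digits ("92", "P 092" and "Q92" are rejected).
function parse_a(s::AbstractString)
    m = match(r"^P?([0-9]{3})$", s)
    m === nothing && throw(ArgumentError("\"$s\" is not of the form xxx or Pxxx"))
    n = parse(Int, m.captures[1])
    (1 <= n <= 233) || throw(DomainError(n, "must be in the range [1, 233]"))
    return UInt8(n)
end
```

Format errors and range errors raise different exception types here, so callers can tell a malformed string from an out-of-range number.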

As for the string representations, I would recommend writing a function that constructs the exact string you need and calling it on demand. If you really need the string representations precomputed, you could modify the type definition to store them in the inner constructor, like so:

struct MyInt{T<:Integer} <: Integer
    x::T
    s::String
    ps::String
    function MyInt(x::Integer)
        if (x < 1) || (x > 233)
            throw(DomainError("x is $x but must be in the range [1, 233]"))
        end
        s = string(x, pad = 3)
        ps = "P" * s
        return new{typeof(x)}(x, s, ps)
    end
end