Proper declaration of function accepting Dicts with mixed number types

fergu · April 7, 2022, 5:32pm

I have a function that accepts a dictionary whose keys are always strings and whose values are vectors of real numbers. In practice the numbers may be integers, floats, or any other real type. What matters is that the function does not work if the values are not real numbers (so vectors of strings, complex numbers, etc. are not valid inputs).

julia> function foo(input_dict::Dict{String, <:AbstractVector{<:Real}})
           println(input_dict)
       end
foo (generic function with 1 method)

If I declare a dictionary like so

julia> a = Float32.(rand(3));
julia> b = Float64.(rand(3));
julia> c = Integer.(ceil.(rand(3).*10));
julia> my_dict = Dict("key1"=>a, "key2"=>b, "key3"=>c)
Dict{String, Vector} with 3 entries:
  "key2" => [0.223777, 0.394267, 0.0128682]
  "key3" => [6, 3, 10]
  "key1" => Float32[0.788629, 0.146732, 0.93549]

then

julia> typeof(my_dict)
Dict{String, Vector}

and the call to foo fails, apparently because my_dict has the type Dict{String, Vector} (with no element type on Vector).

julia> foo(my_dict)
ERROR: MethodError: no method matching foo(::Dict{String, Vector})
Closest candidates are:
  foo(::Dict{String, <:AbstractVector{<:Real}}) at REPL[5]:1
Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

On the other hand, this works fine if all the values in the dict have the same type:

julia> a = rand(3);
julia> b = rand(3);
julia> my_other_dict = Dict("a"=>a, "b"=>b)
Dict{String, Vector{Float64}} with 2 entries:
  "b" => [0.700047, 0.765836, 0.603907]
  "a" => [0.623392, 0.476285, 0.500553]

julia> foo(my_other_dict)
Dict("b" => [0.7000467111406179, 0.7658355327841554, 0.6039073311484552], "a" => [0.6233916490131216, 0.4762846111664425, 0.500552988508849])

This function is supposed to be part of a package, so ideally the fix involves changing the definition of foo rather than requiring the dictionary to be typed during creation.

What I’m wondering is the “correct”, most Julian way to fix this. I’m not terribly concerned about efficiency, as this isn’t going to handle high volumes of data.

My first thought was to do something with overloading, i.e.

function foo(key::String, value::AbstractVector{<:Real})
    # ... (body elided)
end
# or...
function foo(input::Dict{String, <:AbstractVector{<:Real}})
    # ... (body elided)
end
# Then...
function foo(input::Dict{String, <:AbstractVector})
    for (key, value) in input
        foo(key, value)
    end
    # Or perhaps construct a one-entry Dict for each pair instead?
    # (Caution: a non-Real value would dispatch back to this same method and recurse.)
    for (key, value) in input
        foo(Dict(key=>value))
    end
end

My only problem with this approach is that foo ultimately writes to a file that already exists. It wouldn’t be great for it to write the first few keys and then fail halfway through because one value has the wrong type, which is why I was hoping to structure foo so that it fails before trying to write anything at all.

jling · April 7, 2022, 5:38pm

what about

foo(input::Dict)

fergu · April 7, 2022, 5:44pm

That would ultimately still allow “bad” inputs though, wouldn’t it?

The file format can only support String keys and Vector{<:Real} values, so I’d want to avoid, for example, Dict{Integer, String}. Perhaps the solution is some sort of sanity check within foo rather than trying to type foo such that it can’t be called with bad inputs?

stevengj · April 7, 2022, 6:02pm

Because the values are abstractly typed, anything using this dictionary will be type-unstable. You might as well just use foo(input::AbstractDict) and duck type.
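A minimal sketch of the duck-typed approach, assuming a hypothetical foo_duck that simply sums each value vector as a stand-in for the real file-writing logic (which is not shown in the thread). Only the container is constrained to AbstractDict; whether the values are usable is discovered when they are actually used.

```julia
# Hypothetical duck-typed variant: only the container type is constrained.
# `sum` stands in for whatever the real function does with each value.
function foo_duck(input::AbstractDict)
    return Dict(string(k) => sum(v) for (k, v) in input)
end

# Mixed value types, as in the question: the Dict's value type becomes plain `Vector`.
mixed = Dict("key1" => Float32[0.5, 0.25], "key2" => [1.0, 2.0], "key3" => [6, 3, 10])
println(typeof(mixed))   # Dict{String, Vector}, as in the question
println(foo_duck(mixed)) # accepted: the values only need to support `sum`

# A vector of strings is rejected only when `sum` tries to add two Strings.
try
    foo_duck(Dict("bad" => ["a", "b"]))
catch err
    println(typeof(err)) # MethodError (no `+` method for two Strings)
end
```

The trade-off is the one raised later in the thread: with pure duck typing, a bad value is only detected at the point of use.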

The basic issue here is that:

julia> typejoin(Vector{Int}, Vector{Float64})
Vector

i.e. the result is Vector and not any more specific type.
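The subtyping relationships behind the MethodError can be checked directly. This is a small sketch; each @assert just restates the rule described above for the types used in the question.

```julia
# Joining the value types of the mixed dictionary loses the element type.
@assert typejoin(Vector{Int}, Vector{Float64}) == Vector
@assert typejoin(Vector{Float32}, Vector{Float64}) == Vector

# `Vector` means `Vector{T} where T`; since T could be anything (e.g. String),
# it is not a subtype of AbstractVector{<:Real} ...
@assert !(Vector <: AbstractVector{<:Real})
# ... whereas a vector with a concrete Real element type is.
@assert Vector{Float64} <: AbstractVector{<:Real}

# Consequently only the second Dict type satisfies foo's signature.
@assert !(Dict{String, Vector} <: Dict{String, <:AbstractVector{<:Real}})
@assert Dict{String, Vector{Float64}} <: Dict{String, <:AbstractVector{<:Real}}
```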

jling · April 8, 2022, 1:59am

if you want to avoid bad inputs at the type level, then you have to error on

julia> Vector
Vector (alias for Array{T, 1} where T)

you can’t have it both ways

gustaphe · April 8, 2022, 4:55am

The most Julian way is to allow bad inputs, and let it error when those non-reals are used in a real context.

lawless-m · April 9, 2022, 7:20am

I would like to add that argument checking should be done at the call site, not inside the function.

You can provide a helper function to validate arguments.

For example, avoid this:

function foo_bad(a)
    if isa(a, String)
        throw(ArgumentError("a cannot be a string"))
    end
    a * 3
end

because then everyone pays the price for the check. Instead:


julia> check_a(a) = isa(a, String) && throw(ArgumentError("a cannot be a string")) || true
julia> foo_good(a) = a * 3
julia> check_a(1) && foo_good(1)
3

Sukera · April 9, 2022, 9:44am

Note that if you’re checking a type there to disallow it, having the check in the function is perfectly fine. Type-based checks are typically eliminated by the compiler, since the argument types are static information in most cases. Putting that check in a separate function is dangerous, since you may forget to check that invariant in some places.
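A minimal sketch of keeping the type check inside the function, using a hypothetical foo_checked (the name is illustrative). Because a isa String depends only on the argument's type, the compiler can usually resolve the branch when the method is specialized for, say, Int; in the REPL, @code_typed foo_checked(1) is one way to inspect what a given Julia version actually generates.

```julia
# Hypothetical example: the invariant lives inside the function, so no caller can skip it.
function foo_checked(a)
    # `a isa String` is decided by the argument's type, so for e.g. Int
    # the compiler can typically drop this branch entirely.
    a isa String && throw(ArgumentError("a cannot be a string"))
    return a * 3
end

foo_checked(1)        # 3
# foo_checked("x")    # throws ArgumentError("a cannot be a string")
```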

lawless-m · April 9, 2022, 11:34am

Obviously this is just opinion. I’m from the old school, where writing functions that assume the validity of their arguments is a design principle.

Sukera · April 9, 2022, 5:27pm

I’ve always taken this in the context of an internal function, where you’ve already validated the arguments. When writing a user-facing API, checking invariants is usually a good idea.

Cloves_Almeida · April 12, 2022, 10:36am

This is still a valid principle, it’s just not enforced at function definition by the language. Like Python, Julia is a dynamic language.

I’m also an “old timer” and struggled with this for a while. But types in Julia are meant for dispatching and specialization, not for structural invariant checking. You can, however, add checks to the function to assert your expectations as needed.

Actually, the Julian way is duck typing - to have as little type information as possible in function definitions to allow composability.

fergu · April 12, 2022, 9:27pm

This is all great discussion! Thank you for the replies!

My main issue is that letting the wrong data through to the file could result in corruption (or rather, non-conformity with the spec; strictly speaking, I think the format itself can handle other datatypes as long as the keys are strings). It sounds like the best approach is to add a validity check, called within the write function before any actual writes take place, that makes sure everything to be written has a supported type, so that’s the route I will take.
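A minimal sketch of that validate-then-write structure. The names validate_entries and write_entries and the plain-text line format are hypothetical stand-ins, since the real file format is not shown in the thread. The signature accepts any AbstractDict (including the mixed Dict{String, Vector}), but every entry is checked before the file is opened, so one bad value leaves the file untouched.

```julia
# Hypothetical validation helper: throws before any I/O happens.
function validate_entries(input::AbstractDict)
    for (k, v) in input
        k isa String || throw(ArgumentError("key $(repr(k)) is not a String"))
        v isa AbstractVector{<:Real} ||
            throw(ArgumentError("value for key $(repr(k)) is a $(typeof(v)), expected a vector of reals"))
    end
    return nothing
end

# Hypothetical writer: checks everything first, then appends to the existing file.
function write_entries(path::AbstractString, input::AbstractDict)
    validate_entries(input)            # any failure happens here, before writing
    open(path, "a") do io              # "a" appends (and creates the file if missing)
        for (k, v) in input
            println(io, k, ": ", join(v, ", "))
        end
    end
    return path
end
```

One design choice to be aware of: the check inspects each vector's runtime element type, so a Vector{Any} that happens to hold only numbers would be rejected; checking all(x -> x isa Real, v) instead would be a looser alternative.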

Thank you for all the responses!

lawless-m · April 13, 2022, 5:31pm

It is a shame we don’t have domains applied to types too.

e.g. something like a::Int(range=1:100), which would restrict a to the values 1:100 inclusive.

DNF · April 13, 2022, 6:26pm

You can make your own type that does this. I don’t see how this could be made a part of the native machine integer type…
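A minimal sketch of such a user-defined type, assuming a hypothetical BoundedInt{L,H} that wraps an Int and checks the range L:H in its inner constructor. The check runs at construction time rather than being part of the native integer type.

```julia
# Hypothetical range-restricted integer: the bounds are type parameters.
struct BoundedInt{L,H}
    value::Int
    function BoundedInt{L,H}(x::Integer) where {L,H}
        L <= x <= H || throw(DomainError(x, "must lie in $L:$H"))
        return new{L,H}(Int(x))
    end
end

# A function can then require the restricted type in its signature.
scaled(a::BoundedInt{1,100}) = a.value * 3

scaled(BoundedInt{1,100}(42))    # 126
# BoundedInt{1,100}(101)         # throws DomainError
```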
