Using `length` makes the function type unstable

function countries()
    a = JSON.parsefile("country_data.json", dicttype=Dict,  use_mmap=true)
    c = Vector{String}(undef, length(a))
    for (i, d) in enumerate(a)
        c[i] = string(d["name"])
    end
    return c
end

If I replace the length(a) with a hardcoded value, the function is type stable. I am not sure how it’s not type stable though, length should always return an integer. And the function is returning an array of Strings. All the types are there…

code_warntype:

julia> @code_warntype countries()
Body::Any
2 1 ── %1  = JSON.Parser.Int64::Type{Int64}                                                                                                                                                         │╻      #parsefile
  │    %2  = π (%1, Type{Int64})                                                                                                                                                                    ││
  │    %3  = (Base.sle_int)(1, 1)::Bool                                                                                                                                                             ││╻╷╷╷   isempty
  └───       goto #3 if not %3                                                                                                                                                                      │││┃││    iterate
  2 ── %5  = (Base.sle_int)(1, 0)::Bool                                                                                                                                                             ││││┃││    iterate
  └───       goto #4                                                                                                                                                                                │││││┃      iterate
  3 ──       nothing                                                                                                                                                                                │
  4 ┄─ %8  = φ (#2 => %5, #3 => false)::Bool                                                                                                                                                        ││││││
  └───       goto #6 if not %8                                                                                                                                                                      ││││││
  5 ──       invoke Base.getindex(()::Tuple{}, 1::Int64)                                                                                                                                            ││││││
  └───       $(Expr(:unreachable))                                                                                                                                                                  ││││││
  6 ──       goto #8                                                                                                                                                                                ││││││
  7 ──       $(Expr(:unreachable))                                                                                                                                                                  ││││││
  8 ┄─       goto #9                                                                                                                                                                                │││││
  9 ──       goto #10                                                                                                                                                                               │││╻      iterate
  10 ─       goto #11                                                                                                                                                                               │││
  11 ─ %17 = π (%2, Type{Int64})                                                                                                                                                                    ││
  │    %18 = invoke Base.Filesystem.stat("country_data.json"::String)::Base.Filesystem.StatStruct                                                                                                   │││╻      filesize
  │    %19 = (Base.getfield)(%18, :size)::Int64                                                                                                                                                     ││││╻      filesize
  │    %20 = %new(getfield(JSON.Parser, Symbol("##4#5")){UnionAll,DataType,Bool,Int64}, Dict, %17, true, %19)::getfield(JSON.Parser, Symbol("##4#5")){UnionAll,DataType,Bool,Int64}                 ││╻      #parsefile#3
  │    %21 = JSON.Parser.open::typeof(open)                                                                                                                                                         │││
  │    %22 = invoke Base.:(#open#294)($(QuoteNode(Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}()))::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, %21::Function, %20::getfield(JSON.Parser, Symbol("##4#5")){UnionAll,DataType,Bool,Int64}, "country_data.json"::String)::Any
  └───       goto #12                                                                                                                                                                               ││
3 12 ─ %24 = (Main.length)(%22)::Any                                                                                                                                                                │
  │    %25 = (Array{String,1})(Main.undef, %24)::Any                                                                                                                                                │
5 │    %26 = (Main.enumerate)(%22)::Base.Iterators.Enumerate{_1} where _1                                                                                                                           │
  │    %27 = Base.iterate::Core.Compiler.Const(iterate, false)                                                                                                                                      │
  │    %28 = (%27)(%26, (1,))::Union{Nothing, Tuple{Tuple{Int64,Any},Tuple{Int64,Any}}}                                                                                                             │╻      iterate
  │    %29 = (%28 === nothing)::Bool                                                                                                                                                                │
  │    %30 = (Base.not_int)(%29)::Bool                                                                                                                                                              │
  └───       goto #15 if not %30                                                                                                                                                                    │
  13 ┄ %32 = φ (#12 => %28, #14 => %41)::Union{Nothing, Tuple{Tuple{Int64,Any},Tuple{Int64,Any}}}                                                                                                   │
  │    %33 = π (%32, Tuple{Tuple{Int64,Any},Tuple{Int64,Any}})                                                                                                                                      │
  │    %34 = (Core.getfield)(%33, 1)::Tuple{Int64,Any}                                                                                                                                              │
  │    %35 = (Base.getfield)(%34, 1)::Int64                                                                                                                                                         ││╻      indexed_iterate
  │    %36 = (Base.getfield)(%34, 2)::Any                                                                                                                                                           │╻      indexed_iterate
  │    %37 = (Core.getfield)(%33, 2)::Tuple{Int64,Any}                                                                                                                                              │
6 │    %38 = (Base.getindex)(%36, "name")::Any                                                                                                                                                      │
  │    %39 = (Main.string)(%38)::Any                                                                                                                                                                │
  │          (Base.setindex!)(%25, %39, %35)                                                                                                                                                        │
  │    %41 = (Base.iterate)(%26, %37)::Union{Nothing, Tuple{Tuple{Int64,Any},Tuple{Int64,Any}}}                                                                                                     │
  │    %42 = (%41 === nothing)::Bool                                                                                                                                                                │
  │    %43 = (Base.not_int)(%42)::Bool                                                                                                                                                              │
  └───       goto #15 if not %43                                                                                                                                                                    │
  14 ─       goto #13                                                                                                                                                                               │
8 15 ─       return %25

I think I see what the problem is. The JSON function is not typesafe.

julia> typeof(a)
Array{Any,1}

But I am not really sure how this propagates… length shouldn’t matter.

That’s a pretty sensible assumption, but there’s no reason for it to always be true. If the type of a can’t be inferred, then the type of length(a) could be absolutely anything.

If you are concerned about performance of this function, then I would suggest splitting it up with a function barrier:

function countries()
  a = parse_json()
  do_stuff_with_json(a)
end

or if you are concerned about the return type being inferrable, then you could use a type annotation:

c::Vector{String} = Vector{String}(undef, length(a))

Thanks,

  1. regarding the type annotation. I thought that wasn’t necessary in Julia? For example, in

c= Vector{String}(undef, length(a))

am I not asking c to be precisely a Vector with string elements? I guess I am a little confused about how this works. The way I see it, the compiler should see c as a vector of strings. It returns c so it should know what the function return type is. Even if length(a) is not inferable, how does that “mess” everything up?

  1. I am not sure how exactly to use a function barrier:

function json_barrier 
    a = JSON.parsefile("country_data.json", dicttype=Dict,  use_mmap=true)
end

function herdimmunity(cn)
    a = json_barrier()
    #a = JSON.parsefile("country_data.json", dicttype=Dict,  use_mmap=true)    
    idx = findfirst(x -> x["name"] == cn, a)
    length(idx) == 0 && error("Country dosn't exist")
    return a[idx[1]]["data"]["preimmunity"]
end

still has the same problem. The code_warntype() (relevant line) reads

2 1 ─ %1  = (Main.json_barrier)()::Any                                                                                                                                                                                     │
4 │   %2  = %new(getfield(Main, Symbol("##96#97")){String}, cn)::getfield(Main, Symbol("##96#97")){String}                                                                                                                 │
  │   %3  = (Main.findfirst)(%2, %1)::Any
...

The compiler can’t prove that. For example:

julia> function Vector{String}(::typeof(undef), ::String)
         return π
       end

julia> Vector{String}(undef, "hello")
π = 3.1415926535897...

Rather than annotating the type of c, you could simply annotate the type of length(a) or of a, if you know it.

As for your function barrier, you’ve yours backwards. You want to do the type-unstable work and then call a function on that result (see my original example), rather than just putting the type-unstable work inside yet another function. Also see: Performance Tips · The Julia Language

2 Likes

Or better still, put the annotation on length(a)::Int.

7 Likes

So, what is the purpose of declaring the element type for an array if the compiler can’t prove the elements are indeed that type? Furthermore, it’s not like I can push int into a vector Array{String}.

and thanks for letting me know about the function barrier. I’ll look into it.

If tries to prove it using type inference but if that is not possible, instead of erroring, the code still runs since Julia is a dynamically typed language.

2 Likes

Thanks, so let me just summarize my understanding.

When I run c = Vector{String}(undef, length(a)), because the compiler dosn’t know what length(a) return type is (although I dont know why it wouldn’t always be an integer), this propagates through the Vector() function, and so the typeof(c) is unknown to the compiler.

I can fix this by letting the compiler know that length(a) is integer as @StefanKarpinski points out, i.e. c = Vector{String}(undef, length(a)::Int) or I can annotate the vector c as Vector{String}.

Ideally though, the author of the JSON package should implement type stable length for the object a.

My confusion was arising from the fact that I never thought of
c = Vector{String}(undef, length(a))
as a “function” call. I thought it was builtin Julia syntax/keyword. I just now realized that it’s simply a function Vector that has two parameters (::typeof(undef), ::Integer)

Thanks.

1 Like

Yes, I think you’ve totally got it :slight_smile:

As you say, a constructor (like Vector{String}(...)) is just a function. In reasonable code, it would be very very surprising for a constructor to return a totally different type, but the compiler can’t assume your code is reasonable. On the other hand, there are plenty of cases where you call a constructor and don’t get exactly that type: namely when you call an abstract type as a constructor and get a value of a concrete type. For example:

julia> using SparseArrays

julia> x = sprand(1, 1, 0.5)
1×1 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> Array(x)
1×1 Array{Float64,2}:
 0.0

Array is a non-concrete type (namely Array{T, N} where {T, N}), but there is a specialized method for Array which takes a sparse array and returns a concrete array (in this case, Array{Float64, 2}) corresponding to the actual dimensions and element type of the sparse array. In this case, it feels totally natural that you call Array and get an Array{Float64, 2}. This isn’t limited to non-concrete types except by convention: it would simply be pretty weird to call Vector{String}() and get anything other than a Vector{String}.

Agreed–it would be very surprising for length(a) to return anything other than an integer. But the compiler has to be correct even for surprising code.

3 Likes

Isn’t the problem rather that the compiler doesn’t know the output type of JSON.parsefile? If it knew that, then it could also figure out the type of length(a).

For an unknown input type, length could be anything. I can easily envision someone returning a floating point value, for example for the length of an interval, or geometric object. Maybe unwise, but not unreasonable.

Right. I think the author of JSON would need to implement a type-safe version.

It may be worth noting that inference is “just” an optimization. A crucially important ine, no doubt, but it’s something that can improve or get worse over time without changing any behaviors (except performance).

It’s entirely possible that Julia improve in the future since there are a limited number of Array{T} methods. It still couldn’t deduce the dimensionality as the second argument could either be an integer or a tuple… and going through the hundreds of length methods to prove that none could possibly return a tuple would be quite time-consuming.

So this really isn’t about type-safety. It’s about Julia’s ability to predict types at compile-time to improve performance. And JSON — by definition — isn’t able to do that. It could parse to any of the supported JSON data types depending on the string. But you can easily add ::Int to the output of length to give Julia this additional info. You don’t need the JSON package to do this for you.

1 Like

I totally understand this statement, but it is possible for the compiler to enforce the rule that a constructor must return the type that it is intended for and raise an error if not?

I don’t think this is something the compiler should be concerned with (as it can’t figure out the intentions of the user). This is a style question and should be treated accordingly, ie one can do this but it is not considered good style.

I have run into the same issue once. Is there an example where it makes sense for length(x) not to return an integer?

Sure, imagine an interpolation object of non-integer length, where you can index in on any real number in the range between the real endpoints

You are right, e.g. length(stick) could even return something strange as 1.2cm. :slight_smile:

I don’t think whether something “makes sense” is the right question to ask, since it very much depends on the context and the design goals of the programmer.

As long as it is allowed to define

struct Foo end
Base.length(::Foo) = "a fish"

the compiler cannot rule it out.

There are builtin types (some ranges) that return non-Int ranges. Sure, they’re still Integer, but the difference between being able to infer ::Integer over ::Any isn’t all that much.