Declaring types: Vector{Union{Int64, Matrix{Int64}}} is much slower than Vector{Any}, why is that?

The following three functions differ only in how p_path is declared: f0 has no declaration, f1 declares it ::Vector{Any}, and f2 declares it ::Vector{Union{Int64, Matrix{Int64}}}.

function f0(path_info)
  trace_iteration = []
  count = 0
  for k in 1:length(path_info)
    p_path = path_info[k]
    t1::Int64 = p_path[1]
    t2::Int64 = p_path[2]
    if t1 < 5 && t2 < 5
      count += 1
    end
  end
  push!(trace_iteration, count)
  return trace_iteration
end

function f1(path_info)
  trace_iteration = []
  count=0
  for k in 1:length(path_info)
    p_path::Vector{Any}=path_info[k]
    t1::Int64 = p_path[1]
    t2::Int64 = p_path[2]
    if t1 < 5 && t2 < 5
      count += 1
    end
  end
  push!(trace_iteration, count)
  return trace_iteration
end

function f2(path_info)
  trace_iteration = []
  count=0
  for k in 1:length(path_info)
    p_path::Vector{Union{Int64, Matrix{Int64}}}=path_info[k]
    t1::Int64 = p_path[1]
    t2::Int64 = p_path[2]
    if t1 < 5 && t2 < 5
      count += 1
    end
  end
  push!(trace_iteration, count)
  return trace_iteration
end

path_info0=[[rand(1:10),rand(1:10),rand(1:10),rand(1:10),rand(Int,2,2)] for i in 1:1000000]

using BenchmarkTools  # provides @btime

@btime f0(path_info0)  # 16 ms
@btime f1(path_info0)  # 15.6 ms
@btime f2(path_info0)  # 243 ms

As the timings show, declaring the union type makes the function roughly 15 times slower (243 ms versus about 16 ms). Is there a way to declare the variable that speeds up the function?

versioninfo()
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
  Threads: 1 on 20 virtual cores
Environment:
  JULIA_NUM_THREADS1 = 1
  JULIA_PKG_SERVER = https://mirrors.bfsu.edu.cn/julia
  JULIA_PYTHONCALL_EXE = @PyCall
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

You likely want Vector{<:Union{Int64, Matrix{Int64}}}. Your current code makes a copy of the vector on every assignment, because the type of each inner vector is not a subtype of Vector{Union{Int64, Matrix{Int64}}}, so the annotation triggers a conversion.

After making that change, an error occurs:

 f(path_info0)

ERROR: MethodError: no method matching (Vector{<:Union{Int64, Matrix{Int64}}})(::Vector{Any})
Stacktrace:
[1] convert(::Type{Vector{<:Union{Int64, Matrix{Int64}}}}, a::Vector{Any})
@ Base .\array.jl:665

I’d recommend moving your type assertions to the right-hand side, like this:

p_path = path_info[k]::Vector{Union{Int64, Matrix{Int64}}}
t1 = p_path[1]::Int64
t2 = p_path[2]::Int64

As a previous poster suggested, part of your issue might be unintended conversions. When annotated on the right side, no attempt at conversion is made. Instead, an error will be thrown immediately if the annotation does not match the data (as suggested, you might need to add a <: or reconsider your data structure). This will prevent such accidents.
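
To make the difference concrete, here is a minimal sketch, assuming an inner vector shaped like the ones in the question. The helper names annotate_left and assert_right are made up for illustration. The left-side declaration silently converts (allocating a new array), while the right-side assertion only checks the type and throws if it does not match.

```julia
# One inner vector with the same shape as in the question; the literal is a Vector{Any}.
v = Any[1, 2, 3, 4, [1 2; 3 4]]

# Left-side annotation: the assignment calls convert, which builds a new array here.
# (annotate_left is a hypothetical helper name.)
function annotate_left(x)
    y::Vector{Union{Int64, Matrix{Int64}}} = x
    return y
end

# Right-side assertion: a pure type check, no conversion is attempted.
# (assert_right is a hypothetical helper name.)
assert_right(x) = x::Vector{Union{Int64, Matrix{Int64}}}

y = annotate_left(v)
println(typeof(y))   # Vector{Union{Int64, Matrix{Int64}}}
println(y === v)     # false: a converted copy was allocated

threw = try
    assert_right(v)
    false
catch err
    err isa TypeError
end
println(threw)       # true: the assertion fails instead of converting
```

In a loop over a million inner vectors, that hidden copy happens a million times, which would be consistent with the slowdown seen for f2.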

If your data actually looks like this (with each index always having the same type - e.g., 4 Int64 and then a Matrix{Int64}), you might consider using a Tuple or custom struct instead of Vector{Union{Int64, Matrix{Int64}}}. For example,

julia> path_info0=[(rand(1:10),rand(1:10),rand(1:10),rand(1:10),rand(Int,2,2)) for i in 1:1000000]
1000000-element Vector{Tuple{Int64, Int64, Int64, Int64, Matrix{Int64}}}
...

has no type uncertainty in the first place.
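
The custom struct variant could look roughly like the sketch below. The struct name PathInfo, its field names, and the function count_small are assumptions made for illustration; the thread does not say what the five entries mean.

```julia
# Hypothetical struct: four integers followed by a matrix, mirroring the tuple layout above.
struct PathInfo
    a::Int64
    b::Int64
    c::Int64
    d::Int64
    m::Matrix{Int64}
end

# Same counting logic as f0, but every field access has a concrete, known type.
function count_small(paths::Vector{PathInfo})
    n = 0
    for p in paths
        if p.a < 5 && p.b < 5
            n += 1
        end
    end
    return n
end

paths = [PathInfo(rand(1:10), rand(1:10), rand(1:10), rand(1:10), rand(Int, 2, 2)) for _ in 1:1000]
println(isconcretetype(eltype(paths)))  # true
println(count_small(paths))
```

Named fields also document what each position means, which a tuple or a plain vector does not.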

Thank you very much, tuples are indeed much faster, but if it is a vector, setting the type on the right side will still cause an error:

function f(path_info)
  trace_iteration = []

  count = 0
  for k in 1:length(path_info)
    p_path = path_info[k]::Vector{Union{Int64, Matrix{Int64}}}
    t1 = p_path[1]::Int64
    t2 = p_path[2]::Int64
    if t1 < 5 && t2 < 5
      count += 1
    end
  end
  push!(trace_iteration, count)

  return trace_iteration
end

Error 1:

 p_path=path_info[k]::Vector{Union{Int64, Matrix{Int64}}} 

f(path_info0)
ERROR: TypeError: in typeassert, expected Vector{Union{Int64, Matrix{Int64}}}, got a value of type Vector{Any}
Stacktrace:
[1] f(path_info::Vector{Vector{Any}})

Error 2:

 p_path=path_info[k]::Vector{<:Union{Int64, Matrix{Int64}}}

f(path_info0)
ERROR: TypeError: in typeassert, expected Vector{<:Union{Int64, Matrix{Int64}}}, got a value of type Vector{Any}
Stacktrace:
 [1] f(path_info::Vector{Vector{Any}})

Your problems start before your function. The argument you are passing in is a Vector{Vector{Any}}.
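
A small check of why the input ends up that way, assuming it is built from array literals as in the original post: mixing Int64 values and a Matrix{Int64} in one literal gives Any as the element type, unless the element type is written in front of the brackets.

```julia
# An untyped literal mixing integers and a matrix falls back to element type Any.
inner = [1, 2, 3, 4, [1 2; 3 4]]
println(typeof(inner))   # Vector{Any}

# Writing the element type in front of the literal keeps the narrower Union.
typed = Union{Int64, Matrix{Int64}}[1, 2, 3, 4, [1 2; 3 4]]
println(typeof(typed))   # Vector{Union{Int64, Matrix{Int64}}}

# The outer container inherits whatever the inner vectors are.
outer = [[1, 2, 3, 4, [1 2; 3 4]] for _ in 1:3]
println(typeof(outer))   # Vector{Vector{Any}}
```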

Continuing to use Vector{Any} within your function is faster because it does not require conversion and allocation of a new array.

The errors are occurring because you are using assertions on the right hand side, but the variables you are asserting to be of type Vector{<:Union{Int64, Matrix{Int64}}} are not of that type.

To start fixing this properly we need to change the type of your input argument path_info. Could you show us how you construct this?

Also, looking at what I can see within your function, this is not good:

trace_iteration = []

That starts trace_iteration as a Vector{Any}. From what I can tell you probably want

trace_iteration = Int[]

The types of the two declarations are illustrated here:

julia> typeof([])
Vector{Any} (alias for Array{Any, 1})

julia> typeof(Int[])
Vector{Int64} (alias for Array{Int64, 1})

julia> push!([], 3.0)
1-element Vector{Any}:
 3.0

julia> push!(Int[], 3.0)
1-element Vector{Int64}:
 3

From what I can tell, your path_info should probably be a Vector{Vector{Int}}, not a Vector{Vector{Any}}.

To convert your path_info from Vector{Vector{Any}} to Vector{Vector{Int}} do the following.

julia> path_info = [Any[1, 2, 3], Any[4, 5, 6]]
2-element Vector{Vector{Any}}:
 [1, 2, 3]
 [4, 5, 6]

julia> path_info = map(path_info) do p_path
           Int.(p_path)
       end
2-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [4, 5, 6]

Normally, there is no speed gain in declaring the types of variables. The exception is when your program has a type instability, as yours does: path_info has type Vector{Vector{Any}}, so the compiler can’t know the types of t1 and t2. The priority for performant programs is to get rid of type instabilities in critical loops.

If for some reason you can’t get rid of the type instability, but you do know the type of the vector elements, the simplest fix is to assert the type, as in t1 = p_path[1]::Int. This will throw an error if the value is not an Int.

Depending on your actual data and computations, there may be other options. E.g. a function barrier which will incur a single dynamic dispatch, but otherwise will be type stable.
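
A function barrier could be sketched as follows. This is only an illustration under assumptions: the Dict{String,Any} container and the names count_small and count_small_kernel are hypothetical stand-ins for any place where the compiler cannot infer the type of the data.

```julia
# Kernel: compiled separately for the concrete type of `paths`, so the loop is type stable.
function count_small_kernel(paths)
    n = 0
    for p in paths
        if p[1] < 5 && p[2] < 5
            n += 1
        end
    end
    return n
end

# Outer function: `paths` is only known as Any here, so the call below is dispatched
# dynamically. That happens once per call, not once per loop iteration.
function count_small(data::Dict{String,Any})
    paths = data["paths"]
    return count_small_kernel(paths)
end

# Hypothetical input: a vector of tuples stored behind an untyped container.
data = Dict{String,Any}("paths" => [(1, 2, [1 2; 3 4]), (7, 1, [1 2; 3 4]), (3, 4, [0 0; 0 0])])
println(count_small(data))  # 2
```

The barrier only helps when the value passed across it has a concrete type. With a Vector{Vector{Any}} the element accesses inside the kernel would still be unstable, so fixing the data layout comes first.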

I would recommend removing the assertions entirely and leaving this to the compiler.

And definitely, as @mkitti says, fix this part:

trace_iteration = []

If your code ever has [], it’s a red flag for performance.

Thank you very much. It is indeed necessary to stabilize the variable type at the beginning. If it is a mixed-type vector, tuples need to be used to solve the mixed-type problem. I have solved the corresponding performance problem.

As others have said, the difference is probably the copying caused by the conversion that runs every time you assign to an annotated variable. convert doesn’t always make a copy; it depends on the implementation. convert(Any, x) just returns the same instance, and generally convert(T, x::T) will too.

Here’s a simple way to check: how do the timings change when you construct the input as

path_info0=Vector{Union{Int64, Matrix{Int64}}}[[rand(1:10),rand(1:10),rand(1:10),rand(1:10),rand(Int,2,2)] for i in 1:1000000]

so convert won’t need to do anything in f2?

I don’t think convert is generally implemented for abstract target types unless the instance is already of that type, because the choice of concrete type would otherwise be ambiguous and arbitrary.
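
These points about convert can be checked on small arrays. The sketch below assumes the behaviour seen on the Julia 1.10 shown in the thread, and the variable names are illustrative only. It tests whether convert returns the same instance or a new one, and whether an abstract target type is accepted.

```julia
T = Vector{Union{Int64, Matrix{Int64}}}

a = Any[1, 2, [1 2; 3 4]]                          # element type Any
b = Union{Int64, Matrix{Int64}}[1, 2, [1 2; 3 4]]  # already of type T

println(convert(T, b) === b)            # true: already the right type, no copy
println(convert(Vector{Any}, a) === a)  # true: same reasoning for Vector{Any}
println(convert(T, a) === a)            # false: a new array is allocated

# An abstract target such as Vector{<:Union{...}} names no single concrete type,
# so converting a Vector{Any} to it is expected to fail (a MethodError in the thread).
abstract_failed = try
    convert(Vector{<:Union{Int64, Matrix{Int64}}}, a)
    false
catch err
    println(typeof(err))
    true
end
println(abstract_failed)
```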