Type Definition of Array Arguments to Functions

Hi, for a project I’m working on, I defaulted to annotating my functions’ array arguments with the element type Float64, i.e.

function foo(vec::Vector{Float64})
    # do stuff 
end

However, I realized later that this was a mistake, as sometimes I’d have input arrays of integers. So, thinking that I could use a supertype here, I tried defining my functions like this:

function bar(vec::Vector{Number})
    # do the same stuff
end

However, I get this error:

julia> vec = [1,2,3]
3-element Vector{Int64}:
 1
 2
 3

julia> bar(vec)
ERROR: MethodError: no method matching bar(::Vector{Int64})

Closest candidates are:
  bar(::Vector{Number})

Since Int64 is a subtype of Number, shouldn’t this work just fine? Are there best practices I should be aware of when defining input types for function arguments?

Use <:Number

julia> function foo(a::Vector{Number}) end;

julia> foo([1.0, 2.0, 3.0])
ERROR: MethodError: no method matching foo(::Vector{Float64})

Closest candidates are:
  foo(::Vector{Number})
   @ Main REPL[7]:1

Stacktrace:
 [1] top-level scope
   @ REPL[8]:1

julia> function bar(a::Vector{<:Number}) end;

julia> bar([1.0, 2.0, 3.0]) # worked
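
The underlying reason is that Julia’s parametric types are invariant: Vector{Int64} is not a subtype of Vector{Number}, even though Int64 <: Number. A Vector{Number} argument only matches a vector whose element type is exactly Number. You can check this directly:

julia> Vector{Int64} <: Vector{Number}
false

julia> Vector{Int64} <: Vector{<:Number}
true

Vector{<:Number} means “a Vector whose element type is some subtype of Number”, which is what you want here.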

Just want to add that over-specifying type information is somewhat of an anti-pattern in Julia, especially outside of library code.

It is only necessary to specify types at all if you are taking advantage of dispatch, i.e. you want to have multiple methods for the same function that work differently for different types. I would caution against adding any type annotations unless they are actually necessary.
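
For example, the original function needs no annotation at all; Julia still compiles a specialized, fast method for each concrete argument type it is actually called with. A sketch of the point, reusing foo from above:

function foo(vec)
    # do stuff; works for Vector{Float64}, Vector{Int64}, ranges, ...
end

foo([1.0, 2.0, 3.0])  # compiles and runs a Float64 specialization
foo([1, 2, 3])        # compiles and runs an Int64 specialization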


This is covered in the manual here.


I think my question is a great example of why it’s an anti-pattern. :joy:

This holds true for function definitions, right? But I’m getting confused by the overlap with type definitions. For example, if I define:

abstract type Data end
struct DataA <: Data; vec::Vector; end
struct DataB <: Data; vec::Vector{Number}; end

and then a series of contrived test functions, one of which operates on both structs, and the other two are subtype-specific:

function test_data(data::Data) 
    for i in eachindex(data.vec)
        a = data.vec[i]   
    end
end

function test_dataA(data::DataA)
    for i in eachindex(data.vec)
        a = data.vec[i]   
    end
end

function test_dataB(data::DataB)
    for i in eachindex(data.vec)
        b = data.vec[i]   
    end
end

Every usage of DataA (the one that does not specify an element type for vec::Vector) results in memory allocations (why 1489 in this case, and not 1000?), whereas vec::Vector{Number} does not:

using BenchmarkTools
A = DataA(rand(1000))
B = DataB(rand(1000))

julia> @btime test_data($A)
  15.916 μs (1489 allocations: 23.27 KiB)

julia> @btime test_data($B)
  321.708 ns (0 allocations: 0 bytes)

julia> @btime test_dataA($A)
  16.166 μs (1489 allocations: 23.27 KiB)

julia> @btime test_dataB($B)
  321.659 ns (0 allocations: 0 bytes)

Thank you!

Covered in the manual here.


Just to be clear, you want something like

struct DataC{T} <: Data
    vec::Vector{T}
end

which makes the following difference for type inference:

julia> @code_warntype test_data(A)
MethodInstance for test_data(::DataA)
  from test_data(data::Data) @ Main REPL[5]:1
Arguments
  #self#::Core.Const(test_data)
  data::DataA
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  i::Int64
  a::Any
(...)

julia> C = DataC(rand(1000));

julia> @code_warntype test_data(C)
MethodInstance for test_data(::DataC{Float64})
  from test_data(data::Data) @ Main REPL[5]:1
Arguments
  #self#::Core.Const(test_data)
  data::DataC{Float64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  i::Int64
  a::Float64
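
You can confirm the effect with the earlier benchmark: because the field type of DataC{Float64} is concrete, the loop should now report 0 allocations, matching the DataB case:

julia> @btime test_data($C)  # expect timings comparable to test_data($B) above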

It is usually advised to have concrete field types in structs, and to use type parameters if necessary to achieve this. In this example both element types are actually abstract (Vector specifies no element type at all, and Number is abstract), but annotating Number apparently still allows some inference?
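
For what it’s worth, you can query this directly: Vector{Number} is itself a concrete container type even though its element type is abstract, which is presumably what enables that inference:

julia> isconcretetype(Vector{Float64})
true

julia> isconcretetype(Vector{Number})
true

julia> isconcretetype(Vector)
false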

But honestly, I feel like this is probably premature optimization. I would suggest simply defining functions like eachindex and getindex on your types and writing your functions generically. Then implementation details of these types can be changed later when performance issues can actually be measured.
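
A minimal sketch of that suggestion, assuming the Data types from above (test_data2 and the forwarding methods are illustrative, not an established interface):

Base.eachindex(d::Data) = eachindex(d.vec)
Base.getindex(d::Data, i) = d.vec[i]

# The algorithm now only touches the generic interface, so how vec is
# stored can change later without touching this function.
function test_data2(data::Data)
    for i in eachindex(data)
        a = data[i]
    end
end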

(I feel like the Julia community is so used to thinking about performance that we tend to reach for performant practices early in development, long before any sensible profiling can be done. But that’s just my ranty opinion, so take it with a grain of salt.)


Thanks for the detailed explanation here.

The goal of my project is to see how fast I can get a particular type of calculation to run - i.e. the innovation is speed, not writing something that doesn’t exist yet. In doing so, I’ve spent a lot of time trying to teach myself how to write more performant Julia code, which has resulted in…my project forever being incomplete.

So I agree with your take on this - first write something sensible that runs, and then optimize later once you have good measurements on its end-use/real-world performance.
