Hi, for a project I’m working on, I defaulted to defining my functions to take input arrays with element type Float64, i.e.
function foo(vec::Vector{Float64})
    # do stuff
end
However, I realized later that this was a mistake, as sometimes I’d have input arrays of integers. So, thinking that I could use a supertype here, I tried defining my functions like this:
function bar(vec::Vector{Number})
    # Do the same stuff
end
Since Int64 is a subtype of Number, shouldn’t this work just fine? Are there best practices I should be aware of when defining input types for function arguments?
Just want to add that over-specifying type information is somewhat of an anti-pattern in Julia, especially outside of library code.
It is only necessary to specify types at all if you are taking advantage of dispatch, i.e. you want to have multiple methods of the same function that behave differently for different types. I would caution against adding any type annotations unless they are actually necessary.
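For example, a toy sketch of what dispatch actually buys you (the function name is made up):

```julia
# Two methods of the same function, selected by dispatch on the argument type.
describe(x::Integer) = "got an integer: $x"
describe(x::AbstractFloat) = "got a float: $x"

describe(3)    # matches the Integer method
describe(3.0)  # matches the AbstractFloat method
```

Unless you need this kind of per-type behavior, an untyped argument works just as well and is just as fast once compiled.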
I think my question is a great example of why it’s an anti-pattern.
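And indeed, trying it in the REPL shows my second attempt was broken in a more basic way: parametric types in Julia are invariant, so bar would never even have accepted my integer vectors. A sketch (baz is a made-up name):

```julia
# Parametric types are invariant: Vector{Int64} is *not* a subtype of
# Vector{Number}, even though Int64 <: Number.
Vector{Int64} <: Vector{Number}    # false
Vector{Int64} <: Vector{<:Number}  # true

# So a signature like this accepts any vector of numeric elements:
baz(vec::AbstractVector{<:Number}) = sum(vec)

baz([1, 2, 3])        # works for integers
baz([1.0, 2.0, 3.0])  # and for floats
```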
This holds true for function definitions, right? But I’m getting confused with the overlap with type definitions. For example, if I define:
abstract type Data end

struct DataA <: Data
    vec::Vector
end

struct DataB <: Data
    vec::Vector{Number}
end
and then a series of contrived test functions, one of which operates on both structs, and the other two are subtype-specific:
function test_data(data::Data)
    for i in eachindex(data.vec)
        a = data.vec[i]
    end
end

function test_dataA(data::DataA)
    for i in eachindex(data.vec)
        a = data.vec[i]
    end
end

function test_dataB(data::DataB)
    for i in eachindex(data.vec)
        b = data.vec[i]
    end
end
Every usage of DataA (the one whose vec::Vector field does not specify an element type) results in memory allocations (why 1489 in this case, and not 1000?), whereas DataB with vec::Vector{Number} does not:
A = DataA(rand(1000))
B = DataB(rand(1000))
using BenchmarkTools
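For completeness, here is a self-contained sketch of the same comparison using Base’s @allocated macro instead of BenchmarkTools (exact counts are machine- and version-dependent, so only the relative difference matters):

```julia
abstract type Data end

struct DataA <: Data
    vec::Vector          # abstract field type: element access is type-unstable
end

struct DataB <: Data
    vec::Vector{Number}  # concrete container type, abstract element type
end

# One generic walker instead of the three near-identical test functions.
function walk(data)
    for i in eachindex(data.vec)
        a = data.vec[i]
    end
end

A = DataA(rand(1000))
B = DataB(rand(1000))

walk(A); walk(B)  # warm up so compilation is not counted

@allocated walk(A)  # many small allocations (each element gets boxed)
@allocated walk(B)  # far fewer, often zero
```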
It is usually advised to have concrete field types in structs - and to use type parameters where necessary to achieve this. In this example neither field type is fully concrete: Vector with no element type is abstract, and Vector{Number}, while technically a concrete container type, stores abstractly-typed (boxed) elements - which is presumably still enough for inference to avoid the per-element allocations.
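The type-parameter version of that advice might look like this (a sketch; DataC is a made-up name):

```julia
abstract type Data end

# The element type T is a parameter, so the field is fully concrete
# for each instantiation (e.g. DataC{Float64} holds a Vector{Float64}).
struct DataC{T<:Number} <: Data
    vec::Vector{T}
end

C = DataC(rand(1000))  # DataC{Float64}, inferred from the argument
D = DataC([1, 2, 3])   # DataC{Int64}
```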
But honestly, I feel like this is probably premature optimization. I would suggest simply defining functions like eachindex and getindex on your types and writing your functions generically. Then implementation details of these types can be changed later when performance issues can actually be measured.
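Concretely, forwarding the array interface might look something like this (a sketch; the method set and the total function are illustrative, not exhaustive):

```julia
abstract type Data end

struct DataA <: Data
    vec::Vector
end

# Forward the pieces of the indexing interface we need, so callers
# never touch the .vec field directly.
Base.eachindex(d::Data) = eachindex(d.vec)
Base.getindex(d::Data, i) = d.vec[i]
Base.length(d::Data) = length(d.vec)

# A generic function written against that interface:
function total(d::Data)
    s = 0.0
    for i in eachindex(d)
        s += d[i]
    end
    return s
end
```

With this in place, the internal representation (Vector, Vector{Number}, a type parameter, something else entirely) can change later without touching the functions that use it.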
(I feel like the Julia community is so used to thinking about performance that we tend to reach for performant practices early in development long before any sensible profiling can be done. But that’s just my ranty opinion so take with a grain of salt)
The goal of my project is to see how fast I can get a particular type of calculation to run - i.e. the innovation is speed, not writing something that doesn’t exist yet. In doing so, I’ve spent a lot of time trying to teach myself how to write more performant Julia code, which has resulted in…my project forever being incomplete.
So I agree with your take on this - first write something sensible that runs, and then optimize later once you have good measurements on its end-use/real-world performance.