I’m curious about the overhead vs. benefits of using types. I saw mention of a related subject in the Julia docs, but I think my question is slightly different. It seems reasonable to believe that there is some sort of overhead to typing.
So basically, I am interested in determining when the benefit of typing data is outweighed by the cost.
Just to create some context, I will invent an example. Let’s say I have (1) many arrays which I would like to treat differently depending on type, and (2) I will pack lots of those arrays into some container. Something like this:

```julia
struct Thing1{T}
    mydata::Array{T}
end
struct Thing2{T}
    mydata::Array{T}
end
struct Thing3{T}
    mydata::Array{T}
end
struct Thing4{T}
    mydata::Array{T}
end

struct Container
    things::Array
end
```
On the other hand, I could associate a unique value with each distinct kind of array and store it as a psudo-type, like so:

```julia
struct Thing{T}
    mydata::Array{T}
    myPsudoType::Int8
end

struct Container
    things::Array{Thing}
end
```
If I am going to be working with a large data set, this could be important. Any thoughts on the matter?
Note that this question is not really about the overhead of “types” in general, but rather about the overhead of using non-concrete types, i.e. of runtime dynamic dispatch.
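To illustrate what that means in practice, here is a minimal sketch (the `Loose`/`Tight` names are made up for this example, not from the original post) contrasting a non-concrete field with a concrete one:

```julia
# Hypothetical sketch: a non-concrete field forces runtime type lookups,
# while a concrete field lets Julia fully specialize code that uses it.
struct Loose
    things::Array            # non-concrete: eltype and dimensionality unknown
end

struct Tight
    things::Vector{Float64}  # concrete: fully known to the compiler
end

total(c) = sum(c.things)     # identical generic code works for both

# Both calls return the same answer, but `total` on a `Tight` compiles to a
# specialized loop over Float64, while on a `Loose` it must discover the
# array's actual type at runtime (dynamic dispatch).
total(Loose(ones(5)))
total(Tight(ones(5)))
```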
It depends on what you’re doing with your arrays. Since you have an array of arrays, I’m guessing you might be doing relatively expensive computations on the elements, in which case the overhead of dynamic dispatch might not be significant.
(I’m guessing you meant to write “pseudo-type”, but I love the term “spudo-type”. )
(As @yuyichao noted, if you want a concrete type you need e.g. Vector{T} or equivalently Array{T,1}.)
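To make that distinction concrete: `Array{T}` with the dimensionality left off is abstract, while `Vector{T}` (an alias for `Array{T,1}`) is concrete, which you can check directly:

```julia
@assert Vector{Float64} === Array{Float64,1}   # Vector{T} is just an alias
@assert !isconcretetype(Array{Float64})        # dimensionality unspecified: abstract
@assert isconcretetype(Vector{Float64})        # fully specified: concrete
```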
There are a number of downsides to encoding a property of your thing in the type system:
- There will be more time spent in compilation: every time a different type hits a different function, Julia will compile a new specialized version just for that type.
- If this property is typically determined by a run-time value, then many functions working with your things will be type-unstable. Type-unstable variables can stymie optimizations, cause allocations, and hit dynamic dispatch.
- Collections of many things with different properties are type-unstable when you try to pull out and work with individual elements.
- You cannot mutate that property in-place.
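A minimal sketch of the collection pitfall, using made-up stand-ins (`A`, `B`, `score`) for the `Thing` types:

```julia
# Two unrelated types carrying the same kind of data.
struct A; x::Vector{Float64}; end
struct B; x::Vector{Float64}; end
score(a::A) = sum(a.x)
score(b::B) = 2 * sum(b.x)

mixed = Any[A(ones(3)), B(ones(3))]   # heterogeneous collection: eltype is Any
# Pulling out an element only yields an `Any`, so each `score(t)` call below
# must dispatch dynamically at runtime.
total = sum(score(t) for t in mixed)
```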
The plus column is effectively the flip side of exactly the same things:
- Julia will compile a new specialized version just for that type. If this property drastically changes how you work with your things, this can be a huge gain.
- If the value of the property changes the return types of functions that work with your things, encoding it into the type system will make those functions type-stable. This means that downstream analyses can do even better optimizations.
- Collections of many things with the same property can be highly optimized. That property value gets encoded once into the type of the array or collection (and not repeated on each individual element), so the elements can be represented more compactly.
- You cannot mutate that property in-place.
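The collection point above can be sketched like this (a homogeneous version of the `Thing` struct from the question; the loop is an assumed workload):

```julia
struct Thing{T}
    mydata::Vector{T}
end

# All elements share one concrete type, so the collection's eltype is concrete.
things = [Thing([1.0, 2.0]) for _ in 1:3]   # Vector{Thing{Float64}}
@assert eltype(things) === Thing{Float64}

# This loop is fully type-stable: Julia knows every t.mydata is a
# Vector{Float64}, so it compiles a single specialized method with no
# per-element dispatch.
grand_total = sum(sum(t.mydata) for t in things)
```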
How these things interplay will depend on what exactly you’re doing. Note that “adding types” doesn’t inherently add overhead. It can also remove overhead.
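For instance, here is one hedged sketch of the two encodings from the question side by side (the branch bodies are placeholders, not a recommended design):

```julia
# Value-encoded "psudo-type": one concrete struct, branching on a runtime Int8.
# Every ThingV shares a single compiled `process` method.
struct ThingV
    mydata::Vector{Float64}
    kind::Int8
end
process(t::ThingV) = t.kind == 1 ? sum(t.mydata) : maximum(t.mydata)

# Type-encoded: the distinction lives in a type parameter, so each kind gets
# its own compiled method (more specialization, but also more compilation,
# and mixed collections of ThingT become type-unstable).
struct ThingT{K}
    mydata::Vector{Float64}
end
process(t::ThingT{1}) = sum(t.mydata)
process(t::ThingT{2}) = maximum(t.mydata)
```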