# Are objects costly?

Here is a case where I could use either a function or a struct/type:

```julia
function bounded(val::Real, lo::Real, hi::Real)
    @assert isfinite(lo) && isfinite(hi) && isfinite(val)
    lo <= val <= hi
end

struct Bounds{T <: Real}
    lo::T
    hi::T
    Bounds(lo::T, hi::T) where {T} = (@assert isfinite(lo) && isfinite(hi); new{T}(lo, hi))
end
(b::Bounds)(val::Real) = (@assert isfinite(val); b.lo <= val <= b.hi)

@assert Bounds(0, 1)(0.5)
@assert bounded(0.5, 0, 1)
```

I understand that if I am checking lots of values against the same bounds, a struct/type might make sense. What if that's not the case?

Here's another case, where I can define either a vector of objects or an object containing a vector.

```julia
struct OnlyZero{T <: Real}
    x::T
    OnlyZero(x::T) where {T} = (@assert iszero(x); new{T}(x))
end

struct OnlyZeros{T <: Real}
    x::Vector{T}
    OnlyZeros(x::Vector{T}) where {T} = (@assert all(iszero.(x)); new{T}(x)) # Can do `iszero(x)` instead of `all(iszero.(x))` in new versions of Julia
end
```

One is safer than the other in the following sense:

```julia
veczeros = [OnlyZero(0 - 0), OnlyZero(1 - 1)]
# veczeros[1] = OnlyZero(0 - 1) # errors. safe.

vec_zeros = OnlyZeros([(0 - 0), (1 - 1)])
vec_zeros.x[1] = 0 - 1 # does not error. unsafe.
```

I like the idea of making struct constructors do validation. But are objects costly to create? Are they more or less costly in Julia than in other languages?

No, not if they are immutable with concretely typed members, as is the case for your `Bounds` object. They need not be heap-allocated in that case (or necessarily even stack-allocated … the compiler can even put the members into registers for a local `struct`).
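A minimal sketch of how one might check this. `PlainBounds`, `count_in_bounds`, and `allocated_bytes` are hypothetical names introduced only for this illustration (`PlainBounds` is a copy of `Bounds` without the validation), and the allocation count is what one would expect to see rather than a guarantee:

```julia
# Hypothetical, validation-free copy of `Bounds`, used only for illustration.
struct PlainBounds{T<:Real}
    lo::T
    hi::T
end
(b::PlainBounds)(val::Real) = b.lo <= val <= b.hi

# An immutable struct whose fields are all plain bits is itself a "bits type":
# it can be copied freely and does not need to live on the heap.
@show isbitstype(PlainBounds{Float64})   # true
@show sizeof(PlainBounds{Float64})       # 16: just the two Float64 fields

# Builds a fresh PlainBounds in every iteration of the loop.
function count_in_bounds(xs)
    n = 0
    for x in xs
        n += PlainBounds(0.0, 1.0)(x)
    end
    return n
end

# Measure inside a function so that global-scope effects are not counted.
function allocated_bytes(xs)
    count_in_bounds(xs)                    # warm-up call (compilation)
    return @allocated count_in_bounds(xs)  # expected to be 0 if no struct hits the heap
end

@show allocated_bytes(rand(1000))
```

If the reported number of bytes is zero, the thousand `PlainBounds` values never touched the heap.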

Aside: Note that you can simply use `iszero(x)` for a vector (since vector spaces have a zero element, by definition); this isn't just in “new” versions of Julia, as it dates back well before Julia 1.0. Even if you wanted to use `all`, it should be more efficient to use `all(iszero, x)`, to avoid allocating the intermediate array `iszero.(x)` of boolean values.
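For illustration, the three spellings side by side (the vectors `x` and `y` are made-up examples):

```julia
x = [0 - 0, 1 - 1]   # all zeros
y = [0, 0, 3]        # not all zeros

# `iszero` on an array checks that every element is zero.
@show iszero(x)          # true
@show iszero(y)          # false

# Predicate form of `all`: no temporary array is created.
@show all(iszero, x)     # true

# Broadcast form: same answer, but `iszero.(x)` first allocates a
# temporary array of booleans that is then thrown away.
@show all(iszero.(x))    # true
```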

It's not clear to me what the `isfinite(val)` assertion is for; `b.lo <= val <= b.hi` will already return `false` for non-finite `val` if `b.lo` and `b.hi` are finite. And why wouldn't you want to support infinite bounds, like `Bounds(0.0, Inf)`?

Also, generally you want to throw an `ArgumentError` (or a more specific exception) for invalid arguments that might come from an external source (e.g. a user); `@assert` should generally only be used to test for conditions that should be impossible, based on the internal logic of a function/module. See e.g. this stackexchange discussion.
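A sketch of what that could look like for a bounds type. `CheckedBounds` is a hypothetical name, and the particular rule checked here (`lo <= hi`) is only an assumed example of argument validation, not something taken from the original code:

```julia
# Hypothetical variant of `Bounds` that reports bad input with ArgumentError.
struct CheckedBounds{T<:Real}
    lo::T
    hi::T
    function CheckedBounds(lo::T, hi::T) where {T<:Real}
        # Assumed validation rule: the interval must be ordered.
        # (This also rejects NaN bounds, since comparisons with NaN are false.)
        lo <= hi || throw(ArgumentError("lower bound $lo exceeds upper bound $hi"))
        return new{T}(lo, hi)
    end
end
(b::CheckedBounds)(val::Real) = b.lo <= val <= b.hi

@show CheckedBounds(0.0, Inf)(1e300)   # true: infinite bounds are allowed
@show CheckedBounds(0, 1)(2)           # false

# Invalid arguments raise an exception the caller can catch by type.
err = try
    CheckedBounds(1, 0)
    nothing
catch e
    e
end
@show err isa ArgumentError            # true
```

One practical difference is that the Julia documentation warns that `@assert` may be disabled at some optimization levels, so it is not a reliable guard for input checking.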

Thanks @stevengj.

On `iszero`: noted. I was misremembering.

On allowing `Inf` in bounds: I was just trying to create an example. There are cases where `Inf` should just not be possible in the data, and this is what I was thinking of. But yes, in general, you might want to allow `Inf`s.

On `AssertionError` vs `ArgumentError`: What I'm taking away is that `AssertionError`s are to be thought of as warnings for developers when there is a flaw in logic. `ArgumentError` is there to warn users when they have supplied wrong arguments, i.e. when it's the user's fault, not the software's. Makes sense. If we imagine all my examples were “internal”, the `@assert`s still make sense. But let me know if I'm missing something.

Coming back to the topic:
Objects, when created, must store some metadata about their type. If we have an array of objects, we probably store the metadata in one place for everything in the array. What if we're creating lots of objects and they're not in an array? Does the metadata not pile up?

It's no different from creating an `Int` or a `Float64`. If you do

```julia
function mysum(someiterator)
    s = zero(eltype(someiterator))
    for x in someiterator
        s += x
    end
    return s
end
```

and call `mysum(rand(10^6))`, you don't worry about “creating” a million floating-point values, one per iteration of the loop.

When they are stored in a heap-allocated `Array`, an `Array{T}` instance indeed stores a reference to the element type `T` only once. If `T` is an immutable `struct`, the data for the structs is typically stored “inline” in the array memory, one member after another, with no per-element type tag.
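The inline layout can be observed directly. In this sketch `Point2` is a hypothetical two-field struct invented for the example:

```julia
# Hypothetical immutable struct with two concretely typed fields.
struct Point2
    x::Float64
    y::Float64
end

pts = [Point2(i, 2i) for i in 1:4]

@show isbitstype(Point2)   # true
@show sizeof(Point2)       # 16 bytes per element
@show sizeof(pts)          # 64 = 4 elements * 16 bytes, no per-element tag

# Viewing the same memory as raw Float64s shows the fields laid out
# one after another: x1, y1, x2, y2, ...
flat = collect(reinterpret(Float64, pts))
@show flat                 # [1.0, 2.0, 2.0, 4.0, 3.0, 6.0, 4.0, 8.0]
```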

However, if you have objects stored as local variables, the compiler generates code that is specialized for data of that type; it doesn't need to explicitly store a reference to the type at all in the generated code. For example, if you have `x + y` where `x` and `y` are `Int32` instances, the compiler stores the result in a 32-bit integer register (or spills it to the stack), but doesn't need to explicitly point to the `Int32` type. In the example above, if you call `mysum` on a `Float64` array, then the `s` variable is stored in a floating-point register and is updated in-place to add the value of each element.

The key thing to remember is that, whenever Julia compiles a function, it specializes the compiled code to the types of the arguments, and hence for the types inferred for any variables within the function.
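One way to see this specialization, sketched with a copy of `mysum` so the snippet is self-contained. `Base.return_types` is an unexported introspection helper in Base that reports what the compiler infers for a given argument signature:

```julia
function mysum(someiterator)
    s = zero(eltype(someiterator))
    for x in someiterator
        s += x
    end
    return s
end

# The same source code yields a different specialization per argument type,
# and the accumulator `s` takes on the element type in each one.
@show typeof(mysum(Float32[1, 2, 3]))   # Float32
@show typeof(mysum(1:3))                # Int (Int64 on 64-bit systems)

# Inferred return type for a Vector{Float64} argument.
@show Base.return_types(mysum, (Vector{Float64},))   # a one-element list containing Float64
```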

(The worst case is if you have an untyped container, e.g. an `Any[...]` array. In this case, each element in the container must be stored as a pointer to a heap-allocated “box” that has the value and a pointer to the runtime type. But this is equally true of `Any[1,2,3]`. Similarly, if you moved the body of `mysum` to global scope, so that `s` is a global variable, then the language assumes that it can change type at any time, so it is stored in a “box” on the heap, and a new “box” is allocated for every iteration when `s` is updated.)
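A small sketch of the difference between a typed and an untyped container (the variable names are made up for the example):

```julia
typed   = [1, 2, 3]      # Vector{Int}: element type known to the compiler
untyped = Any[1, 2, 3]   # Vector{Any}: each slot can hold anything

# A concrete element type is what allows inline, tag-free storage.
@show isconcretetype(eltype(typed))     # true
@show isconcretetype(eltype(untyped))   # false

# With `Any`, the type of each element is only discovered at run time,
# so elements of different types can sit side by side.
push!(untyped, "four")
@show map(typeof, untyped)
```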

An example to keep in mind is that a complex number in Julia is simply a `struct`:

```julia
struct Complex{T<:Real} <: Number
    re::T
    im::T
end
```

If creating such a `struct` were costly, then complex-number arithmetic in Julia would be insanely expensive.

Early in Julia's history, before immutable types were implemented (julia#13), complex numbers had to be implemented as a `primitive` type (then called `bitstype`) for performance; they were finally changed to `struct` for Julia 0.2, which was a huge step forward for the performance of user-defined types.

As @StefanKarpinski pointed out when immutable types were still in the planning stage:

> I should point out that LLVM (like other compilers) is really good at optimizing away copies of immutable values. When you do something which, in principle, copies a Range object with one field modified, it will very often actually just be modified in place. Even more importantly, it is quite possible that the Range object doesn't actually live anywhere in memory – there is no contiguous Range object allocated, but rather pieces of it can just be stored in registers and never saved to memory at all.

This is why the performance section of the manual talks about type stability. For good performance it is vital that, inside a function, the type of every variable can be deduced from the types of the function's actual arguments. Then the compiler knows all the types when compiling the function, so they don't have to be stored and checked when the compiled code is run. If two `Float32` values are to be added, the compiler will just emit a single instruction for adding 32-bit floats.

On the other hand, if the types are not known at compile time, e.g. if you do things like `a = (rand() < 0.5) ? 0 : 1.0`, where `a` will be either a `Float64` or an `Int64`, the type must be stored somewhere, and it must be checked whenever you use `a`. This can be disastrous for performance.
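As a rough illustration, the two hypothetical functions below differ only in whether both branches produce the same type. `Base.return_types` (an unexported introspection helper in Base) shows what the compiler can infer in each case:

```julia
# Type-unstable: returns an Int in one branch and a Float64 in the other.
unstable() = (rand() < 0.5) ? 0 : 1.0

# Type-stable: both branches return a Float64.
stable() = (rand() < 0.5) ? 0.0 : 1.0

# Inferred return types for a call with no arguments.
@show Base.return_types(unstable, Tuple{})   # a Union of Float64 and Int
@show Base.return_types(stable, Tuple{})     # just Float64
```

Interactively, `@code_warntype unstable()` reports the same information and highlights the non-concrete type.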
