Intermediate results (dereferencing, CSE, ...): when to avoid them?

Out of habit, I eliminate common subexpressions by creating intermediate variables.
On the other hand, Julia’s performance is often judged by allocations, which seems to suggest avoiding intermediate variables…

Take, for example:

    for ff = 1:nc
        write(io, UInt8(length(String(fieldnames(typeof(first(v)))[ff]))))
        write(io, String(fieldnames(typeof(first(v)))[ff]))
        ser(io, getfield.(v, fieldnames(typeof(first(v)))[ff]))
    end

vs.

    for ff = 1:nc
        fieldName = fieldnames(typeof(first(v)))[ff]
        write(io, UInt8(length(String(fieldName))))
        write(io, String(fieldName))
        ser(io, getfield.(v, fieldName))
    end

vs.

    fieldNames = fieldnames(typeof(first(v)))
    for ff = 1:nc
        write(io, UInt8(length(String(fieldNames[ff]))))
        write(io, String(fieldNames[ff]))
        ser(io, getfield.(v, fieldNames[ff]))
    end

Is it reasonable to assume a threshold, e.g. that computing the same expression three times justifies storing an intermediate result?

In the first version you are just creating the same intermediate three times per iteration, so it’s less readable, requires more computation, and makes more allocations.

In this case, definitely create the intermediate.

Sometimes the compiler will eliminate common subexpressions automatically, but you can’t necessarily rely on that.

I think there’s a misconception here. The intermediate is still allocated even if you don’t assign a name to it. The object

    fieldnames(typeof(first(v)))

is created and allocated either way. Not naming it doesn’t help; it will just be assigned some internal label instead.

When the intermediate is a plain value that does not need to be heap-allocated, the allocations don’t matter either way, and you can create temporary variables to your heart’s content. But if the intermediate is an array, for example, then you get a real, and possibly expensive, allocation over and over.

So in your case it’s not a trade-off, it’s just worse in every way to not assign the intermediate.
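
To make the array case concrete, here is a minimal sketch. `make_buffer`, `sum_recomputed`, and `sum_hoisted` are hypothetical names invented for illustration, standing in for any computation that returns a fresh array. Exact byte counts depend on the Julia version, so none are quoted here.

```julia
# Hypothetical helper: any function that returns a freshly allocated array.
make_buffer(n) = collect(1:n)

# Recomputes the array-valued intermediate on every iteration.
function sum_recomputed(n, reps)
    total = 0
    for _ in 1:reps
        total += sum(make_buffer(n))
    end
    return total
end

# Computes the intermediate once, names it, and reuses it.
function sum_hoisted(n, reps)
    buf = make_buffer(n)
    total = 0
    for _ in 1:reps
        total += sum(buf)
    end
    return total
end

# Warm up first so compilation is not counted in the measurement.
sum_recomputed(1000, 10)
sum_hoisted(1000, 10)

a_recomputed = @allocated sum_recomputed(1000, 10)
a_hoisted = @allocated sum_hoisted(1000, 10)
println("recomputed: ", a_recomputed, " bytes, hoisted: ", a_hoisted, " bytes")
```

Both functions return the same total. The hoisted version should report fewer allocated bytes, because the array is built once instead of once per iteration.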

Thanks for the comment, that’s clearer now!
Larger intermediate results probably need the heap, so version 3 should be the preferred one?

Version 3 looks best. Now, after checking, I see that fieldnames returns a tuple and not an array, which means that allocation isn’t necessarily a problem here.
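
A quick way to check this yourself, as a small sketch. The `Point` struct and the sample vector are assumptions made up for illustration.

```julia
# Hypothetical record type, only for illustration.
struct Point
    x::Float64
    y::Float64
end

v = [Point(1.0, 2.0), Point(3.0, 4.0)]

# fieldnames returns a tuple of Symbols, not an array.
names = fieldnames(typeof(first(v)))
println(names)
println(names isa Tuple{Vararg{Symbol}})

# Broadcasting getfield over the vector extracts one field as a column.
xs = getfield.(v, :x)
println(xs)
```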

But, as I mentioned, it’s not a trade-off the way you were thinking. Either the allocations don’t matter, so you can create an intermediate without cost, or the allocations do matter, in which case you should create an intermediate variable if you are using it repeatedly.

There are cases where not creating an intermediate can be beneficial, but only when you can work directly on the original object and avoid the allocation completely. That is unrelated to whether you assign a name to it or not.

Or maybe this is cleaner:

    for name in fieldnames(typeof(first(v)))
        write(io, UInt8(length(String(name))))
        write(io, String(name))
        ser(io, getfield.(v, name))
    end

It does the same thing, but for val in iter is normally nicer than for i in 1:length(iter) plus indexing.

(You actually convert the symbol to a string twice, but it’s no big deal.)
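
If you want to avoid the double conversion too, a sketch along these lines might work. The `Sample` struct, the function name `write_columns`, and this simple `ser` are hypothetical stand-ins, since the real `ser` is not shown in the thread. It also swaps `length` for `ncodeunits`, which counts bytes rather than characters; the two agree for ASCII field names.

```julia
# Hypothetical record type, only for illustration.
struct Sample
    id::Int32
    value::Float64
end

# Hypothetical stand-in for `ser`: writes each element in order.
ser(io::IO, xs::AbstractVector) = foreach(x -> write(io, x), xs)

function write_columns(io::IO, v::AbstractVector)
    for name in fieldnames(typeof(first(v)))
        s = String(name)                  # convert the Symbol only once
        write(io, UInt8(ncodeunits(s)))   # name length in bytes
        write(io, s)                      # the name itself
        ser(io, getfield.(v, name))       # the column of values for this field
    end
    return io
end

io = IOBuffer()
write_columns(io, [Sample(1, 0.5), Sample(2, 1.5)])
bytes = take!(io)
println(length(bytes), " bytes written")
```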

To expand on this: because of referential transparency,

    f(g(x))

and

    y = g(x)
    f(y)

are equivalent in Julia. The language does not care; y is merely a label for the programmer.
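
A small sketch to try this out. The functions `f`, `g`, `nested`, and `named` here are made up for illustration. I would expect both variants to report the same allocation, since `g(x)` creates its array either way, but exact numbers depend on the Julia version.

```julia
# Hypothetical functions, only for illustration.
g(x) = x .^ 2      # returns a new array
f(y) = sum(y)

# Intermediate left unnamed.
nested(x) = f(g(x))

# Intermediate given a name.
function named(x)
    y = g(x)
    return f(y)
end

x = [1.0, 2.0, 3.0]

# Warm up so compilation is not counted.
r_nested = nested(x)
r_named = named(x)

println(r_nested == r_named)
println(@allocated(nested(x)), " vs. ", @allocated(named(x)), " bytes")
```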
