Purpose of zero() in a function

question

#1

I do not know what to write in the title…

I am learning a lot by analyzing the source code of the Julia packages I use.
Today, I saw an interesting line in lm.jl in the GLM package.

```julia
function deviance(r::LmResp)
    y = r.y
    mu = r.mu
    wts = r.wts
    v = zero(eltype(y)) + zero(eltype(y)) * zero(eltype(wts))
    if isempty(wts)
        @inbounds @simd for i = eachindex(y, mu)
            v += abs2(y[i] - mu[i])
        end
    else
        @inbounds @simd for i = eachindex(y, mu, wts)
            v += abs2(y[i] - mu[i]) * wts[i]
        end
    end
    v
end
```

My question is about the fifth line. Why did the authors write such a line instead of `v = 0` or `v = 0.0`?
Is it to aid type inference? If so, can it be replaced with `v = zero(eltype(y)) * zero(eltype(wts))`?
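For concrete eltypes, that initialization just evaluates to zero of the promoted type. A quick sketch (not GLM.jl code, using made-up arrays) of what both variants compute:

```julia
# Sketch (not GLM.jl code): what the initialization evaluates to for concrete eltypes.
y   = Float32[1.0, 2.0]
wts = Float64[0.5, 0.5]

v = zero(eltype(y)) + zero(eltype(y)) * zero(eltype(wts))
# v == 0.0 and typeof(v) == Float64: the promoted type of eltype(y) and eltype(wts)

# The shorter variant happens to give the same type for ordinary floats:
v2 = zero(eltype(y)) * zero(eltype(wts))
# typeof(v2) == Float64 as well
```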


#2

Yes, it is for type inference. It makes sure that v does not have to change its type during the subsequent loops, as that would make them much slower. The same reasoning explains why both wts and y feature in the initialization.
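To illustrate the point, here is a minimal sketch (not GLM.jl's code): with `v = 0` the accumulator starts as an Int and is rebound to a Float32 on the first iteration, whereas `zero(eltype(y))` starts it at the right type from the beginning:

```julia
# Minimal sketch: why the accumulator's starting type matters.
y = Float32[1.0, 2.0, 3.0]

v_bad = 0                 # starts as an Int
v_bad += abs2(y[1])       # rebound to a Float32: the variable changed type,
                          # so the loop body would not be type-stable

v_good = zero(eltype(y))  # Float32 from the start
v_good += abs2(y[1])      # still Float32: type-stable across iterations
```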


#3

Thank you for your answer… but still somewhat confusing.
Then where is zero(eltype(mu)) in this function? In the loops below, the elements of mu are used to calculate v, yet in the initialization zero(eltype(y)) appears twice.


#4

Good point. That is probably a bug, although if y and mu have the same eltype it will never show. Why don't you open a PR to fix it? If there is indeed a reason for it, they will tell you there (please report back).
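As a hypothetical illustration (GLM's own structs may well force y and mu to share an eltype, in which case this never triggers): if the eltypes did differ, the initialization would miss mu's type and v would change type inside the loop:

```julia
# Hypothetical mixed-eltype case (the real GLM signature may rule this out).
y   = Float32[1.0]
mu  = Float64[0.5]
wts = Float32[]

v = zero(eltype(y)) + zero(eltype(y)) * zero(eltype(wts))
# typeof(v) == Float32 -- eltype(mu) never entered the initialization

v += abs2(y[1] - mu[1])
# typeof(v) == Float64 now: the accumulator changed type on the first iteration
```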


#5

That is probably just nonsense in GLM.jl. I'd guess zero(eltype(r.y)) would be fine here: the algorithm presumably has to assume that the underlying AbstractFloat is closed under addition, multiplication, and abs2 anyway, and in the case of heap-allocated big floats there is no reason to cause the 5 extra allocations. Not all packages are written in a perfectly idiomatic way.


#6

x-ref https://github.com/JuliaStats/GLM.jl/issues/268.

The 5 extra allocs of BigFloat are the least of your concerns if you just fitted a model on BigFloat data and are about to compute the deviance in BigFloat.


#7

Easy with the shade. Note that you could just as easily have armchair-quarterbacked a GLM.jl that used your proposed zero(eltype(r.y)) instead, if someone had come on Discourse because it was slow with a non-closed numeric type. It's hard to account for the quirks of every numeric type that folks will use in packages, so authors often try to strike a balance. It's a pain in the butt to manually unroll the first two iterations of a loop, so these kinds of "good-enough" shortcuts are common and totally fine IMO (until, of course, it's shown to be a significant problem; but Milan's point is a good one about this being a drop in the ocean).


#8

I know that the interface for <:Real isn’t even informally specified, but being closed under + (and *) is a reasonable expectation I guess. Is there a type which violates this?


#9

Unitful quantities change types upon multiplication. Not sure if the rest of this algorithm is unitful-safe, though.
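A self-contained toy (not the real Unitful types, just mimicking their behavior): a quantity type whose type parameter tracks the power of the unit, so + is closed but * is not:

```julia
# Toy quantity type: P is the power of the length unit (1 = m, 2 = m^2, ...).
struct Meters{P} <: Real
    val::Float64
end

# Addition only makes sense for matching units, so + is closed:
Base.:+(a::Meters{P}, b::Meters{P}) where {P} = Meters{P}(a.val + b.val)

# Multiplication adds the unit powers, producing a *different* type:
Base.:*(a::Meters{P}, b::Meters{Q}) where {P,Q} = Meters{P + Q}(a.val * b.val)

x = Meters{1}(2.0)
typeof(x + x)  # Meters{1} -- unchanged under +
typeof(x * x)  # Meters{2} -- the type changed under *
```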


#10

I’m not saying that GLM.jl is bad, or that I could do better. I apologize if I created that impression.

I’m just saying that “written more convoluted than it needs to be” is often a realistic explanation when wondering why something in a package works the way it does. Especially if there are no source comments explaining why something is not done in the obvious way.

Re types: we know from the signature that r.y, r.mu, and r.wts have the same type, which is <:AbstractArray{<:AbstractFloat}. Originally I wondered about type stability for complex values (because abs2 changes the type then), but those are excluded by the signature anyway; so my conclusion was "probably more complicated than it needs to be; otherwise someone should have written a source comment".
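For reference, the complex case mentioned above is easy to check in Base: abs2 maps a complex number to its real squared modulus, so the result type differs from the input's, and an accumulator seeded with zero(eltype(y)) would change type:

```julia
# abs2 on a complex value returns a real, not another complex number.
z = 1.0 + 2.0im
abs2(z)          # 5.0, a Float64
typeof(abs2(z))  # Float64, not ComplexF64
```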