Faster zeros with calloc

I feel like engaging with the posts and arguments presented in here would be a better mode of communication - just plainly copy & pasting an argument presented on another platform feels bad, since I then feel like I have to copy & paste my answer to your answer again… Instead I'll write out a fully formed, new post, as done in the OP.


Personally, I dislike the culture of “changing the benchmark until it’s correct”, precisely because it feels very bad when you’re on the receiving end of it. I have been guilty of this in the past as well, so nowadays I try to explain why the presented benchmark gives the observed result without implying the author is doing it wrong, unless explicitly asked. That can happen either in the post directly (“am I missing something/doing it wrong? what’s going on?”) or in a follow-up post (“Ah, I see! How could this be benchmarked better?”).

I’m not sure I follow here - are you referring to julia’s stdlib with “default language […] functionality”? From my POV, changing zeros to be implicitly lazy (by using calloc) would precisely introduce such an inconsistency. Across the board, laziness and eagerness are strictly separated, with the lazy variants often living in the Base.Iterators submodule. Making zeros lazy while ones stays eager feels just as arbitrary - why is one lazy and not the other? The answer would be “because there’s no equivalent to calloc for ones”.
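To illustrate what I mean by that separation, this is how the eager and lazy spellings already differ today (standard Base functionality, nothing new):

```julia
# Eager: zeros materializes the full array at the call site.
v = zeros(Float64, 3)                  # 3-element Vector{Float64}

# Lazy counterparts are spelled explicitly, via Base.Iterators.
it = Iterators.repeated(0.0, 3)        # nothing allocated until consumed
collect(it)                            # materializes: [0.0, 0.0, 0.0]
```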

I don’t like the argument about the “pit of success” here, for two reasons:

  1. it’s very subjective
  2. it implies there’s only “one true way”

The “pit of success” is different for every use case. What’s ideal for one program and developer is an annoyance for another - making a blanket statement about what ought to be done is what feels un-julian to me, not the discussion about “should we expose calloc at all”. I’ve reiterated that point both on zulip and up above - I’m not opposed to calloc!

I don’t understand why this should be an either/or case. I’m arguing that we should have BOTH, but default to eager initialization, because making code behave in a way that is easier to reason about is a good thing.

When you write zeros(Float64, ...), write to some locations and pass that array on to another piece of code you may or may not have written, you don’t necessarily control how the resulting array is used. Conversely, when you use zeros in library code, you have no idea how a user/receiver of that array will use it. Both as a user of a library and as a developer of one, I’d like to have an easier time following where time is spent and why.

With a lazy zeros using calloc, this is not possible - you do not necessarily know when or how that array is accessed. The performance cost of calloc can manifest whenever any element that happens to sit on a new page is touched for the first time, which may not be anywhere near the allocation site. How would you identify that this time was spent on zeroing page faults caused by calloc? The information that this is happening isn’t even present in an strace of system calls. All you can easily observe is that some loop over some indices somewhere in the stack is slower than you expect, and figuring out why is a gargantuan task. Having to pay the cost up front makes it trivial - a flamegraph identifies the aggregate time spent in the zeros call.
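To make that concrete, here’s a minimal sketch of the effect, assuming a demand-paged OS (e.g. Linux) and the BenchmarkTools.jl package; calloc_zeros is a made-up helper for this post, not existing API:

```julia
using BenchmarkTools  # assumed to be installed

# Hypothetical lazy zeros: calloc hands back pages that the OS only
# materializes (and zeroes) on first touch.
function calloc_zeros(n)
    ptr = Ptr{Float64}(Libc.calloc(n, sizeof(Float64)))
    return unsafe_wrap(Array, ptr, n; own = true)  # GC frees via Libc.free
end

@btime zeros(Float64, 10^7);   # eager: zeroing cost paid right here
@btime calloc_zeros(10^7);     # lazy: the allocation looks almost free...

A = calloc_zeros(10^7)
@time sum(A)   # ...but the page faults are paid here, at first touch
@time sum(A)   # second pass: pages are already mapped, so this is fast
```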

Note that I’m not saying we should never use calloc - that is not the case! What I am saying is that it should only be used when the performance implications of how the cost is distributed are understood and accepted - hence eager initialization as the default.

My complaint would be the same - calloc is opaque about where the cost of initialization moves, and should thus be avoided except in cases where that’s understood to be ok. Naive uses of different allocation schemes (fill(zero(T), ..) vs. a naive zeros(T, ...)) should not have an impact on the performance of a loop in a completely different part of the stack.

If this is really the kind of answer you expect on this forum, I’m extremely saddened that this is the impression the julia community gives off. All of these sound extremely dismissive and, frankly, come across as “get off my lawn” instead of embracing others’ POV. I for one would be happy if people were to implement crypto (and maybe prove correctness?) in julia. I also want julia to succeed in areas other than ML, AD, DiffEq, Modeling and similar.

This feels like a strawman - I’m equally missing tools for investigating the compiler and what it’s doing in these cases. I’m not sure why that would be an argument against having a choice between lazy and eager zeros.

I don’t like just pointing to some implementation somewhere and saying “look, X is doing it too” anymore - especially when the comparison is python or rust. Rust conveniently gives you the choice of a global allocator that doesn’t use calloc but always malloc - a choice neither julia nor python exposes, since in both the GC is treated as a black box, not to be meddled with by the user.

Reading through the source of Vec (Rust’s equivalent to julia’s vectors), I couldn’t help but notice that growing a vector doesn’t guarantee anything about whether the new memory is zeroed. In contrast, julia already documents that resize! with a larger size does not initialize the resulting memory. How should this be dealt with, should a zeros array be allocated via calloc? As far as I’m aware, there is no calloc-style counterpart to realloc - growing a calloc’ed block with realloc leaves the newly added bytes uninitialized.
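For reference, that resize! behavior is already observable today:

```julia
v = zeros(Float64, 4)
resize!(v, 10_000)   # the grown part is NOT guaranteed to be initialized,
                     # per the documented behavior of resize!
v[5]                 # may be any value at all - no zeroing happened here
```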

  1. I don’t get what’s specific about zeros that gives this blanket statement weight here - it could be said about any difference between julia and any other language.
  2. A few points:
    a) How do off-by-one errors factor into this discussion at all?
    b) I don’t see how having false act as a strong zero is relevant to whether zeros is lazy or eager. If you get something that wasn’t 0, then by definition that array wasn’t zeroed - no matter the initialization method.
    c) I still don’t understand how that line of code is a strong argument either way. Having false as a strong zero not working for user-defined types is, from my POV, not a problem of Flux or DiffEq or Zygote, but of the user-defined type implementing multiplication wrong. That this is only loosely (or not at all) defined is also bad, but has nothing to do with whether zeros is eager or lazy. In fact, for user-defined types zeros has to use malloc plus explicit initialization, because a zero-filled block of memory may not be a valid zero for such a type (see the sketch after this list)! As you note, the type may be boxed, may not be isbits, or may simply not have all-zero bytes as its zero representation.
  3. That’s a single use case which I really hope is not the exclusive intended use case for julia as a whole.
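As a concrete instance of point 2c (this sketch assumes a 64-bit machine, where Rational{Int} occupies 16 bytes): Rational’s zero is 0//1, so a block of all-zero bytes does not decode to a valid zero at all:

```julia
zero(Rational{Int})   # 0//1 - the denominator must be 1, not 0

# Reinterpret 16 all-zero bytes (what calloc would hand us) as a Rational:
only(reinterpret(Rational{Int}, zeros(UInt8, 16)))   # 0//0 - invalid!
```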

I don’t think the points you put in a list support either lazy or eager zeros, or even the ability for the user to choose. To me it just comes across as “I want this because it’s helpful in my case”, to which I say “ok, let’s have a user-controlled switch then, since users can’t meaningfully affect the memory allocator here”.
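Purely as illustration of what I mean by a switch - the init keyword below is invented for this post and is neither existing nor proposed API:

```julia
zeros(Float64, 1000)                  # today: eager, cost paid up front
zeros(Float64, 1000; init = :eager)   # hypothetical explicit spelling
zeros(Float64, 1000; init = :lazy)    # hypothetical: calloc-backed where
                                      # the element type allows it
```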


I haven’t seen anyone claim “zeros shouldn’t exist” yet, so I’m not quite sure what you mean here.

The trouble with that is that it’s not as simple as switching between calloc and malloc behind the scenes, especially not for user-defined types (for which zeros also has to work). For example, suppose we added a switch between lazy and eager initialization - do we then provide a LazyInitializedMatrix type that gets returned for custom user-defined types that don’t neatly map to calloc? The page-faulting mechanism is completely invisible to julia user code as far as I know, so we can’t even hook into it to lazily initialize user types.
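To sketch what I mean (hypothetical code, no such type exists): a generic lazy wrapper has to track initialization state itself, because julia code cannot hook the page-fault mechanism that makes calloc’s laziness free:

```julia
# Hypothetical type, not existing API - a writable matrix whose
# untouched entries read as zero(T) without eager initialization.
struct LazyInitializedMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}      # allocated but left uninitialized
    touched::BitMatrix   # bookkeeping: which entries have been written
end

LazyInitializedMatrix{T}(m::Int, n::Int) where {T} =
    LazyInitializedMatrix{T}(Matrix{T}(undef, m, n), falses(m, n))

Base.size(A::LazyInitializedMatrix) = size(A.data)

Base.getindex(A::LazyInitializedMatrix{T}, i::Int, j::Int) where {T} =
    A.touched[i, j] ? A.data[i, j] : zero(T)

function Base.setindex!(A::LazyInitializedMatrix, v, i::Int, j::Int)
    A.touched[i, j] = true
    A.data[i, j] = v
end
```

Every getindex now pays a branch, and the touched mask costs a bit per element - overhead calloc avoids only because the OS does the bookkeeping in its page tables, which is exactly the part user code can’t reach.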

Add to this that this would be (as far as I know) the only place in julia where user code could explicitly influence how something is allocated, and my gut tells me “let’s be careful about what we want here and what the implications are”.
