Interacting with the posts and arguments presented in here directly would feel like a better mode of communication - just plainly copy & pasting an argument presented on another platform feels bad, since I now feel like I have to copy & paste my answer to your answer again… Instead, I’ll write out a fully formed, new post, as done in the OP.
Personally, I dislike the culture of “changing the benchmark until it’s correct”, precisely because it feels very bad when you’re on the receiving end of it. I have been guilty of this in the past as well, and as such I nowadays try to explain why the presented benchmark gives the observed result without implying they’re doing it wrong, unless explicitly asked for. That ask can either happen in the post directly (“am I missing something/doing it wrong? what’s going on?”) or in a follow-up post (“Ah, I see! How could this be benchmarked better?”).
I’m not sure I follow here - are you referring to julia’s stdlib with “default language […] functionality”? From my POV, changing `zeros` to be implicitly lazy (by using `calloc`) would precisely introduce such an inconsistency. Across the board, laziness and eagerness are strictly separated, often by the lazy variants living in the `Base.Iterators` submodule. Making `zeros` lazy but having `ones` be eager feels just as arbitrary - why is one lazy and not the other? The answer would be “because there’s no equivalent to `calloc` for `ones`”.
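To spell out the convention I mean (nothing below is new API, just Base as it ships today): the eager and the lazy variant are deliberately spelled differently and return different types.

```julia
xs = 1:5

eager = map(x -> x^2, xs)            # Vector{Int}: work done right now
lazy  = Iterators.map(x -> x^2, xs)  # Base.Generator: work done on iteration

collect(lazy) == eager               # true - same values, different timing
```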
I don’t like the argument about the “pit of success” here, for two reasons:
- it’s very subjective
- it implies there’s only “one true way”
The “pit of success” is different for every use case. What’s ideal for one program and developer is an annoyance for another - making a blanket statement about what ought to be done is what feels un-julian to me, not the discussion about “should we expose `calloc` at all”. I’ve reiterated that point both on zulip and up above - I’m not opposed to `calloc`!
I don’t understand why this should be an either/or case? I’m arguing that we should have BOTH, but default to the former, because making code behave in a way that’s easier to reason about is a good thing. When you write `zeros(Float64, ...)`, write to some locations and pass that array on to another piece of code that you may or may not have written, you don’t necessarily control how the resulting array is used. Conversely, when you use `zeros` in library code, you have no idea how a user/receiver of that array will use it. Both as a user of a library and as a developer of one, I’d like to have an easier time following where time is spent and why. With a lazy `zeros` using `calloc`, this is not possible - you do not necessarily know when or how that array is used/accessed. The performance cost of `calloc` can manifest whenever any element that happens to lie on a new page is accessed, which may not be close to the allocation site at all. How would you identify that this time was spent on a zeroing page fault caused by `calloc`? The information that this is happening is not even present in an `strace` of system calls. All you can easily observe is that some loop over some indices somewhere in the stack is slower than you expect, and figuring out why is a gargantuan task. Paying the cost up front makes it trivial - a flamegraph identifies the aggregate time spent in the `zeros` call.
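To make the cost shifting observable, here’s a minimal sketch - `calloc_vector` and `touch_all!` are made-up helpers, and it assumes a libc/OS combination (e.g. typical Linux) where `calloc` hands back lazily mapped zero pages:

```julia
using Base.Libc: calloc

# Hypothetical helper (not existing API): wrap a calloc’ed buffer
# as a Vector{Float64}; the GC frees it via free() because own=true.
function calloc_vector(n)
    ptr = Ptr{Float64}(calloc(n, sizeof(Float64)))
    ptr == C_NULL && throw(OutOfMemoryError())
    return unsafe_wrap(Array, ptr, n; own = true)
end

# Write to every element, forcing every page to be touched.
function touch_all!(v)
    for i in eachindex(v)
        @inbounds v[i] = 1.0
    end
    return v
end

n = 10^8                         # ~800 MB of Float64

@time eager = zeros(Float64, n)  # zeroing cost paid here, at the call site
@time lazy  = calloc_vector(n)   # near-instant: pages not yet faulted in

@time touch_all!(eager)          # pages already resident and zeroed
@time touch_all!(lazy)           # the zeroing cost surfaces here instead
```

The last line is the problem: in a real program that loop lives in someone else’s code, far from the allocation, and nothing in a profile labels the extra time as “deferred zeroing”.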
Note that I’m not saying that we shouldn’t use `calloc` ever - that is not the case! What I am saying is that it should only be used when the performance implications of how the cost is distributed are understood and accepted - hence the default of eager initialization.
My complaint would be the same - `calloc` is opaque about where the cost of initialization moves to, and thus should be avoided except in cases where that is understood to be ok. Naive uses of different allocation schemes (`fill(zero(T), ..)` vs. a naive `zeros(T, ...)`) should not have an impact on the performance of a loop in a completely different part of the stack.
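For reference, the two spellings are interchangeable today precisely because both initialize eagerly - a property I’d like naive code to be able to keep relying on:

```julia
T, dims = Float64, (4, 4)

# Both pay the full initialization cost at the call site today,
# so downstream loops over either array perform identically.
fill(zero(T), dims) == zeros(T, dims)   # true
```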
If this is really the kind of answer you expect on this forum, I’m extremely saddened that this is the impression the julia community gives off. All of these sound extremely dismissive and, frankly, come across as “get off my lawn” instead of embracing others’ POV. I for one would be happy if people were to implement crypto (and maybe prove correctness?) in julia. I also want julia to succeed in areas other than ML, AD, DiffEq, Modeling and similar.
This feels like a strawman - I’m equally missing tools for investigating the compiler and what it’s doing in these cases. I’m not sure why that would be an argument against having a choice between lazy and eager `zeros`.
I don’t like just pointing to some implementation somewhere and saying “look, X is doing it too”, least of all when the comparison is python or rust. Rust conveniently gives you the choice of an allocator that doesn’t use `calloc` but instead always uses `malloc` - a choice neither julia nor python exposes, as the GC is treated as a black box, not to be meddled with by the user.
Reading through the source of `Vec` (Rust’s equivalent to julia vectors), I couldn’t help but notice that growing a vector doesn’t guarantee anything about whether or not the new memory is zeroed. In contrast, julia already documents that `resize!` with a larger size does not initialize the resulting memory. How should this be dealt with, should a `zeros` array be allocated via `calloc`? As far as I’m aware, there is no `realloc` equivalent that zeroes the newly grown portion of `calloc`ed memory.
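To illustrate the documented `resize!` behavior (what exactly ends up in the tail is nondeterministic - the only guarantee is that there is no guarantee):

```julia
v = zeros(Float64, 4)   # four honest zeros
resize!(v, 8)           # documented: the four new elements are NOT initialized
v[5:8]                  # may be anything - old heap contents, not zeros
```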
- I don’t get what’s specific about `zeros` that this blanket statement should have weight here. This can be said about any difference between julia and any other language.
- A few points:
  a) How do off-by-one errors factor into this discussion at all?
  b) I don’t see how having `false` as a strong zero is relevant to whether or not `zeros` is lazy or eager. If you get something that wasn’t 0, then by definition that array wasn’t zeroed - no matter the initialization method.
  c) I still don’t understand how that line of code is a strong argument either way. Having `false` as a strong zero not working for user defined types is, from my POV, not a problem of Flux or DiffEq or Zygote, but of the user defined type implementing multiplication wrong. That this is very loosely and badly (or not at all) defined is also bad, but has nothing to do with whether `zeros` is eager or lazy. In fact, for user defined types `zeros` has to use `malloc` with explicit initialization, because a zero-filled block of memory may not be valid for `zero` of a user defined type! As you note, it may be boxed or a number of other things, like not being `isbits` or simply not having a block of zeros as its `zero` representation (see the sketch after this list).
- That’s a single use case which I really hope is not the exclusive intended use case for julia as a whole.
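To make that last point about `zero` representations concrete, here’s a small demonstration using `Rational` from Base - its zero is `0//1`, so its bit pattern is not all zeros and a `calloc`ed block cannot represent it:

```julia
z = zero(Rational{Int})          # 0//1: numerator 0, denominator 1

# The byte representation is NOT all zeros - the denominator contributes
# a nonzero byte. An all-zero block would decode as the invalid 0//0.
bytes = reinterpret(UInt8, [z])
any(!iszero, bytes)              # true
```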
I don’t think the points you put in a list support either lazy or eager `zeros`, or even the ability for the user to choose. To me it just comes across as “I want this because it’s helpful in my case”, to which I say “ok, let’s have a user-controlled switch then, since users can’t affect the memory allocator meaningfully here”.
I haven’t seen anyone claim “`zeros` shouldn’t exist” yet, so I’m not quite sure what you mean here.
The trouble with that is that it’s not as simple as switching between `calloc` and `malloc` behind the scenes, especially not for user defined types (for which `zeros` also has to work). For example, suppose we added a switch between lazy and eager initialization - do we then provide a `LazyInitializedMatrix` type that gets returned for custom user defined types that don’t neatly map to `calloc`? The page faulting process is completely invisible to julia user code as far as I know, so we can’t even hook into it to lazily initialize user types.
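To sketch why that’s awkward (everything below is hypothetical, not a proposal for an actual API): lacking access to the kernel’s page fault mechanism, laziness for a general `T` has to be tracked in user space, with a branch on every access:

```julia
# Hypothetical sketch: defer materializing zero(T) until first access.
# Unlike calloc, where the kernel hands out zeroed pages transparently,
# for a general T we have to track initialization state ourselves.
struct LazyInitializedMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
    initialized::BitMatrix
end

LazyInitializedMatrix{T}(m::Integer, n::Integer) where {T} =
    LazyInitializedMatrix{T}(Matrix{T}(undef, m, n), falses(m, n))

Base.size(A::LazyInitializedMatrix) = size(A.data)

function Base.getindex(A::LazyInitializedMatrix{T}, i::Int, j::Int) where {T}
    if !A.initialized[i, j]
        A.data[i, j] = zero(T)       # the “page fault”, now in user space
        A.initialized[i, j] = true
    end
    return A.data[i, j]
end

function Base.setindex!(A::LazyInitializedMatrix, v, i::Int, j::Int)
    A.initialized[i, j] = true
    A.data[i, j] = v
    return A
end
```

Every read now pays a branch and a `BitMatrix` lookup, and the wrapper leaks into the return type of `zeros` - exactly the kind of cost and complexity tradeoff I’d want spelled out before committing to anything.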
Add to this that this would be (as far as I know) the only place in julia where user code could explicitly influence how something is allocated, and my gut tells me “let’s be careful what we want here and what the implications are”.