1.0 annoyances and Matlab comparison

“uninitialized” forces my brain to go into “spelling” mode, which is annoying. The iti part is what gets me, I think, since the pronunciation is more like “itchi”.


I agree: what I don’t like about “uninitialized” is its spelling. It takes me a few seconds to write it correctly without an autocompletion system.

How about using garbage instead?

I think the discussion has moved to https://github.com/JuliaLang/julia/pull/26316. I honestly can live with “uninitialized”, I just happen to dislike the word. I have to type it as “un-init-ial-ized”.


Or maybe a short, four-letter synonym… :thinking:


So many options…

I guess Stefan was referring to “junk” :wink:

Yeah, I figured, but there’s also muck… I love dreck!

I personally don’t mind uninitialized (even though I definitely need autocomplete to spell it right), but if there is strong demand for a more concise syntax, maybe it would be possible to bring back the (very) old constructor Array(T, 3) to mean Array{T}(uninitialized, 3). It would be analogous to missings(T, 3), which creates an Array{Union{Missing, T}} filled with missing.

Of course, as I don’t know what future constructors are planned for Array, this proposal only makes sense if dispatching on ::Type as the first argument does not clash with anything else.
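A rough sketch of the shorthand idea, using a separate helper rather than adding a method to Array(T, dims...) itself (adding methods to Array when you own none of the argument types would be type piracy). The name uarray is hypothetical, and the sketch uses undef, the name that shipped in Julia 1.0 in place of uninitialized:

```julia
# Hypothetical shorthand (not part of Base): allocate an uninitialized Array
# with element type T, dispatching on ::Type{T} like zeros(T, dims...).
uarray(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)

v = uarray(Float64, 3)   # 3-element Vector{Float64}; contents are arbitrary
m = uarray(Int, 2, 2)    # 2x2 Matrix{Int}; contents are arbitrary
println(typeof(v), " ", size(v))
println(typeof(m), " ", size(m))
```

Because the first argument is a ::Type, this helper cannot collide with other Array constructors, which is the clash the post worries about.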


I rather like the term ‘dreck’ myself. And it is not too English-centric.
The Scots have a nice word for the fluff under the bed: oose. But that is a very specific thing; it does not mean ‘rubbish’.

Just also noting that the era of non-volatile memory will be upon us soon. I guess the old-timers with real core memory coped with that.
Maybe future Julia versions will have language extensions that say “pick up where you left off with that array before the machine crashed”. But that’s going far off topic.

I think the really hard thing about programming for that will be efficiently making sure that old contents aren’t left around, for security reasons.
It’s not fun having to do it with databases!

what about something like

Array{T}(vals, dim) where vals::Val{:uninitialized}

The nice thing is that when a user sees it for the first time and does

julia> typeof(vals)
Val{:uninitialized}

there is an automatic hook into the documentation and it becomes clear what is going on (… and avoids doing what I did a few posts up where I somehow thought uninitialized was a type of magic iterator … duh!). Also, I guess it could even be extended to something like

Array{T}(zeros, dim) where zeros::Val{:zeros}
Array{T}(ones, dim) where ones::Val{:ones}

now that zeros and ones are gone.
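The Val idea can be tried without touching Base. This sketch uses a hypothetical function name, alloc, instead of adding methods to Array{T}; Val(:zeros) and friends are ordinary Base Val instances, and zeros/ones are the Base functions as they exist in Julia 1.x. It uses undef, the name that shipped in Julia 1.0:

```julia
# Hypothetical allocation function (not a Base API) dispatching on Val tags.
alloc(::Type{T}, ::Val{:uninitialized}, dims::Integer...) where {T} = Array{T}(undef, dims...)
alloc(::Type{T}, ::Val{:zeros}, dims::Integer...) where {T} = zeros(T, dims...)
alloc(::Type{T}, ::Val{:ones}, dims::Integer...) where {T} = ones(T, dims...)

# A named constant gives the "typeof tells you what it means" property.
const uninit = Val(:uninitialized)

a = alloc(Float64, uninit, 3)         # contents arbitrary
z = alloc(Int, Val(:zeros), 2)        # [0, 0]
o = alloc(Float64, Val(:ones), 2, 2)  # 2x2 matrix of 1.0
println(typeof(uninit))               # Val{:uninitialized}
```

An unknown tag such as Val(:garbage) fails with a MethodError at the call site, so a misspelled tag cannot silently do the wrong thing.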

That’s another plus point for having an explicit first argument (possibly with shorter spelling than uninitialized): People could e.g. define something like:

struct Clear_on_free end
function Array{T}(::Clear_on_free, size...) where T
       rv = Array{T}(uninitialized, size...)
       # zero the array's bytes when the GC finalizes it; bzero's length argument is a size_t
       finalizer(rv) do arr
           ccall(:bzero, Nothing, (Ptr{Nothing}, Csize_t), pointer(arr), sizeof(arr))
       end
       rv
       end


A = Array{Int}(Clear_on_free(), 2, 2);
A[1] = -1; Aptr = pointer(A);
@show unsafe_load(Aptr);
#unsafe_load(Aptr) = -1

A = 2;  # drop the only reference to the array
@show unsafe_load(Aptr);
#unsafe_load(Aptr) = -1

GC.gc();  # the finalizer runs and zeroes the freed block
@show unsafe_load(Aptr);
#unsafe_load(Aptr) = 0

Warning: Don’t do this for arrays that you resize during their lifetime. You would need to subtract the offset from the pointer before calling bzero and clear up to the capacity. Also, you can still leak memory contents when resizing, and I have no idea how to hook into the resize machinery.


I like the technique; unfortunately, it doesn’t really give much in the way of security, because Julia is very lazy about picking up garbage lying around (just like my almost-15-year-old son!)


Nothing’s wrong with it; there’s just no reason for it to be in Base. I’m sure a ConvenientArrays.jl package will quickly sprout up.

I think new is a very valid choice: short, clear, and already used by other languages for the same thing. In C++, int* p = new int[n]; does the same as p = Array{Int}(uninitialized, n). new is also used in Pascal and other languages with the same meaning. All of these simply allocate a block of memory of a certain size without concern for its contents. What is wrong with using it in Julia instead of that mouthful uninitialized?

I’d also love to remind you of the Zen of Python; many of its principles are already applied in Julia. I’ll copy them here for convenience, and I hope they help us make the right decision. As a performance-oriented language, Julia should let us write readable code without paying the performance cost of being forced to initialize arrays.

The Zen of Python:

  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
  5. Flat is better than nested.
  6. Sparse is better than dense.
  7. Readability counts.
  8. Special cases aren’t special enough to break the rules.
  9. Although practicality beats purity.
  10. Errors should never pass silently.
  11. Unless explicitly silenced.
  12. In the face of ambiguity, refuse the temptation to guess.
  13. There should be one—and preferably only one—obvious way to do it.
  14. Although that way may not be obvious at first unless you’re Dutch.
  15. Now is better than never.
  16. Although never is often better than right now.
  17. If the implementation is hard to explain, it’s a bad idea.
  18. If the implementation is easy to explain, it may be a good idea.
  19. Namespaces are one honking great idea—let’s do more of those!

Note that it isn’t always “garbage”: if it’s an array of non-bitstypes, then you’ll get undefs.

Could you please show an example of what you mean?

julia> Vector{BigFloat}(uninitialized, 10)
10-element Array{BigFloat,1}:
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
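To make the difference concrete: for a non-isbits element type such as BigFloat the slots hold no reference at all, so reading one throws UndefRefError, whereas an isbits element type such as Int just exposes whatever bytes were in memory. A small sketch using the Base functions isassigned and isbitstype, written with undef (the name that shipped in Julia 1.0):

```julia
v = Vector{BigFloat}(undef, 3)   # non-isbits: every slot is #undef
w = Vector{Int}(undef, 3)        # isbits: slots hold arbitrary bit patterns

println(isbitstype(BigFloat), " ", isbitstype(Int))   # false true
println(isassigned(v, 1), " ", isassigned(w, 1))      # false true

# Reading an #undef slot throws instead of returning garbage.
err = try
    v[1]
    nothing
catch e
    e
end
println(typeof(err))             # UndefRefError

v[1] = big"1.5"                  # assigning a value fills the slot
println(isassigned(v, 1))        # true
```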

It depends. Julia is guaranteed to call the finalizer before a clean exit, or before handing the memory to some other object. Even in reference-counted languages you are, AFAIK, not guaranteed immediate scrubbing once an object becomes unreachable (I think cycle detection is done periodically, at least in Python).

In other words: This will protect you from leaking the secrets from bugs that leak uninitialized memory, and it will protect you from leaving the secret behind after exit. It will not protect you from buffer overflow based info leaks. The secret will not be scrubbed if you segfault / crash julia.

Also, it gives you the nice way of scrubbing your secrets by calling finalize explicitly, once you know that you don’t need the array anymore (if you don’t want to wait for the gc).
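A minimal sketch of the explicit-scrubbing route, assuming a bits element type. The hypothetical helper scrubbed_array registers a finalizer that overwrites the data with zero bytes using the standard C memset (swapped in for bzero, which is not part of standard C), and Base.finalize runs that finalizer immediately instead of waiting for the GC:

```julia
# Hypothetical helper (not a Base API): an array whose finalizer zeroes its bytes.
function scrubbed_array(::Type{T}, dims::Integer...) where {T}
    isbitstype(T) || throw(ArgumentError("scrubbing with memset requires a bits type"))
    A = Array{T}(undef, dims...)
    finalizer(A) do arr
        # memset(ptr, 0, nbytes): overwrite the whole data region with zeros.
        ccall(:memset, Ptr{Cvoid}, (Ptr{Cvoid}, Cint, Csize_t), pointer(arr), 0, sizeof(arr))
        nothing
    end
    return A
end

secret = scrubbed_array(UInt8, 16)
fill!(secret, 0xaa)             # pretend this is key material
finalize(secret)                # run the finalizer now, without waiting for the GC
println(all(iszero, secret))    # true: scrubbed while still reachable
```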

The main problem I see is the resize / realloc leak. A dirty workaround would be to just set the shared flag on the array, so that the runtime throws in array.c on an attempted resize (protecting you from accidental resizing: a crash is better than a security incident).

Not sure whether that causes trouble; I think we would be lying to the compiler here. Julia / the type tag would think that the array is non-shared, but array.c would think that it is shared via the flags in the jl_array struct. Maybe I misunderstood how shared arrays work internally, though.
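Instead of setting the shared flag behind the compiler's back, a type-level alternative (a different technique from the one proposed above) is to wrap the buffer in a fixed-size AbstractVector that defines no resize! method, so an accidental resize fails with a MethodError rather than reallocating. ScrubbedVector is a hypothetical name, and its scrubbing is best effort, since it runs from a finalizer with the limits discussed above:

```julia
# Hypothetical fixed-size, self-scrubbing vector (not a Base or package API).
mutable struct ScrubbedVector{T<:Number} <: AbstractVector{T}
    data::Vector{T}
    function ScrubbedVector{T}(n::Integer) where {T<:Number}
        v = new{T}(zeros(T, n))
        # Best-effort scrub of the contents when the wrapper is finalized.
        finalizer(x -> fill!(x.data, zero(T)), v)
        return v
    end
end

# Minimal AbstractArray interface; deliberately no resize!/push! methods.
Base.size(v::ScrubbedVector) = size(v.data)
Base.IndexStyle(::Type{<:ScrubbedVector}) = IndexLinear()
Base.getindex(v::ScrubbedVector, i::Int) = v.data[i]
Base.setindex!(v::ScrubbedVector, x, i::Int) = (v.data[i] = x)

s = ScrubbedVector{UInt8}(4)
s[1] = 0x7f
grew = try
    resize!(s, 8)
    true
catch e
    e isa MethodError || rethrow()
    false
end
println(grew)   # false: the wrapper cannot be resized
```

The trade-off is that code expecting a plain Vector will not accept this type, which is arguably the point: nothing can grow the buffer behind your back.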

If you think a “self-scrubbing” resizable array is important, then maybe something can be done by modifying array.c: scrub after realloc, and compute the scrub size correctly from the offset and capacity. That should be reasonably cheap, since moving the data after a realloc is expensive anyway, but you would need to build a custom Julia binary that scrubs all arrays on realloc.
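Patching array.c is beyond a quick experiment, but the same idea (scrub the old block whenever the data moves) can be sketched in user code by never calling resize! and instead copying into a fresh array and zeroing the old one. scrubbing_resize is a hypothetical name; this trades the in-place realloc for an explicit copy, and it only helps if the buffer is never resized any other way:

```julia
# Hypothetical helper: "resize" by copying into a new array, then scrub the old one.
function scrubbing_resize(old::Vector{T}, n::Integer) where {T<:Number}
    fresh = zeros(T, n)                # new block; any extra tail is zero-filled
    k = min(n, length(old))
    copyto!(fresh, 1, old, 1, k)       # move the surviving prefix
    fill!(old, zero(T))                # scrub the old block before it is dropped
    return fresh
end

buf = UInt8[1, 2, 3]
bigger = scrubbing_resize(buf, 5)
println(bigger)   # 1, 2, 3 followed by two zeros
println(buf)      # all zeros
```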

Generally, I would like to see more use of calloc instead of malloc plus bzero / memset in array.c. That would also go into the first slot of the constructor! So would an arena, if you have strong opinions on where the array should be placed (this feature does not currently exist, but it is trivial to implement badly via unsafe_wrap after a ccall to your favorite allocator). The only thing I don’t see how to get nicely from Julia without modifying array.c is the crazy-fast pool allocation for small arrays, and I think finalizers are more expensive for the GC to manage than a plain free; others would know more about this.
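The "calloc, then unsafe_wrap" route can be sketched with Libc.calloc and unsafe_wrap(...; own = true), which hands the block to Julia's GC so that it is eventually released with free. calloc_array is a hypothetical name, and this is exactly the simple version described above, without the pool allocation Julia uses for small arrays:

```julia
# Hypothetical helper: zero-initialized array backed by calloc'ed memory.
function calloc_array(::Type{T}, n::Integer) where {T}
    isbitstype(T) || throw(ArgumentError("calloc_array requires a bits type"))
    n == 0 && return Vector{T}(undef, 0)      # calloc(0, ...) may return NULL
    p = Libc.calloc(n, sizeof(T))             # zeroed block from the C allocator
    p == C_NULL && throw(OutOfMemoryError())
    # own = true: Julia calls free() on the block when the array is collected.
    return unsafe_wrap(Array, convert(Ptr{T}, p), n; own = true)
end

a = calloc_array(Float64, 4)
println(a)   # [0.0, 0.0, 0.0, 0.0]
```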

edit: One great way of leaking uninitialized memory that contains secrets from the previous owner – even if you never use the uninitialized constructor – is via structure padding. The self-scrubbing non-resizeable array will protect you from this, but you can still leak register contents into structure padding.
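To see where such padding comes from: in an isbits struct whose fields have different alignments, unused bytes are inserted between fields, and constructing the struct never writes them. The struct below is a made-up example; sizeof and fieldoffset are Base functions, and the exact numbers depend on the platform's alignment rules (on typical 64-bit systems the 8-byte field is 8-byte aligned, leaving a 7-byte gap):

```julia
# Made-up example struct: a 1-byte field followed by an 8-byte field.
struct Padded
    tag::UInt8
    value::UInt64
end

payload = sizeof(UInt8) + sizeof(UInt64)             # bytes actually used by fields
total = sizeof(Padded)                               # bytes the struct occupies
gap = Int(fieldoffset(Padded, 2)) - sizeof(UInt8)    # padding between the fields

println("total = ", total, ", payload = ", payload, ", padding between fields = ", gap)
```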