v = Vector{Int8}(1:10)
w = @view v[3:8]
deleteat!(v, 3:8)
println(v)
w[4] = 42
println(w)
The result is this:
Int8[1, 2, 9, 10]
Int8[9, 10, 0, 42, 7, 8]
My question is about the second printed line, which is the content of the view w after a deletion in v and an assignation in w.
Are the values actually found in the view w predictable in some way, or are they garbage ?
Where is the 42 actually stored ?
Is there a risk of memory corruption with this kind of action ?
The numbers are reproducible due to the implementation details of deleteat!. The original memory is not reallocated so it still exists. deleteat! just shifts values from the end down and fills the next spot with a zero.
The main danger would be a segmentation fault. That is accessing memory that Julia did not allocate.
It’s probably a bug. The bounds check should occur on the original array. Instead it seems to occur on the SubArray’s original indices.
is it a bug? doesn’t the docstring of view specifically say that the parent array can’t be modified and explains the bounds checking behavior?
The behavior is undefined if the shape of the parent array is changed after view
is called because there is no bound check for the parent array; e.g., it may
cause a segmentation fault.
I would expect that the deleteat! call should throw since the indices are out of bounds. Unless the contract is actually that you can delete already out of bounds indices, which is possible (I may recall wrongly) but would surprise me a bit.
Ultimately, I think this means that views shouldn’t assume to be inbounds after creation, since the parent vector could have changed size. Its boundscheck can’t, in general, be removed anymore (unless the parent is a Memory, not a Vector - I guess the same holds for Matrix).
We could fix this by making the view wrap the Memory of the parent instead, in which case removing the bounds check on indexing stays legal, because those can’t shrink in size.
Quite unfortunate, since that will cause a lot of code where the parent needs to be a Vector to become slower.
if the parent vector changes size, all bets are off for any sane behavior of existing views. I think that’s just something users need to be aware of and is not a mistake with the current design
How should someone that is aware of this change their code to prevent that issue though? Say you write some library that takes in a Vector and you take views of that vector in your library. Unbeknownst to you, the caller has another task running where they change the vector length. Under the assumption that the vector doesn’t change size, your library is correct & safe, but if the size can change, your previously safe code is now incorrect & unsafe because of code you had no control over. That’s a recipe for not only segfaults, but security exploits. I’m having a hard time coming up with code that prevents this, short of not using views in your library.
If we change views to instead apply to the Memory the vector originally referenced, this will still be safe and not segfault because the memory can’t suddenly change size out from under you.
“your” (user) code is incorrect. I don’t think package devs can be expected to save users from themselves if they are trying to do something as obviously dangerous as asynchronous modification of backing memory to array views
function getindex(V::FastContiguousSubArray{<:Any, 1}, i::Int)
@inline
@boundscheck checkbounds(V, i)
@inbounds r = V.parent[V.offset1 + i]
r
end
Here is the getindex in question at least for a single index. The @inbounds is problematic. Perhaps we could fix this by specializing checkbounds for SubVector?
function checkbounds(A::SubArray{<:Any,1}, i::Int)
@inline
checkbounds(Bool, A, i) || throw_boundserror(A, I)
checkbounds(Bool, A.parent, A.offset1 + i) ||
throw_boundserror(A.parent, A.offset1 + i)
nothing
end
You can still elide the bounds check through the normal mechanisms.
There doesn’t have to be a “user program” involved here. For example, consider a web server that has a concurrency bug on some vector where an operation uses those views that’s triggerable through requesting some end points in a particular way. If this were to throw an error instead of segfault, you’d get a nice, isolated aborted request and could otherwise carry on. With a segfault however, you either crash the whole server or worse, actively give an attack vector to a malicious user of your web service, all triggered by a concurrency bug that led to being a safety issue.
I think it’d be good to prevent these things by design, by ensuring that views are as safe as they can reasonably be.
Thanks everybody for your analysis.
My understanding so far, based on answers, is that the use of views on an array is unsafe and may lead to memory corruption if the array is resized.
That is good to know in my learning trip …