Hello,
I had a question about the memory layout of structs that contain mutable fields, e.g.
struct Test
    id::Int
    data::Vector{Float64}
end
testarr = [Test(1, [2.1,]), Test(2, [3.1,])]
pointer(testarr, 2) - pointer(testarr, 1) == 8
Now I know that these cannot be fully stored contiguously in memory due to the variable size of the Vector field, but how about the pointers to the data?
It seems (AFAICT) that if I create a vector of these, it will be a vector of pointers to objects stored somewhere else (tested with pointer(testarr, 2) - pointer(testarr, 1) == 8, not 16).
The question I have is: why are these not stored in a fashion such as [int, ptr, int, ptr, int, ptr, ...]?
At this point it’s mostly a heuristic. Storing inline can sometimes make things slower.
Of course there are cases where storing inline is impossible even for immutables (self-referential types), though it would be possible to decide that at declaration time. It was decided, however, not to change it, since it’s not clear it would be a performance win.
Can you elaborate (or point me to some reference discussion) on when it’s slower? To me it always seems like a loss to have to go through an extra layer of pointers to random locations on the heap to find, for example, the id of the object at a certain index of the array, rather than accessing it directly from the index plus a field offset.
I’m mostly interested in iterating through arrays of these.
When do I copy data when I’m iterating through the array? That the mutable fields are still defined on the heap makes complete sense to me, but why is it faster to have the structs themselves (which are in this case 16 bytes long) potentially scattered through the heap and accessed via pointers?
Isn’t this related to the fact that non-isbits are heap-allocated? If the object is already on the heap, then when storing it in an array, it’s simpler and faster to just store the pointer. On top of that, storing it inline would imply that every array access would cause an allocation, as Julia would have to create a fresh Test{Float64} object on the heap.
In other words, this is not intentional; it’s a long-standing performance issue.
Surely it depends on the application. If you’re accessing the id field of Test objects stored in an array a lot, then getting rid of a layer of indirection is good. For copying arrays it’s not, as Yichao mentioned. There just isn’t a clear best thing to do in all cases.
Well, if the object were stored inline, then whenever you access it you do need to copy it. Of course, in some cases that can be optimized out, and if you are just dealing with an object with 2 fields the copying isn’t going to matter too much, but that’s not always the case.
Well, sort of, but both the statement and the logic are wrong.
For a start, non-isbits objects are NOT required to be heap-allocated. They are only required to be heap-allocated at interfaces. In general, such interfaces include fields, arrays, and function-call arguments and return values. Note that this is exactly what this thread is about, i.e. the corrected version of that statement is related in that it describes exactly this phenomenon, but it’s not the reason for it.
Therefore, no, that statement is wrong. Storing inline would not require any additional allocation for structs that have references as fields.
What IS true, OTOH, is that there must be a single convention for whether a type is passed by copy or by reference in fields and as arguments (including arrays and return values). And THIS is exactly because otherwise allocations would be required.
Bottom line: such a change in the language would not require more heap allocation. It would only require more copying, and that’s the performance trade-off.
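For illustration (the struct names here are mine, not from the thread), the two layouts can be observed directly with the same pointer-arithmetic trick used above:

```julia
# Illustrative sketch: an isbits struct is stored inline in an array,
# while a struct holding a reference is stored as one pointer per element.
struct InlinePair          # isbits: two Ints, no references
    a::Int
    b::Int
end

struct Boxed               # non-isbits: holds a reference to a Vector
    a::Int
    data::Vector{Float64}
end

inl = [InlinePair(1, 2), InlinePair(3, 4)]
box = [Boxed(1, [1.0]), Boxed(2, [2.0])]

# Inline elements: stride equals sizeof(InlinePair) (16 bytes on 64-bit).
pointer(inl, 2) - pointer(inl, 1)

# Boxed elements: stride equals one pointer (8 bytes on 64-bit).
pointer(box, 2) - pointer(box, 1)
```

Copying a `Vector{Boxed}` only has to move one pointer per element; storing the elements inline would double the bytes copied here, which is the copying trade-off being described.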
It is completely intentional. It’s a long-standing performance trade-off.
I don’t get it. Are you saying that Want non-allocating array views · Issue #14955 · JuliaLang/julia · GitHub hasn’t been fixed because it’s not clear that it’s a good trade-off? I thought the reason it hasn’t been done was that it was a complex change to the compiler, that few people had the skill to do.
I mean, the trade-off is exactly the same for isbits structs, no? If it’s intentional, why are (small) isbits objects stack-allocated, and inline in arrays?
For the API/ABI-changing part/version, yes. For the non-API/ABI-changing part, no. There have already been a lot of changes made to make this happen. The non-API/ABI-changing part is certainly not easy, but it’s completely independent of what’s being discussed here.
Sort of. Yes, there is also a trade-off for isbits structs.
Because there are certainly more unambiguous cases where isbits structs have to be stored inline to get full optimization (vectorization, for example). Such optimizations are generally not applicable to structs with references in them. Of course, we started this way due to limitations in the implementation. Those limitations have mostly been lifted to fix most of the performance issues, so at this point it does not really need to be this way. But like I said at the very beginning, it’s mostly a heuristic now.
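For example (a sketch with illustrative names), a reduction over a field of inline-stored isbits elements turns into contiguous loads that LLVM can vectorize; with one heap reference per element, each iteration would have to chase a pointer first:

```julia
struct Point               # isbits: stored inline in a Vector{Point}
    x::Float64
    y::Float64
end

function sumx(ps::Vector{Point})
    s = 0.0
    @inbounds @simd for i in eachindex(ps)
        s += ps[i].x       # contiguous, SIMD-friendly loads
    end
    return s
end

ps = [Point(Float64(i), 0.0) for i in 1:4]
sumx(ps)   # 1.0 + 2.0 + 3.0 + 4.0 = 10.0
```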
I think there is some misunderstanding about what I meant. I don’t mean allocating this [Int, Ptr, Int, Ptr, ...] array on the stack (if that is what is meant by inline storing; I’m not a software engineer), but rather contiguously on the heap. Right now it’s stored (on the heap? I was testing this in a REPL) as a vector [Ptr, Ptr, Ptr, ...], each pointing to a separately allocated block of memory that looks like (Int, Ptr) somewhere else on the heap, and that last Ptr in turn points to the vector of data. My question was about why this first layer of pointers is necessary.
# Basic julia implementation
struct Test1
    id::Int
    data::Vector{Float64}
end

Base.rand(::Type{Test1}, n::Int, datlength::Int) = [Test1(rand(Int), rand(datlength)) for i = 1:n]

function iterate_and_store(testvec::Vector{Test1}, fillvec::Vector{Float64}, id::Int)
    for i = 1:length(testvec)
        fillvec[i] = testvec[i].data[id]
    end
end

# Ptr implementation
const vecstorage = Vector{Float64}[]

struct Test2
    id::Int
    data::Ptr{Float64}
end

function Base.rand(::Type{Test2}, n::Int, datlength::Int)
    outarr = Test2[]
    for i = 1:n
        tvec = rand(datlength)
        push!(vecstorage, tvec)
        push!(outarr, Test2(rand(Int), pointer(tvec)))
    end
    return outarr
end

function iterate_and_store(testvec::Vector{Test2}, fillvec::Vector{Float64}, id::Int)
    for i = 1:length(testvec)
        fillvec[i] = unsafe_load(testvec[i].data, id)
    end
end
using BenchmarkTools

empty!(vecstorage)

const testlength = 10000 # total number of Test structs
const datlength = 20     # length of the data vector in each Test struct
const id = 15            # index of the data vector to access during iterate_and_store
const fillvec = zeros(Float64, testlength)
const tvec1 = rand(Test1, testlength, datlength)
const tvec2 = rand(Test2, testlength, datlength)

@btime iterate_and_store(tvec1, fillvec, id) # 27.263 μs
@btime iterate_and_store(tvec2, fillvec, id) # 17.403 μs
I just realized that a pointer and a reference are not really the same, since one is not checked for validity whereas the other does get checked. So the comparison above is a little unfair. However, the question then becomes why it’s not stored as [Int, Ref, Int, Ref, Int, Ref], and whether it would be slower if it were.
AFAIK, in the context of arrays, “inline” refers to contiguous storage. We talked about stack allocation because, as of Julia 1.1.0, when you create an object that is isbits (contains no pointers) and immutable, it can be created on the stack. Of course, the Array’s storage is on the heap, so when you push the object to a vector, it is copied from the stack to the heap. isbits objects are stored contiguously in arrays.
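A quick way to check which element types Julia stores inline (the struct names here are illustrative):

```julia
struct Flat                # immutable, no references
    a::Int
    b::Float64
end

struct WithVec             # holds a reference to heap-allocated data
    a::Int
    data::Vector{Float64}
end

isbitstype(Flat)      # true:  stored inline/contiguously in arrays
isbitstype(WithVec)   # false: arrays hold one pointer per element
```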
There isn’t any misunderstanding and that’s exactly what I’m talking about. In fact, stack allocation is almost completely irrelevant to this discussion. (It’s mostly a local optimization)
And as I said, there’s no fundamental reason what you want couldn’t be done at this point. However, doing it is a major breaking change with performance regressions in some contexts, so we would rather not do it, for performance reasons, before implementing all the other optimizations first; and we have not done that part yet.
I’ll repeat that the performance regression is of course not for ALL use cases, and for a fully inlined use of a field of the object there is likely none, so you shouldn’t try to figure out why doing what you want would be bad for your case, or why the current behavior is necessary. It basically isn’t; it’s left unchanged as a trade-off.
Okay I see, thanks for the clarification. I think together with
I’m starting to understand what is going on internally, and why it’s like this. I will search for a way to make all the mutable/non-isbits objects that are created be laid out contiguously on the heap, which would make it faster in my use case, or at least let me test whether that’s the case.
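As a sketch of that idea (every name here is hypothetical): when all data vectors have the same length, one flat buffer plus a fixed stride gives fully contiguous storage:

```julia
struct FlatTests
    ids::Vector{Int}
    data::Vector{Float64}   # all data vectors stored back to back
    stride::Int             # length of each logical data vector
end

# Element j of the data belonging to struct i.
getdata(t::FlatTests, i::Int, j::Int) = t.data[(i - 1) * t.stride + j]

function iterate_and_store!(fillvec::Vector{Float64}, t::FlatTests, id::Int)
    for i in 1:length(t.ids)
        fillvec[i] = getdata(t, i, id)
    end
    return fillvec
end

t = FlatTests([1, 2], [1.0, 2.0, 3.0, 4.0], 2)
iterate_and_store!(zeros(2), t, 2)   # [2.0, 4.0]
```

Packages like StructArrays.jl automate the struct-of-arrays part of this transform, though the strided flat data buffer would still be manual.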