Memory layout of structs with mutable fields

Well, if the object were to be stored inline, then whenever you access it you do need to copy it. Of course in some cases that can be optimized out and if you are just dealing with an object with 2 field copying isn’t going to matter too much but that’s not always the case, i.e.

Well, sort of, but both the statement and the logic are wrong.

For a start, non-isbits are NOT required to be heap-allocated. They are only required to be heap allocated at interfaces. In general, such interfaces includes fields, arrays and function call arguments and return values. Note that is exactly what this thread is about i.e. the correct version of that statement is related as in it is exactly describing this phenomena but it’s not the reason.

Therefore, no

is wrong. Storing it inline will not require any more allocation for structs that has references as fields.

What IS true, OTOH, is that there must be a single convention as for if a type is passed by copying or reference in fields and as argument (including array and return value). And THIS is exactly because otherwise it will be required to make allocation.

Bottom line is, such change in the language will not require more heap allocation. It will only require more copying and that’s the performance trade-off.

It is completely intentional. It’s a long-standing performance trade-off.

1 Like

Not (just?) for copying arrays, but for any non-inlined use of the stored objects.

1 Like

I don’t get it. Are you saying that https://github.com/JuliaLang/julia/issues/14955 hasn’t been fixed because it’s not clear that it’s a good trade-off? I thought the reason it hasn’t been done was that it was a complex change to the compiler, that few people had the skill to do.

I mean, the trade-off is exactly the same for isbits structs, no? If it’s intentional, why are (small) isbits objects stack-allocated, and inline in arrays?

For the API/ABI changing part/version, yes. For the non-API/ABI changing changing, no. There has already been a lot of changes to make this happen. The non-API/ABI changing part is not easy for sure, but it’s completely independent of what’s being discussed here.

Sort of. Yes there is also a trade-off for isbits structs.

Because there are certainly more unambiguious cases where isbits structs has to be inlined to get full optimization (vectorization for example). Such optimizations are generally not a thing for structs with references in them. Of course we started this way due to limitations in the implementation. Such limitations has mostly be lifted to fix most of the performance issues so at this point it does not really need to be this way. But like I said at the very beginning,

1 Like

I think there is some misunderstanding to what I meant. I don’t mean to allocate this [Int, Ptr, Int, Ptr,...] array on the stack ( if that is what is meant by inline storing , I’m not a software engineer), rather contiguous on the heap. Right now it’s stored (on the heap? I was testing this in a repl) as a vector [Ptr, Ptr, Ptr,....] each pointing to a random allocated block of memory that looks like (Int, Ptr) somewhere else on the heap. And then that last Ptr points to the vector of data. What my question was about is why is this first layer of pointers necessary.

That’s because of mutability of each individual array element - you should be able to change it by its pointer. You can look here for mutability and contigious arrays: Make immutable mutable again - or make arrays of structures stack-allocated

Test is not mutable as far as I know, do you mean the data arrays ?

I made a mwe showcasing what I mean:

# Basic julia implementation
struct Test1
	id::Int
	data::Vector{Float64}
end
Base.rand(::Type{Test1}, n::Int, datlength::Int) = [Test1(rand(Int), rand(datlength)) for i=1:n]

function iterate_and_store(testvec::Vector{Test1}, fillvec::Vector{Float64}, id::Int)
	for i = 1:length(testvec)
		fillvec[i] = testvec[i].data[id]
	end
end

# Ptr implementation
const vecstorage = Vector{Float64}[]

struct Test2
	id::Int
	data::Ptr{Float64}
end
function Base.rand(::Type{Test2}, n::Int, datlength::Int)
	outarr = Test2[]
	for i = 1:n
		tvec = rand(datlength)
		push!(vecstorage, tvec)
		push!(outarr, Test2(rand(Int), pointer(tvec)))
	end
	return outarr
end

function iterate_and_store(testvec::Vector{Test2}, fillvec::Vector{Float64}, id::Int)
	for i = 1:length(testvec)
		fillvec[i] = unsafe_load(testvec[i].data, id)
	end
end

using BenchmarkTools
empty!(vecstorage)
const testlength = 10000 # total amount of Test structs
const datlength  = 20 #length of datavector in each Test struct
const id         = 15 #Index of datavector to access during iterate store
const fillvec = zeros(Float64, testlength)
const tvec1    = rand(Test1, testlength, datlength)
const tvec2    = rand(Test2, testlength, datlength)
@btime iterate_and_store(tvec1, fillvec, id) # 27.263 μs
@btime iterate_and_store(tvec2, fillvec, id) # 17.403 μs

I just realized that a pointer and reference are not really the same since one is not checked for validity, whereas the other does get checked. So the comparison above is a little unfair. However, the question then becomes why it’s not stored as [Int, Ref, Int, Ref, Int, Ref] , and whether it would be slower if it was.

AFAIK, in the context of arrays, “inline” refers to contiguous storage. We talked about stack allocation because, as of Julia 1.1.0, when you create an object that is isbits (contains no pointers) and immutable, it’s created on the stack. Of course, the Array’s storage is on the heap, so when you push the object to a vector, the object is copied from the stack to the heap. isbits objects are stored contiguously.

There isn’t any misunderstanding and that’s exactly what I’m talking about. In fact, stack allocation is almost completely irrelevant to this discussion. (It’s mostly a local optimization)

And as I said, there’s no fundamental reason what you want couldn’t be done at this point. However, doing that is a major breaking change with performance regression in some contexts so we would like to not do this for performance reason right now, before implementing all other optimizations first, and we have not done that part yet.

I’ll repeat that the performance regression is of course not for ALL use cases, and for a fully inlined use of a field of the object it’s likely not so you shouldn’t try to understand why doing what you want is bad for your case or why it is necessary. It’s basically not but it’s left unchanged as a trade-off.

Okay I see, thanks for the clarification. I think together with

I’m starting to understand what is going on internally, and why it’s like this. I will search for a way I can make it so that all the mutable/non-isbits type objects that are created are filled contiguous on the heap. Which would make it faster in my usecase, or at least let’s me test whether it’s the case.

Note that the statement you quote isn’t correct.

You cannot do it with an array.

I was thinking of asking for a valid chunk of heap memory of the size for the amount of elements I want to fill in, and then copy paste the previously created ones into it.

As far as I understood, this really is foremost about implementation issues, not about tradeoffs.

One of the most relevant cases is

struct some_wrapper{parentT, extra1, extra2, extra3}
parent::parentT
end

where parentT is non-bitstype, e.g. Vector{Int}.

Such objects should always be stored inline, both in arrays and in other struct or mutable struct, and only ever get heap-allocated in non-inferred code. There is no trade-off: We simply pay an extra object allocation plus pointer indirection for no gain.

I would be super happy if struct with a single non-singleton field could be special cased to simply be the pointer to the parent; then, all such abstractions would be guaranteed to be always zero-cost in type-stable settings (and afaict this would not even need changes to garbage collection).

1 Like

You can’t store references that way

Because? I actually have trouble finding what references exactly are in Julia, I thought they were kind of like gc registered pointers, can’t I move them around?

Well, as I said above, there’s certainly cases where the performance issue is not a problem but that doesn’t make it not a tradeoff.

First note that that doesn’t fix the OP’s question.
And this definitely come up in related discussion (I remember talking about it in at least 2 occasions).

It’ll be a very strange API. What stops you from stopping at one field for example. It’s almost certainly still a win for 2 fields. Additionally, Tuple of one element and of two element are going to have completely different layout.

Yes they are managed by the GC so you must not move them around without letting the GC know about it.

I guess I’m not meant to interact with the gc in any way?