Understanding how an array of structures works

I tripped across something that I just plain don’t understand when using an array to contain a structure. If I have a structure created outside a for loop and try to assign it to an element of the array, I do not get the structure values in the array. Yet, if I create it in the for loop, it is assigned correctly.

My thought was to create a structure with an initial value, then assign it to all elements of the array. This did not work: the array elements were assigned values, and I have no idea where they came from.

I want to understand how Julia handles arrays of structures. I have pasted some test code below. Examples 1 and 3 work in that they give me the expected results; Example 2 gives me funky results.

I am running julia 1.4 on Linux Mate

Thanks.

----Code follows----

module TestArray

mutable struct Point # Attributes of each character on screen
X::Int64
Y::Int64
end

function PrintArray(pa::Array{Point})
    # Print the index and both fields of every element
    for i in 1:length(pa)
        x = pa[i].X
        y = pa[i].Y
        println("i = $i, X = $x, Y = $y")
    end
end

# Test case 1 – define array as a collection of structures
# initialized as 1 – 10, 2 – 20, 3 – 30 …

println("Structure as array collection")
PointArray1 = [Point(i, i * 10) for i in 1:5]
PrintArray(PointArray1)

# Test case 2 – define array as 5 undefined elements,
# then initialize each element from a structure containing
# 2 – 20, 4 – 40, 6 – 60 …
# N.B.: structure is defined OUTSIDE loop

println("\nStructure outside loop")
PointArray2 = Array{Point,1}(undef, 5)
fta = Point(0, 0)
println("fta = ", fta)

for i in 1:length(PointArray2)
fta.X = i * 2
fta.Y = i * 20
PointArray2[i] = fta
end

PrintArray(PointArray2)

# Test case 3 – Similar to case 2, but structure is defined INSIDE loop with
# 3 – 30, 6 – 60, 9 – 90 …

println("\nStructure inside loop")
PointArray3 = Array{Point,1}(undef, 5)
for i in 1:length(PointArray3)
fta = Point(i * 3, i * 30)
PointArray3[i] = fta
end

PrintArray(PointArray3)
end

-------------Results----------------------

Structure as collection — expected results correct
i = 1, X = 1, Y = 10
i = 2, X = 2, Y = 20
i = 3, X = 3, Y = 30
i = 4, X = 4, Y = 40
i = 5, X = 5, Y = 50

Structure outside loop – Results do not match expectation
fta = Main.TestArray.Point(0, 0)
i = 1, X = 10, Y = 100
i = 2, X = 10, Y = 100
i = 3, X = 10, Y = 100
i = 4, X = 10, Y = 100
i = 5, X = 10, Y = 100

Structure inside loop – Expected results correct
i = 1, X = 3, Y = 30
i = 2, X = 6, Y = 60
i = 3, X = 9, Y = 90
i = 4, X = 12, Y = 120
i = 5, X = 15, Y = 150

Example 2 modifies the same Point instance in each iteration and assigns (a reference to) it to each element of the array, so all five elements end up showing the values from the last iteration. In Example 3, a new Point is created in each iteration.

You likely expect that assignment is a copy, but it is not in Julia.
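
A minimal sketch of the difference, assuming a hypothetical MutPoint type that mirrors the mutable Point from the question (the names MutPoint, shared, and fresh are only for illustration):

```julia
# Hypothetical type mirroring the mutable Point from the question.
mutable struct MutPoint
    X::Int64
    Y::Int64
end

# Same instance stored in every slot: all elements alias one object.
shared = Vector{MutPoint}(undef, 3)
p = MutPoint(0, 0)
for i in 1:length(shared)
    p.X = i * 2
    p.Y = i * 20
    shared[i] = p          # stores a reference to p, not a copy
end

# A new instance per slot: every element is an independent object.
fresh = Vector{MutPoint}(undef, 3)
for i in 1:length(fresh)
    fresh[i] = MutPoint(i * 2, i * 20)
end

println(shared[1] === shared[3])     # true: one object, three references
println(fresh[1] === fresh[3])       # false: separate objects
println((shared[1].X, fresh[1].X))   # (6, 2)
```

The `===` operator compares object identity, which is a quick way to check whether two array slots refer to the same mutable instance.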

Ahhh, that explains it (I think). So, is a Julia array actually an array of pointers, then? I was thinking that an array would be more like sequential memory locations, as in Fortran or C.

Thanks

Not just an array. You can think of all Julia variables as pointers (or references as in C++ or Java).

Got it. Thank you.

No, they CAN be dense arrays of data, with no pointers (they will be if you use default structs, which are immutable). Still, the array itself isn’t immutable, as that wouldn’t be very helpful.

EDIT: I guess all the data in the struct also needs to be of “bitstype”, or immutable structs of that kind. I might be wrong about the details; at the least, Julia is clever about eliminating pointers (and heap allocations), but you can still get pointers in some cases.
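
One way to check this is with `isbitstype` and `sizeof`. This is a rough sketch, and the types ImmPoint, MutPoint2, and Untyped are made up for illustration:

```julia
# Hypothetical types for illustration only.
struct ImmPoint          # immutable, all fields are plain bits
    X::Int64
    Y::Int64
end

mutable struct MutPoint2 # mutable, so each instance has its own identity
    X::Int64
    Y::Int64
end

struct Untyped           # immutable, but the fields may hold anything
    a
    b
end

println(isbitstype(ImmPoint))    # true
println(isbitstype(MutPoint2))   # false
println(isbitstype(Untyped))     # false

# For an isbits element type the vector holds the field data itself,
# so its byte size is length * sizeof(element).
inline = Vector{ImmPoint}(undef, 4)
println(sizeof(inline) == 4 * sizeof(ImmPoint))

# For a mutable element type the vector holds one reference per slot.
boxed = Vector{MutPoint2}(undef, 4)
println(sizeof(boxed) == 4 * sizeof(Ptr{Cvoid}))
```

Both size comparisons should print `true`: the first vector stores the two Int64 fields of each element back to back, while the second stores only pointer-sized references.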

See also: Arrays · The Julia Language

I found this in the manual:

In order to support mutation, such objects are generally allocated on the heap, and have stable memory addresses. A mutable object is like a little container that might hold different values over time, and so can only be reliably identified with its address. In contrast, an instance of an immutable type is associated with specific field values –- the field values alone tell you everything about the object. In deciding whether to make a type mutable, ask whether two instances with the same field values would be considered identical, or if they might need to change independently over time. If they would be considered identical, the type should probably be immutable.

To recap, two essential properties define immutability in Julia:

  • It is not permitted to modify the value of an immutable type.
    • For bits types this means that the bit pattern of a value once set will never change and that value is the identity of a bits type.
    • For composite types, this means that the identity of the values of its fields will never change. When the fields are bits types, that means their bits will never change, for fields whose values are mutable types like arrays, that means the fields will always refer to the same mutable value even though that mutable value’s content may itself be modified.
  • An object with an immutable type may be copied freely by the compiler since its immutability makes it impossible to programmatically distinguish between the original object and a copy.
    • In particular, this means that small enough immutable values like integers and floats are typically passed to functions in registers (or stack allocated).
    • Mutable values, on the other hand are heap-allocated and passed to functions as pointers to heap-allocated values except in cases where the compiler is sure that there’s no way to tell that this is not what is happening.

Despite this being an immutable struct:

julia> struct Point # Attributes of each character on screen
              X::Int64
              Y::Int64
       end

julia> one_point = Point(1, 2)
Point(1, 2)

julia> another_point = Point(2,3)
Point(2, 3)

julia> array_of_points = [one_point, another_point]  # 1D array is a Vector, 
2-element Vector{Point}:
 Point(1, 2)
 Point(2, 3)

julia> one_point = another_point  # you can still "assign" to
   # one_point... (only in global scope?), but it doesn't change the array
Point(2, 3)

julia> array_of_points
2-element Vector{Point}:
 Point(1, 2)
 Point(2, 3)

julia> array_of_points[1] = another_point  # and change a point in the array
Point(2, 3)

I guess you mean doesn't? And a plain assignment in Julia is always a no-op on the object itself; it never changes anything. You're mixing up the variable one_point with the object it is bound to. I suggest reading Values vs. Bindings: The Map is Not the Territory | juliabloggers.com.

So, I’ve been messing around trying to create a DenseArray using variations of:

PointArray4 = DenseArray{Point,1}(undef,5)

and I get:

ERROR: LoadError: MethodError: no constructors have been defined for DenseArray{Main.TestArray.Point,1}

I know I’m not understanding something, but what is it?

I want to have a structure that contains related data items, then make an array of these structures that could be passed to a C function such that the C module would see a “structure of structures”.

Thanks

DenseArray is an abstract type, you cannot create instances of it. It looks like you should just create an Array.

So, how do you create a dense array?

julia> supertype(Array)
DenseArray{T,N} where N where T

An Array is a subtype of DenseArray and can be used anywhere the methods require a DenseArray. Create an Array.

PointArray4 = Array{Point,1}(undef,5)

or alternatively

PointArray4 = Vector{Point}(undef,5)

As Vector{T} is just an alias for Array{T, 1}.
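
These relationships can be checked directly. A small sketch, assuming a hypothetical GridPoint element type and a made-up helper count_points:

```julia
struct GridPoint   # hypothetical element type for illustration
    X::Int64
    Y::Int64
end

println(isabstracttype(DenseArray))                 # abstract: cannot be instantiated
println(isconcretetype(Vector{GridPoint}))          # concrete: can be instantiated
println(Vector{GridPoint} <: DenseArray)            # a Vector is a DenseArray
println(Vector{GridPoint} === Array{GridPoint,1})   # Vector is only an alias

# A method that asks for a DenseArray accepts a plain Vector.
count_points(a::DenseArray{GridPoint,1}) = length(a)
pts = Vector{GridPoint}(undef, 5)
println(count_points(pts))
```

All four checks should print `true`, and the last line prints `5`.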

So, now I’m more confused. If Array creates a DenseArray, why doesn’t my Test Case 2 create a dense array of empty structures instead of an array of pointers?

My question: When does julia do what with arrays?

What I am looking for is the equivalent of an array of structures sequentially allocated in memory.

Thanks

I think when you create a Vector{T} it contiguously allocates memory where each element has enough space for sizeof(T).

See:

julia> struct Foo
           a::Int
           b::Int
       end

julia> x = Vector{Foo}(undef, 5);

julia> for i in 1:length(x)
           x[i] = Foo(i, i+1)
       end

julia> x
5-element Array{Foo,1}:
 Foo(1, 2)
 Foo(2, 3)
 Foo(3, 4)
 Foo(4, 5)
 Foo(5, 6)

julia> px = pointer(x)
Ptr{Foo} @0x00007f6d154cdf90

julia> Ref(px, 1)
Ptr{Foo} @0x00007f6d154cdf90

julia> Ref(px, 2) === px + sizeof(Foo)
true

EDIT:
The fields of the type can still be stored somewhere else, for example when they are not declared with concrete types:

julia> struct Bar
           a
           b
       end

julia> y = Vector{Bar}(undef, 5);

julia> for i in 1:length(y)
           y[i] = Bar("Iteration: $i", i*rand())
       end

julia> y
5-element Array{Bar,1}:
 Bar("Iteration: 1", 0.1973962819315649)
 Bar("Iteration: 2", 1.9484694321258949)
 Bar("Iteration: 3", 0.45878908786275296)
 Bar("Iteration: 4", 1.801662792172583)
 Bar("Iteration: 5", 2.1954558015800663)

julia> pointer(y)
Ptr{Bar} @0x00007f6d15155090

julia> pointer(y[1].a)
Ptr{UInt8} @0x00007f6d1501ebc8

You are mixing concepts. DenseArray has nothing to do with whether the object is stored as a reference/pointer or as the value itself. A DenseArray always stores its elements (as references or as the values themselves) contiguously. It contrasts with a sparse array, which does not store all positions from the first filled position to the last filled position contiguously, but instead allows the existence of “holes” (which may be treated as some kind of zero value, or as truly non-existent).

Whether Julia uses references/pointers or stores the value itself is an implementation detail, and you cannot really force it (I may be wrong; people more experienced than me can correct me). Often, however, if you use immutable structs, you will get the values contiguously allocated (instead of pointers to the values). If you use a mutable struct, then a change made to an object in one place has to be visible everywhere that object appears, so Julia will usually use references/pointers for such objects to allow this behavior.

Your Point struct is, in fact, the common example of a struct that probably should be immutable. It is small enough that it is often more efficient to rebuild it than to pay the overhead of it being a pointer to a heap-allocated object. Alternatively, you can also use two vectors, x and y, of Int64 instead of this structure. It all depends on your use case.
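
A short sketch of both suggestions, assuming a hypothetical immutable Pixel type in place of Point (the names Pixel, xs, and ys are only for illustration):

```julia
struct Pixel            # hypothetical immutable counterpart of Point
    X::Int64
    Y::Int64
end

pixels = [Pixel(i, i * 10) for i in 1:5]

# "Updating" an element means storing a newly built value in its slot.
old = pixels[2]
pixels[2] = Pixel(old.X + 100, old.Y)

# Alternative layout: one vector per field.
xs = [p.X for p in pixels]
ys = [p.Y for p in pixels]

println(pixels[2])   # Pixel(102, 20)
println(xs)          # [1, 102, 3, 4, 5]
println(ys)          # [10, 20, 30, 40, 50]
```

With the immutable version there is no shared instance to mutate by accident, so the situation from Example 2 cannot arise.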

If I am not mistaken (if I am, please correct me).

When you create a vector, or array, you are creating what it seems you expect as a “dense array”, as both these structures will reserve contiguous spaces in memory for that structure, if that is possible given the definition of the types. Thus,

struct MyType
   a :: Float64
   b :: Float64
end
x = Vector{MyType}(undef,2)

will reserve the space in memory necessary to store a vector of 4 Float64 numbers. Note this example, in particular:

julia> struct MyType1 # without specifying types
          a
          b
       end

julia> x = Vector{MyType1}(undef,2)
2-element Array{MyType1,1}:
 #undef
 #undef

julia> struct MyType2 # specifying types
          a :: Float64
          b :: Float64
       end

julia> y = Vector{MyType2}(undef,2)
2-element Array{MyType2,1}:
 MyType2(6.9028267664754e-310, 6.9028267664169e-310)
 MyType2(6.902826766477e-310, 0.0)

The vector y is mutable, because vectors are always mutable, so you can reassign its elements to new values of the correct type:

julia> y[1] = MyType2(1.,2.)
MyType2(1.0, 2.0)

julia> y
2-element Array{MyType2,1}:
 MyType2(1.0, 2.0)
 MyType2(6.902826766477e-310, 0.0)

Furthermore, when dealing with arrays, the = sign means only a name assignment, so if you do:

julia> z = y
2-element Array{MyType2,1}:
 MyType2(1.0, 2.0)
 MyType2(6.902826766477e-310, 0.0)

z is the same vector as y, just with a new name bound to it. Nothing really happened. That’s what happened when you did this:


julia> one_point = another_point  # you can still "assign" to
   # one_point... (only in global scope?), but it doesn't change the array
Point(2, 3)

You just assigned the name “one_point” to what was “another_point”. The new one_point name is bound to the another_point value, and has nothing to do anymore with the previous one_point. If another_point is mutable, that will behave exactly as for a vector, and modifying the value in the new one_point will modify the value of the another_point:

julia> mutable struct A
          x
       end

julia> a1 = A(1)
A(1)

julia> a2 = a1
A(1)

julia> a2.x = 2
2

julia> a1
A(2)

If, on the other hand, the struct is immutable, you cannot change its values, so both names are bound to identical and fixed values:

julia> struct B
         x
       end

julia> b1 = B(1)
B(1)

julia> b2 = b1
B(1)

julia> b2.x = 2
ERROR: setfield! immutable struct of type B cannot be changed
Stacktrace:
 [1] setproperty!(::B, ::Symbol, ::Int64) at ./Base.jl:34
 [2] top-level scope at REPL[41]:1

Therefore, there is no ambiguity here about what the contents of b1 or b2 are.

Since it is required for C interop, you can rely on it. The docs say:

When used recursively, isbits types are stored inline. All other types are stored as a pointer to the data. When mirroring a struct used by-value inside another struct in C, it is imperative that you do not attempt to manually copy the fields over, as this will not preserve the correct field alignment. Instead, declare an isbits struct type and use that instead. Unnamed structs are not possible in the translation to Julia.
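
To see the inline layout from the Julia side, here is a rough sketch assuming a hypothetical CPoint type that would correspond to a C struct with two 64-bit integer fields. No C library is assumed, so the memory is only inspected, not passed to a real C function:

```julia
struct CPoint           # hypothetical; would mirror a C struct { int64_t x, y; }
    X::Int64
    Y::Int64
end

pts = [CPoint(i, i * 10) for i in 1:3]

# View the same memory as a flat sequence of Int64 values.
flat = reinterpret(Int64, pts)
println(collect(flat))   # [1, 10, 2, 20, 3, 30]

# Read the second element through a raw pointer, much as C code would.
# GC.@preserve keeps pts alive while the raw pointer is in use.
second = GC.@preserve pts unsafe_load(pointer(pts), 2)
println(second)          # CPoint(2, 20)
```

The flat view shows the X and Y fields of consecutive elements sitting next to each other, which is the layout a C function receiving a pointer to an array of such structs would expect.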

Good to know.

I have one question concerning the contiguity in memory of some of these arrays. For example, if I create an array of mutable types, it is clear that the array values cannot, in general, be stored contiguously in memory, since they can simply be pointers to previously defined variables:

julia> mutable struct M
          x :: Int
       end

julia> y = Vector{M}(undef,2)
2-element Array{M,1}:
 #undef
 #undef

julia> m = M(3)
M(3)

julia> y[1] = m
M(3)

julia> y[1].x = 1
1

julia> m
M(1)

My question is what happens when the same thing is done with immutable structs:

julia> struct I
         x :: Int
       end

julia> y = Vector{I}(undef,2)
2-element Array{I,1}:
 I(139917445965040)
 I(139917445997984)

julia> i = I(3)
I(3)

julia> y[1] = i
I(3)

Since I cannot modify the value of i.x, because it is immutable, I am not sure whether y[1] is a pointer to i or a copy of i at this point. I would guess that it is a copy and that, in this case, the memory was allocated when y was first created, and y is contiguous in memory. Is that right?

P.S. I tried to figure that out using pointer_from_objref, but this cannot be used on immutable objects. Is there a deep reason for not being able to get a pointer to an immutable object?
