Serialization of a SubArray forgets position in parent array

iagoleal · November 28, 2022, 2:50pm

Hello!
I am working in a codebase where we use Vectors divided into multiple “viewing windows” (SubArrays) representing some data.
Recently I had to serialize those views and noticed that the information about the parent array and index offsets was lost in the process.

As an example:

import Serialization

original = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
window = @view original[3:5]

Serialization.serialize("example", window)
serialized = Serialization.deserialize("example")

If we compare serialized with window, both the parent and offsets are wrong.

parent(window)    #  [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
parent(serialized) # [30.0, 40.0, 50.0]

window.indices     # (3:5,)
serialized.indices # (1:3,)

This is the behavior even when serializing both the original vector and subarray together.

Serialization.serialize("example2", (original, window))
(w, y) = Serialization.deserialize("example2")

There is no relation between w and parent(y), and there seems to be no way (besides searching and matching on the values) to recover from y which indices of w it is supposed to refer to, which is information I needed.

So, my question is whether there is some rationale behind this behavior or whether I just stumbled into a bug.
And does anybody know if in the current Julia version (LTS or Release) there is a way to serialize Vectors together with SubArrays without losing their link? Or will I have to roll my own serializer?

DNF · November 29, 2022, 5:58pm

This is the behavior I would have expected. Serialize only the values inside the window. Serializing both the parent and the view seems wasteful. The view only contains the ‘visible’ values.

However, the fact that the parent is accessible via the parent function makes me a bit less confident in my conclusion.

mbauman · November 29, 2022, 7:01pm

Yes, this is intentional. There are lots of smarts in SubArray that recompute both parent and indices in many places — they are not promised to be held as you initially constructed them. For example, we might reshape the parent upon construction or make an unaliased copy of an index. In this case it can be a huge savings in disk/network to “trim” the parent and recompute the indices if possible.

That said, the decision to do this trimming when serializing was made very long ago and predates a lot of serializer smarts. I wonder if it’d now be possible to see if the parent was already serialized and use a reference to it if that’s the case. I don’t know if such a thing would be possible without having some crazy order-of-serialization dependency to it.

iagoleal · December 1, 2022, 1:07pm

Thanks for the explanations! Apparently I will have to think about another way to do what I needed.
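
One possible way around it, as a minimal sketch and not something built into Serialization: store the parent together with the result of parentindices, and rebuild the view after loading. The helper names save_window and load_window are made up for illustration, and an IOBuffer stands in for a file here.

```julia
using Serialization

# Hypothetical helper: write the full parent and the view's indices explicitly,
# so the trimming applied to SubArray values never comes into play.
function save_window(io::IO, v::SubArray)
    serialize(io, (parent(v), parentindices(v)))
end

# Hypothetical helper: rebuild the view on top of the deserialized parent.
function load_window(io::IO)
    p, inds = deserialize(io)
    return view(p, inds...)
end

original = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
window = @view original[3:5]

io = IOBuffer()
save_window(io, window)
seekstart(io)              # rewind before reading back
restored = load_window(io)

parent(restored)           # the full six-element vector
parentindices(restored)    # (3:5,)
```

This writes the whole parent, so it gives up the disk/network savings of the trimming. With several windows over the same vector, the same idea would be to serialize the parent once along with a list of index tuples.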

Something that still bugs me is that deepcopy works exactly as I would expect: it makes a copy of the parent when copying only the SubArray and preserves the relationship when copying a struct/tuple containing both the parent and the SubArray.
Maybe this behavior was implemented after the one in Serialization… I don’t know.
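
For comparison, a short sketch of the deepcopy behavior described above, reusing the vector from the first example (the names w1, o2 and w2 are only for illustration):

```julia
original = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
window = @view original[3:5]

# Copying the view alone copies the whole parent and keeps the offsets.
w1 = deepcopy(window)
parent(w1) == original     # true
parentindices(w1)          # (3:5,)

# Copying parent and view together keeps them linked to each other.
o2, w2 = deepcopy((original, window))
parent(w2) === o2          # true
o2[3] = -1.0               # writing to the copied parent...
w2[1]                      # ...is visible through the copied view: -1.0
```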
