Minimizing Work in Copying/Manipulating Custom Data Structures

I have a computation where I am using my own data structure (composed of two or more arrays, which may themselves be specialized data structures). I want to perform an operation on this data structure in a way that minimizes the total computation/allocation done by the code. My idea was to introduce a pre-allocated work space data structure, store the result in the work space, and then swap things so that the work space data structure becomes my main structure. My attempt was as follows:

# toy data structure
struct DataStruct{TF<:AbstractFloat, TI<:Integer}
   X::Vector{TF}
   Y::Vector{TI}
end

# toy transformation of the sort that would require some
# sort of intermediate data structure.
function struct_func!(S::DataStruct, S_work::DataStruct)
   n = length(S.X);

   for (j,k) in zip(1:n,n:-1:1)
      S_work.X[j] = sin(S.X[j] * (S.Y[k]));
      S_work.Y[j] = S.Y[j] - Int(j);
   end

# my attempt to change the references to the two structures
   S, S_work = S_work, S;

   S, S_work
end


S = DataStruct{Float64, Int}(range(0,stop=1, length=5), 2*ones(5));
println(S)
S_work = DataStruct{Float64, Int}(zeros(5), zeros(5));

struct_func!(S,S_work);

println(S);
println(S_work);

This returns:

DataStruct{Float64,Int64}([0.0, 0.25, 0.5, 0.75, 1.0], [2, 2, 2, 2, 2])
DataStruct{Float64,Int64}([0.0, 0.25, 0.5, 0.75, 1.0], [2, 2, 2, 2, 2])
DataStruct{Float64,Int64}([0.0, 0.479426, 0.841471, 0.997495, 0.909297], [1, 0, -1, -2, -3])

indicating that

S, S_work = S_work, S;

does not do what I had hoped it would do.

My thinking had been that this would be like C/C++, where I could just shuffle the pointers, but that's not working as expected. I would like to avoid doing a second full loop to copy the data over, and to avoid allocating another data structure, unless there's no other straightforward way to do this.

Julia is like C, and nothing like C++.

S, S_work = S_work, S;

is exactly the same as in C: It swaps the local bindings around, which is either a no-op or swaps two values on the stack / in registers. Since there is no * or [] or -> involved, no pointer gets written to and the swap cannot be visible to the caller.
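A minimal sketch of that point with plain vectors (the names here are illustrative, not from the thread):

```julia
function swap_bindings(a, b)
   a, b = b, a   # rebinds only the function's local names
   return nothing
end

x = [1, 2]
y = [3, 4]
swap_bindings(x, y)
# x and y are unchanged afterwards: no pointer was written through,
# only the callee's two local bindings were exchanged
```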

If you want a swap to be visible to the caller, then you need to either make S and S_work mutable and swap out the contents, or you need to pass Ref(S). Or, in your case, you can do S,S_work = struct_func!(S,S_work);
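The Ref route might look like this sketch (swap_refs! is a made-up name):

```julia
# Writing through a[] / b[] mutates the heap-allocated Ref cells
# themselves, so the swap is visible to whoever holds the same Refs.
function swap_refs!(a::Ref, b::Ref)
   a[], b[] = b[], a[]
   return nothing
end

rS = Ref([1.0, 2.0])
rW = Ref([0.0, 0.0])
swap_refs!(rS, rW)
# rS[] is now the old work array, and rW[] the old data array
```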

So S,S_work = struct_func!(S,S_work); does do what I had wanted, as you suggest, thanks. Just to clarify, when you say “visible to the caller”, is the issue that the swap happened inside the function, so it was only local in scope? Would you have any other critiques or suggestions for accomplishing what I am trying to do here?

Sorry, I don’t really understand the question.

The swap happened inside the function, which means that you are in a new stack frame and cannot access the caller's state. Assignments with a naked left-hand side in Julia always manipulate memory in your stack frame (except for declared globals), just like in C. No assignment you can possibly do can be visible once you return, because your stack frame no longer exists.

So, in order to be visible outside, any operation needs to do setindex! or setfield! or setproperty! or something like that on a heap-allocated mutable object, which has the cool syntactic sugar of a non-naked LHS, just like in C. Or you use return values.
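The mutable-object route could look like this sketch (MutableData and swap_fields! are hypothetical names, not from the thread):

```julia
mutable struct MutableData
   X::Vector{Float64}
   Y::Vector{Int}
end

# A non-naked LHS like A.X = ... lowers to setproperty!, which writes
# into the heap-allocated object and is therefore visible to the caller.
function swap_fields!(A::MutableData, B::MutableData)
   A.X, B.X = B.X, A.X
   A.Y, B.Y = B.Y, A.Y
   return nothing
end

A = MutableData([1.0], [1])
B = MutableData([2.0], [2])
swap_fields!(A, B)
# A now holds B's old arrays and vice versa, as seen from here
```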

Using return-values has the slight disadvantage of possibly causing allocations for the tuple (S, S_work), depending on how the inliner and optimizer feel today.

If you don’t get the tuple allocation then S, S_work = struct_func!(S,S_work); is imho the most julian way of writing this (and then just return S_work, S instead of swapping). If you get the tuple allocation and decide after profiling that you cannot afford it, then I would write struct_func!(S, S_work); S, S_work = S_work, S;, which works around the tuple allocation (alternatively yell at your code until the optimizer removes the tuple allocation; I only suffered from this issue on 0.6 because the optimizer improved so much).

Oh, I would also change the order of arguments: struct_func! mutates the second argument, and it is customary to give the thing that is mutated as the first argument. That confused me at first about your code.

This seems to work:

function struct_func!(S_work::DataStruct, S::DataStruct)
   n = length(S.X);

   for (j,k) in zip(1:n,n:-1:1)
      S_work.X[j] = sin(S.X[j] * (S.Y[k]));
      S_work.Y[j] = S.Y[j] - Int(j);
   end
   S_work, S
end

and when I perform:

n = 1001;
S = DataStruct{Float64, Int}(range(0,stop=1, length=n), 2*ones(n));
S_work = DataStruct{Float64, Int}(zeros(n), zeros(n));
@time S, S_work = struct_func!(S_work,S);

with different values of n, I’m only getting (7 allocations: 256 bytes) reported.
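As an aside, @time at top level also charges you for compilation on the first call and for dynamic dispatch on non-const globals; a sketch of a more targeted measurement, using a stand-in function rather than the code above:

```julia
# Stand-in in-place transform, used only to illustrate the pattern.
double!(dest, src) = (dest .= 2 .* src; dest)

# Measuring inside a function avoids top-level dispatch overhead;
# the first call warms up (compiles), the second is what gets counted.
function count_allocs(dest, src)
   double!(dest, src)
   return @allocated double!(dest, src)
end

a = zeros(1000)
b = rand(1000)
nbytes = count_allocs(a, b)   # in-place broadcast: 0 bytes allocated
```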

ADDED:
I just realized that, since I don't actually care that the work space structure got changed (just that the call didn't generate allocations), this can be further simplified to just:

function struct_func!(S_work::DataStruct, S::DataStruct)
   n = length(S.X);

   for (j,k) in zip(1:n,n:-1:1)
      S_work.X[j] = sin(S.X[j] * (S.Y[k]));
      S_work.Y[j] = S.Y[j] - Int(j);
   end
   S_work
end

with evaluations of the form S = struct_func!(S_work, S). That also seems to run without generating allocations.
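One subtlety with this simplified pattern, sketched here with plain vectors: after the assignment, the two names alias the same object, so reusing the work space in a later call would overwrite the result.

```julia
# Stand-in for struct_func!: writes into its scratch buffer and returns it.
transform!(dest, src) = (dest .= src .+ 1; dest)

S      = [1, 2, 3]
S_work = [0, 0, 0]
S = transform!(S_work, S)   # same shape as S = struct_func!(S_work, S)
S === S_work                # true: both names now refer to one array
```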

I thought I had this all resolved, but now I tried using this inside of a loop and ended up with a different error. For the following code:

struct DataStruct{TF<:AbstractFloat, TI<:Integer}
   X::Vector{TF}
   Y::Vector{TI}
end

function struct_func!(S_work::DataStruct, S::DataStruct)
   n = length(S.X);

   for (j,k) in zip(1:n,n:-1:1)
      S_work.X[j] = sin(S.X[j] * (S.Y[k]));
      S_work.Y[j] = S.Y[j] - Int(j);
   end
   S_work, S
end

n = 5;
S = DataStruct{Float64, Int}(range(0,stop=1, length=n), 2*ones(n));
S_work = deepcopy(S);

for j in 1:10
   S, S_work = struct_func!(S_work,S);
end

I had imagined that what this would do is apply the struct_func! transform at each iteration, essentially swapping the references between S and S_work so as to avoid doing new allocations. However, when I try this, I get the error:

ERROR: UndefVarError: S_work not defined

Put global in front of the assignment in the loop.

Indeed, that cleared up my error. Is there a way to accomplish this computation without using globals, or is this a problem where the use of global would be considered acceptable? I know Julia handles scope somewhat differently, but avoiding globals is something that was really drilled into me over the years.

julia> fcall(f, args...)=f(args...);

julia> S,S_work = fcall(S,S_work) do S,S_work
       for j in 1:10
          S, S_work = struct_func!(S_work,S);
       end
       S,S_work
       end

You had a top-level loop, and the scoping of top-level loops is slightly counter-intuitive.

If it is not worth putting into a function (want to call only once) but worth compiling (speed or clearer scoping rules), then I tend to just put a do block around my code.

I don’t know why there is no syntactic sugar or base export for “just call the damn thing”. The fcall is a single line, but I don’t really like this idiom due to it being possibly confusing for other people (mental overhead of looking up the fcall definition).

Your S and S_work are already global variables in the snippet that you showed; they're not contained inside a function or local to a let block. If you were to put this code inside a function, there would be no error. At the command line, where they are declared in global scope, the global annotation in the loop is necessary to tell the compiler that the variable names refer to the S and S_work in global scope, rather than creating new variables local to the loop, which is the default behavior in global scope only.

Note that “global scope” includes at the top level of a module; all modules have their own “global” scope.
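A minimal illustration of the rule, as it applies in a script file (the REPL's behavior was relaxed in Julia 1.5 with "soft scope", so interactively the loop may work without the annotation, but in files it is still required):

```julia
s = 0
for i in 1:3
   global s += i   # without `global`, `s` inside the loop would be a new local
end
# s is now 6: the annotation made the loop write to the global binding
```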

So the following two versions both do what I had expected. I'm originally from the MATLAB world, where it would be quite natural to have the essential loop inside of a script file that you would call. I guess the paradigm for Julia is to put everything inside of functions?

struct DataStruct{TF<:AbstractFloat, TI<:Integer}
   X::Vector{TF}
   Y::Vector{TI}
end

function struct_func!(D_work::DataStruct, D::DataStruct)
   n = length(D.X);

   for (j,k) in zip(1:n,n:-1:1)
      D_work.X[j] = sin(D.X[j] * (D.Y[k]));
      D_work.Y[j] = D.Y[j] - Int(j);
   end
   D_work, D
end

function iterate_func!(D::DataStruct,D_work::DataStruct, n)

   for j in 1:n
      D, D_work = struct_func!(D_work,D);
   end
   D, D_work
end

m = 5
S = DataStruct{Float64, Int}(range(0,stop=1, length=m), 2*ones(m));
S_work = deepcopy(S);
T = deepcopy(S);
T_work = deepcopy(S);

N = 4;

for j in 1:N
   global S, S_work = struct_func!(S_work,S);
end
iterate_func!(T, T_work, N);

Yes, that’s the official recommendation. Everything outside of functions runs in the interpreter and is not compiled. This does not matter for 10 elements, but becomes relevant for larger loops.

But I get you, doing non-perf critical work in the top-level scope is so tempting and sometimes more readable. Your decision to make, matter of taste.

An additional comment/question: my above solution fails when N=1, in that iterate_func! does not perform as expected. The loop with global continues to work.
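Following the earlier point that swaps inside a function are not visible to the caller, the likely cause is that the pair returned by iterate_func! is discarded: for odd N the result is left in the object the caller still calls T_work. A sketch of the fix, capturing the return value (self-contained restatement of the thread's definitions):

```julia
struct DataStruct{TF<:AbstractFloat, TI<:Integer}
   X::Vector{TF}
   Y::Vector{TI}
end

function struct_func!(S_work::DataStruct, S::DataStruct)
   n = length(S.X)
   for (j, k) in zip(1:n, n:-1:1)
      S_work.X[j] = sin(S.X[j] * S.Y[k])
      S_work.Y[j] = S.Y[j] - j
   end
   return S_work, S
end

function iterate_func!(D::DataStruct, D_work::DataStruct, n)
   for j in 1:n
      D, D_work = struct_func!(D_work, D)
   end
   return D, D_work
end

m = 5
T = DataStruct{Float64, Int}(collect(range(0, stop=1, length=m)), 2*ones(Int, m))
T_work = deepcopy(T)

# Capture the returned pair so the swap survives the call, even for N = 1:
T, T_work = iterate_func!(T, T_work, 1)
```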