Changing a DataFrame field of a struct the correct way

Hello, I have a struct that contains two other structs:

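using DataFrames

# FirstStruct and SecondStruct are defined elsewhere (not shown here)
Base.@kwdef mutable struct MyInputDataset
  first_st::FirstStruct = FirstStruct()
  second_st::SecondStruct = SecondStruct()
  df1::DataFrame = DataFrame()
  df2::DataFrame = DataFrame()
end

# Defined after MyInputDataset, because its fields refer to that type
Base.@kwdef mutable struct MyDataset
  input::MyInputDataset = MyInputDataset()
  processed::MyInputDataset = MyInputDataset()
end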

Now I create the df1 DataFrame inside the input struct and make df1 of the processed struct point to it:

julia> dataset.input.df1 = DataFrame("w"=>2)
1×1 DataFrame
 Row │ w     
     │ Int64
─────┼───────
   1 │     2

julia> dataset.processed.df1 = dataset.input.df1 
1×1 DataFrame
 Row │ w     
     │ Int64
─────┼───────
   1 │     2

After that, if I change the DataFrame of processed, the DataFrame of input changes as well:

julia> dataset.processed.df1 = DataFrame("r"=>2)
1×1 DataFrame
 Row │ r     
     │ Int64
─────┼───────
   1 │     2

julia> dataset.input.df1 
1×1 DataFrame
 Row │ r     
     │ Int64
─────┼───────
   1 │     2

Could anyone explain why this happens and/or help me?

Thank you

When you assign dataset.processed.df1 = dataset.input.df1, you’re not creating a new DataFrame with the same contents. Instead, you’re making processed.df1 point to the same DataFrame as input.df1. This is how assignment works in Julia in general: it binds a name or field to an existing object and does not create a copy.

If you want it to be a new DataFrame, create one with the copy method, e.g. dataset.processed.df1 = copy(dataset.input.df1; copycols=true).
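To make the difference between aliasing and copying concrete, here is a minimal sketch. The variable names (input_df, alias, independent) are made up for illustration and are not from the original post; it only assumes DataFrames.jl is installed.

```julia
using DataFrames

input_df = DataFrame("w" => [1, 2, 3])

alias = input_df              # plain assignment: both names refer to the same object
independent = copy(input_df)  # new DataFrame with copied columns (copycols=true is the default)

alias.w[1] = 100              # mutates the one object that alias and input_df share

println(input_df.w[1])        # the change is visible through input_df
println(independent.w[1])     # the copy still holds the original value
```

Mutating through alias is visible through input_df, while independent keeps its own data.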

Welcome to the Julia community!

If I understand correctly, you’re showing the desired behaviour here, i.e. you’d want dataset.input.df1 to be as you printed (which is not how this currently works)? If so, what happens is that you reassign dataset.processed.df1 to point to a new DataFrame (with "r"), while dataset.input.df1 still refers to the old one (with "w"). Instead, you need to modify the original DataFrame in place (using the ! methods). E.g.

julia> dataset.processed.df1 = dataset.input.df1 = DataFrame("w"=>2);

julia> rename!(dataset.processed.df1, :w => :r)  # Change the content of dataset.processed.df1, but don't reassign the variable
1×1 DataFrame
 Row │ r
     │ Int64
─────┼───────
   1 │     2

julia> dataset.input.df1
1×1 DataFrame
 Row │ r
     │ Int64
─────┼───────
   1 │     2

Alternatively, and for the same reasons, you could work with Refs / Base.RefValues.
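A rough sketch of what the Ref approach could look like. The Holder type and its field name are hypothetical, introduced only for this illustration; the idea is that both structs hold the same Ref, so replacing the Ref’s contents is seen by both.

```julia
using DataFrames

# Hypothetical holder type: stores a reference to a DataFrame, not the DataFrame itself
mutable struct Holder
    df::Base.RefValue{DataFrame}
end

shared = Ref(DataFrame("w" => 2))
input = Holder(shared)
processed = Holder(shared)

# Replace what the shared Ref points to; both holders see the new DataFrame
processed.df[] = DataFrame("r" => 2)
println(names(input.df[]))

# To decouple later, give processed its own Ref; input keeps the shared one
processed.df = Ref(DataFrame("q" => 1))
println(names(input.df[]))
```

The extra level of indirection lets you swap the whole DataFrame while keeping the two fields coupled, and then decouple them by assigning a fresh Ref.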

Thank you for the responses.

My idea is that, at the beginning, processed.df1 points to the same DataFrame as input.df1, but at a certain point in time I want processed.df1 to point to a new DataFrame, leaving input.df1 pointing to the original DataFrame.

Well, this is already what your original code does?

julia> dataset.input.df1 = DataFrame("w"=>2)
1×1 DataFrame
 Row │ w
     │ Int64
─────┼───────
   1 │     2

julia> dataset.processed.df1 = dataset.input.df1
1×1 DataFrame
 Row │ w
     │ Int64
─────┼───────
   1 │     2

julia> dataset.processed.df1 = DataFrame("r"=>2)
1×1 DataFrame
 Row │ r
     │ Int64
─────┼───────
   1 │     2

julia> dataset.input.df1
1×1 DataFrame
 Row │ w
     │ Int64
─────┼───────
   1 │     2

(@digital_carver’s reply is for when you want to decouple the DataFrames while keeping their contents identical for now; mine was for keeping them coupled while altering the common content.)

No, because what I’m trying to achieve is to change only the dataset.processed.df1 DataFrame without also changing dataset.input.df1.
In other words, at the beginning I want both df1 DataFrames to point to the same memory location, so if I “query” processed.df1 or input.df1 I get the same result. But at some point I will change only processed.df1 so that it points to a new memory location with a new DataFrame.

The reason for this request is that I need to keep the initial data (dataset.input) “clean” so that the program can perform several runs. Every run should modify only dataset.processed, without performing a copy() or a deepcopy(), to keep the RAM footprint as small as possible.

Thank you again

It really does sound like it’s already working as intended.

julia> dataset.input.df1 = dataset.processed.df1 = DataFrame("w"=>2);

julia> dataset.input.df1 == dataset.processed.df1
true

julia> dataset.processed.df1 = DataFrame("r"=>2);

julia> dataset.input.df1 == dataset.processed.df1
false

julia> dataset.input.df1 = dataset.processed.df1 = DataFrame("w"=>2);

julia> pointer_from_objref(dataset.input.df1)
Ptr{Nothing} @0x0000025ca5350c50

julia> pointer_from_objref(dataset.processed.df1)  # same memory location
Ptr{Nothing} @0x0000025ca5350c50

julia> dataset.processed.df1 = DataFrame("r"=>2);  # (or even DataFrame("w"=>2) )

julia> pointer_from_objref(dataset.input.df1)  # has not changed
Ptr{Nothing} @0x0000025ca5350c50

julia> pointer_from_objref(dataset.processed.df1)  # different
Ptr{Nothing} @0x0000025ca5351b50
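As a side note, == on DataFrames compares contents, so two separate DataFrames holding the same data also compare equal. If the question is whether two fields refer to the very same object, === is a more direct check than comparing pointers. A small sketch, with variable names a, b, c chosen just for this example:

```julia
using DataFrames

a = DataFrame("w" => 2)
b = a                     # same object as a
c = DataFrame("w" => 2)   # separate object with equal contents

println(a == c)    # content comparison
println(a === c)   # identity comparison
println(a === b)   # identity comparison
```

Here a == c holds because the contents match, while only a and b are identical in the === sense.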

I’m not sure I fully understand, but presumably you’d only need a single copy to create dataset.processed.df1 from dataset.input.df1, after which you can just keep modifying dataset.processed.df1 in place? I don’t see how you could get a smaller memory footprint while keeping the two df1s separate.
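A minimal sketch of that "copy once, then modify in place" idea. The run! function is a made-up stand-in for whatever a run does, and the reset step with copyto! is my own assumption about how one might restore the clean values between runs without allocating a new DataFrame; it only works as written if the columns keep the same length and element type.

```julia
using DataFrames

input_df = DataFrame("w" => [1, 2, 3])
processed_df = copy(input_df)        # the single copy, made once

# Hypothetical run: doubles column w of the given DataFrame in place
function run!(df::DataFrame)
    col = df.w                       # the stored column vector, not a copy
    col .*= 2                        # in-place update of that vector
    return df
end

run!(processed_df)
println(processed_df.w)              # modified by the run
println(input_df.w)                  # clean input is untouched

# Assumed reset step: overwrite the processed values with the clean ones
copyto!(processed_df.w, input_df.w)
println(processed_df.w)
```

With this pattern the memory cost is one extra copy of the data for the whole program, rather than one per run.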