Mutating version of vcat for data frames

Hello, I need a mutating version of vcat for data frames, because I must do the concatenation inside a mutating function and want to avoid defining new temporary data frames.
Here is an example of how I would like to use it; the data frame I want to mutate is called df_streams.

"""Single stream match with the catalog."""
function match_stream_catalog!(stream::Py, df_cat::DataFrame, df_streams::DataFrame)
    window = [[minimum(stream.track.ra.deg), maximum(stream.track.ra.deg)],
              [minimum(stream.track.dec.deg), maximum(stream.track.dec.deg)]]
    window = pyconvert(Vector{Vector{Float64}}, window)
    stream_name = pyconvert(String, stream.stream_name)
    field = get_field(window, df_cat)
    field.track = Vector{String}(repeat([stream_name], size(field,1)))
    ra = Py(field.RA)
    dec = Py(field.DEC)
    field_coord = ac.SkyCoord(ra=ra*u.deg, dec=dec*u.deg, frame="icrs")
    ontrack = stream.get_mask_in_poly_footprint(field_coord)
    vcat!(df_streams, field[pyconvert(BitVector,ontrack),:])
    return nothing
end

Thank you very much.

Update: is the solution append!? And what about hcat?

To append individual rows to an existing DataFrame, you can use push! (or append! for multiple rows, but that would require allocating temporaries). For hcat, it’s exactly that: hcat(df1, df2), or hcat(df1, df2, copycols=false) for low-allocation concatenation.
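
As a minimal sketch of the push! route (the column names and values here are made up for illustration and are not from the original post), rows can be added to an existing data frame in place like this:

```julia
using DataFrames

# Hypothetical empty data frame with typed columns.
df = DataFrame(track = String[], RA = Float64[])

# push! mutates df; a row can be given as a NamedTuple or a Tuple.
push!(df, (track = "demo", RA = 10.0))
push!(df, ("demo", 10.5))

nrow(df)  # number of rows now stored in df
```

Each call grows the existing column vectors rather than building a new data frame.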


Thank you! I must concatenate two data frames. I tried hcat(df1, df2, copycols=false), but it mutates neither df1 nor df2.
If I use append!(df1, df2), does it create a temporary data frame, as if I had written
df3 = hcat(df1, df2), or is it something different?
I would like to use as little RAM as possible.
Thanks.

A DataFrame is (more or less) just a list of named vectors, and hcat(..., copycols=false) doesn’t copy the underlying vectors:

julia> df1, df2 = DataFrame(a = rand(10^6)), DataFrame(b = 1:10^6);

julia> @time df1 = hcat(df1, df2, copycols=false);
  0.000019 seconds (18 allocations: 1.406 KiB)

If you want to vertically append in place, use append! or prepend! (they are the in-place equivalents of vcat, which allocates a new data frame).
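
Applied to the question, this could look roughly like the sketch below. The helper name add_matches! and the small field, ontrack, and df_streams values are hypothetical stand-ins for the objects in the original function; passing a view of the selected rows is an assumption made here to avoid materializing the filtered data frame before appending.

```julia
using DataFrames

# Hypothetical stand-ins for the objects in the original function.
df_streams = DataFrame(RA = Float64[], DEC = Float64[], track = String[])
field = DataFrame(RA = [10.0, 10.5, 11.0],
                  DEC = [-1.0, -0.5, 0.0],
                  track = fill("demo", 3))
ontrack = BitVector([true, false, true])

# Append the selected rows of `field` to `df_streams` in place.
function add_matches!(df_streams::DataFrame, field::DataFrame,
                      ontrack::AbstractVector{Bool})
    # view(...) selects the rows without copying them first;
    # append! then copies the values into df_streams' own columns.
    append!(df_streams, view(field, ontrack, :))
    return nothing
end

add_matches!(df_streams, field, ontrack)
```

After the call, df_streams itself holds the two selected rows, and no new data frame has been bound to a name.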

If you want to horizontally add columns in-place to a data frame use insertcols!. Here is an example:

julia> df = DataFrame(a=1:2, b=3:4)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> df2 = DataFrame(c=11:12, d=13:14)
2×2 DataFrame
 Row │ c      d
     │ Int64  Int64
─────┼──────────────
   1 │    11     13
   2 │    12     14

julia> insertcols!(df, pairs(eachcol(df2))...)
2×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      3     11     13
   2 │     2      4     12     14

Use copycols to decide if df should alias or copy columns from df2. By default it will copy.
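
A small sketch of the difference, reusing the same toy data (the df_copy variable is introduced here only for illustration):

```julia
using DataFrames

df  = DataFrame(a = [1, 2], b = [3, 4])
df2 = DataFrame(c = [11, 12], d = [13, 14])

# Default (copycols=true): the target gets its own copies of df2's columns.
df_copy = copy(df)
insertcols!(df_copy, pairs(eachcol(df2))...)
copied = df_copy.c !== df2.c

# copycols=false: df stores the very same vectors as df2, so nothing is copied.
insertcols!(df, pairs(eachcol(df2))...; copycols = false)
aliased = df.c === df2.c

# Because the columns are shared, a change made through df2 is visible in df.
df2.c[1] = 99
df.c[1]
```

Aliasing saves memory, but the two data frames are then tied together: mutating or resizing a shared column through one of them affects the other.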


There is also a discussion about exposing hcat! in this issue, but so far no one has needed it often enough, so we have not done it. (The hcat! function is available internally, but it is not part of the public API.)


Thanks, now I understand how to use copycols=false with df1 = hcat(df1, ...).

Thank you very much for this explanation. I will use append!(df1,df2).