DataFrame Matrix Constructor without Copying

bmit · November 22, 2021, 9:42pm

I’m curious why the following method for creating a DataFrame using an existing matrix doesn’t exist. In some ways it seems like the simplest way to create a DataFrame.

using DataFrames
mat = rand(10000,3)
df = DataFrame(mat, [:a, :b, :c], copycols=false)

I have a large matrix and I’d like to use a DataFrame to label and operate on the columns in place, either to avoid an expensive copy or just to make modifying the original matrix more convenient through the DataFrame API.

Btw, the following workaround seems to save the copy, but I’m wondering if there’s something I’m missing:

matcols = collect(eachcol(mat))
df = DataFrame(matcols, [:a, :b, :c], copycols=false);

pdeffebach · November 23, 2021, 2:23am

In Julia a Matrix is not just a vector of vectors. They have different memory layouts, which means you have to copy.

sijo · November 23, 2021, 8:19am

This actually works in the development version and will be released in DataFrames 1.3…

@pdeffebach I think the memory layout is not a problem? Since Matrix is column-major, it is straightforward to have vectors that are views of the same data as the matrix.
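A minimal sketch of that point, using only Base Julia (the variable names are illustrative, not from the thread): each column produced by eachcol is a view that aliases the matrix's memory, so writing through the view changes the matrix.

```julia
mat = zeros(4, 3)

# eachcol yields views into mat; collect stores those views in a Vector
# without copying the underlying data
cols = collect(eachcol(mat))

cols[2][1] = 42.0                # write through the view of column 2

# The matrix sees the change because the view shares its memory
@assert mat[1, 2] == 42.0

# Column-major storage: elements of one column are adjacent in memory
@assert strides(mat) == (1, 4)

# The view's parent is the original matrix, not a copy
@assert parent(cols[2]) === mat
```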

bkamins · November 23, 2021, 8:42am

This will work in DataFrames 1.3 as @sijo explained.

However, bear in mind that this approach has a limitation: many standard functions such as push! or append! will not work correctly with such a data frame.
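A rough sketch of both behaviours, assuming DataFrames 1.3 or later is installed and that the non-copying constructor stores the columns as views of the matrix. The row values are made up, and the exact error raised by push! is not specified here; the sketch only checks that it fails.

```julia
using DataFrames  # assumption: version 1.3 or later

mat = rand(5, 3)
df = DataFrame(mat, [:a, :b, :c], copycols=false)

df.a[1] = -1.0                   # in-place update through the data frame
@assert mat[1, 1] == -1.0        # the matrix is modified as well

# The columns are fixed-size views, so adding a row is expected to throw
failed = try
    push!(df, (a=0.0, b=0.0, c=0.0))
    false
catch
    true
end
@assert failed
@assert size(df) == (5, 3)       # the data frame keeps its original shape
```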

bmit · November 23, 2021, 3:33pm

This is great! Thanks @bkamins and @sijo. The memory layout didn’t seem like it would be a problem, since a dense Matrix is essentially a vector of vectors with a more constrained memory layout.

push! and append! are the sort of gotchas I was trying to think of with my workaround above. Not that I would use them in my use case, but I see how that could create major headaches in general. Could those two particular functions dispatch to vcat for the Matrix case?

bkamins · November 23, 2021, 3:59pm

No, as it would break the contract for push! and append! in Julia Base.

Juan · November 23, 2021, 6:08pm

Why wouldn’t push! and append! work properly?

bkamins · November 23, 2021, 6:21pm

Because vcat would create a new vector, whereas push! and append! are functions that update existing vectors in place.
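The difference can be seen with plain vectors in Base Julia (a small illustration with made-up values, not DataFrames code): push! returns the very object it was given, while vcat allocates a new one.

```julia
v = [1, 2, 3]

w = push!(v, 4)          # grows v in place and returns the same object
@assert w === v
@assert v == [1, 2, 3, 4]

u = vcat(v, 5)           # allocates a new vector; v is left untouched
@assert u !== v
@assert u == [1, 2, 3, 4, 5]
@assert v == [1, 2, 3, 4]
```

A column that is a view into a matrix has a fixed length, so there is no way to grow it in place; replacing it with the result of vcat would silently detach the data frame from the matrix.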
