How to initialize empty dataframe of specified size

bkamins · August 31, 2021, 12:50pm

What I mean is that it is better to populate the vectors first, and then create a data frame from them, e.g.:

julia> using DataFrames, BenchmarkTools

julia> function test1()
           df = DataFrame(a=Vector{Int}(undef, 10^6),
                          b=Vector{String}(undef, 10^6),
                          c=Vector{Int}(undef, 10^6), copycols=false)
           for i in 1:10^6
               df.a[i] = 1
               df.b[i] = "1"
               df.c[i] = 1.0
           end
           return df
       end
test1 (generic function with 1 method)

julia> function test2()
           nt = (a=Vector{Int}(undef, 10^6),
                 b=Vector{String}(undef, 10^6),
                 c=Vector{Int}(undef, 10^6))
           for i in 1:10^6
               nt.a[i] = 1
               nt.b[i] = "1"
               nt.c[i] = 1.0
           end
           return DataFrame(nt, copycols=false)
       end
test2 (generic function with 1 method)

julia> @btime test1();
  244.934 ms (5998506 allocations: 160.20 MiB)

julia> @btime test2();
  4.881 ms (27 allocations: 22.89 MiB)

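Since a DataFrame object is not type stable, it is best suited for operations that work on whole columns, as then the type instability is not an issue. In test1 the compiler cannot infer the types of df.a, df.b, and df.c inside the loop, so every assignment is dispatched dynamically; in test2 the column types are part of the NamedTuple type, so the loop compiles to fast code.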
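
If the data frame has to exist before it is filled, a function barrier is one common way to get the same effect. The sketch below is not from the benchmark above: test3 and fill_columns! are hypothetical names, and no timing is claimed for it. The idea is that the columns are pulled out of the data frame once, and the loop runs inside a separate function that is compiled for the concrete vector types.

```julia
using DataFrames

# Hypothetical helper: the loop lives here, so it is compiled for the
# concrete types of the vectors it receives.
function fill_columns!(a::AbstractVector, b::AbstractVector, c::AbstractVector)
    for i in eachindex(a, b, c)
        a[i] = 1
        b[i] = "1"
        c[i] = 1
    end
    return nothing
end

# Hypothetical variant of test1: the data frame is created first, then
# its columns are passed through the function barrier.
function test3(n::Integer=10^6)
    df = DataFrame(a=Vector{Int}(undef, n),
                   b=Vector{String}(undef, n),
                   c=Vector{Int}(undef, n), copycols=false)
    # df.a, df.b, df.c return the stored vectors without copying;
    # there is one dynamic dispatch here instead of one per iteration.
    fill_columns!(df.a, df.b, df.c)
    return df
end
```

Calling test3(10) returns a 10×3 data frame whose columns were filled in place.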

The benefit of not being type stable is that we can accommodate very wide data frames without huge compilation overhead, and that you can easily change the schema of a DataFrame.
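
As a small illustration of what changing the schema means in practice, the following sketch (with made-up column names and values) adds a column, replaces a column with one of a different element type, and drops a column, all on the same DataFrame object:

```julia
using DataFrames

df = DataFrame(a=[1, 2, 3], b=["x", "y", "z"])

df.c = [1.5, 2.5, 3.5]   # add a new column
df.a = Float64.(df.a)    # replace column a with a Float64 vector
select!(df, Not(:b))     # drop column b in place

println(names(df))       # prints ["a", "c"]
println(eltype(df.a))    # prints Float64
```

None of these operations would be possible on a NamedTuple of vectors without creating a new object of a different type.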
