Hmm… Looks like some major optimizations are missing here. If you want to help make progress, it would be useful to check whether you can reproduce this without DataFrames, i.e. by calling append! directly on vectors, comparing Vector{T} and DataVector{T} with Vector{Union{T, Missing}}. Then it would be worth filing an issue in Julia.
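Something along these lines might serve as a starting point for that check. It is only a rough sketch: it assumes a Julia version where Missing lives in Base (0.7 or later), the helper name grow! and the sizes are made up for illustration, and DataVector{T} is left out because it needs the DataArrays package.

```julia
# Hypothetical helper: append the same chunk to `dest` n times.
function grow!(dest, chunk, n)
    for _ in 1:n
        append!(dest, chunk)
    end
    return dest
end

chunk = collect(1:1_000)

# Warm-up calls so that compilation time is not included in the timings.
plain = grow!(Int[], chunk, 10)
withmissing = grow!(Union{Int, Missing}[], chunk, 10)

# Time the same workload for both element types.
t_plain = @elapsed grow!(Int[], chunk, 1_000)
t_missing = @elapsed grow!(Union{Int, Missing}[], chunk, 1_000)

println("Vector{Int}:                 ", t_plain, " s")
println("Vector{Union{Int, Missing}}: ", t_missing, " s")
```

If the second timing is much worse than the first, that would suggest the slowdown comes from Julia itself rather than from DataFrames. For numbers suitable for an issue report, a dedicated benchmarking package would be more reliable than @elapsed.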
If you need to filter a data set, maybe you can fill a boolean vector indicating whether each row should be kept or not, and then call getindex on the data frame with it? That would be dramatically faster. There’s also a filter function (unoptimized at the moment) in the latest DataFrames version which could be used to get the same result.
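As a minimal sketch of the boolean vector approach: the column names and the iseven condition below are invented for illustration, and the df.x and df[keep, :] syntax assumes a reasonably recent DataFrames release.

```julia
using DataFrames

# Toy data frame; the columns x and y are placeholders.
df = DataFrame(x = 1:6, y = ["a", "b", "c", "d", "e", "f"])

# Fill a boolean vector: one entry per row, true when the row should be kept.
keep = falses(nrow(df))
for i in 1:nrow(df)
    keep[i] = iseven(df.x[i])
end

# Index the data frame once with the mask (this calls getindex).
kept = df[keep, :]

# The same selection written with filter.
kept2 = filter(row -> iseven(row.x), df)
```

Both kept and kept2 should contain only the rows where x is even. The mask version builds the result in a single indexing operation rather than growing it row by row.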