append!() with two dataframes throws PooledArray error

Hi everyone,

I’m very new to Julia and have been working through replicating a relatively simple R script in Julia 1.0.
Currently it’s a data cleaning exercise and I’ve been using DataFrames and DataFramesMeta, as I am most comfortable using dplyr type syntax.

I’ve read in some data from a csv file, executed a few melt, unstack, and filter type operations and now I have two dataframes that I want to row bind. I’ve tried the append! function but for some reason it won’t work. I can just use the standard [df1; df2] approach to concatenate, but I’d like to understand why the append! function fails.

I’ve ensured that the column types in both dataframes are identical, but I don’t know how to check for further properties of the dataframes that could be causing issues. If I was using R I’d suspect the issue was something like one of the dataframes being grouped, except I haven’t grouped either dataframe. I’m not sure of what pooling is exactly except that it appears it is used as a memory saving device when referencing strings in a dataframe. I don’t know why this would cause an issue with appending one dataframe to another.

Is anyone able to suggest how I could go about investigating this further? Is there a way for me to check if a dataframe is pooled?

I can’t share the data just yet as it is a bit sensitive and I haven’t had a chance to try to replicate the error.

Cheers

Jeff

This is a bug and will be fixed soon. See https://github.com/JuliaComputing/PooledArrays.jl/pull/23.

A temporary solution is to convert the offending columns into vectors:

# Replace each PooledArray column with a plain Vector
for (name, col) in eachcol(df, true)
    col isa PooledArray && (df[name] = Vector(col))
end

Thanks for that. Glad to know it’s a bug and there’s a fix planned. I tried this solution and got an error saying PooledArray not defined. I tried changing the “col isa PooledArray” to isa(col, PooledArray), but got the same error. I’ll keep an eye out for the update.

Thanks so much for your help.

jeff

You have to install and load PooledArrays.jl first, otherwise the name PooledArray is not defined in your session: using Pkg; Pkg.add("PooledArrays"); using PooledArrays.
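
For anyone landing here later, the pieces from this thread can be put together roughly as below. This is only a sketch under a few assumptions: it uses the indexing syntax of a recent DataFrames.jl release (df[!, name]) rather than the eachcol(df, true) form shown earlier, the helper names pooled_columns and unpool! are made up for illustration, and the two small data frames are invented example data. It shows one way to check which columns are pooled and to convert them before calling append!.

```julia
# One-time install, if the packages are missing:
# using Pkg; Pkg.add(["DataFrames", "PooledArrays"])
using DataFrames, PooledArrays

# Hypothetical helper: names of the columns stored as a PooledArray.
pooled_columns(df) = [name for name in names(df) if df[!, name] isa PooledArray]

# Hypothetical helper: replace every pooled column with a plain Vector, in place.
function unpool!(df)
    for name in pooled_columns(df)
        df[!, name] = Vector(df[!, name])
    end
    return df
end

# Invented example data: df1 has a pooled string column, df2 does not.
df1 = DataFrame(id = [1, 2, 3], label = PooledArray(["a", "b", "a"]))
df2 = DataFrame(id = [4, 5], label = ["c", "a"])

pooled_before = pooled_columns(df1)  # which columns are pooled right now
unpool!(df1)                         # convert them to ordinary vectors
append!(df1, df2)                    # row-bind df2 onto df1 in place
```

Note that "pooled" is a property of individual columns rather than of the data frame as a whole, which is why the check walks over the columns one by one.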

I wondered if it might be something simple like that. I’ll give it a go.