How to migrate version of PooledArrays inside JLD2 file

bluesmoon · April 1, 2021, 3:10pm

A while ago I created a JLD2 file with a DataFrame in it. (It is used for unit testing.)

One of the columns in the DataFrame has a type of PooledArrays.PooledArray{String,UInt32,1,Array{UInt32,1}}

When I created the JLD2 file, the version of PooledArrays in use was 1.1.0.
Through periodic updates I now have PooledArrays 1.2.1, and I can no longer open the file because of this error:

┌ Warning: saved type PooledArrays.PooledArray{String,UInt32,1,Array{UInt32,1}} is missing field refcount in workspace type; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/qncOK/src/data/reconstructing_datatypes.jl:152
Error encountered while load FileIO.File{FileIO.DataFormat{:JLD2},String}(".../test/results.jld2").

Fatal error:
Bucket: Error During Test at .../test/runtests.jl:283
  Got exception outside of a @test
  JLD2 load error: neither load nor fileio_load is defined
    due to MethodError(convert, (AbstractArray{T,1} where T, JLD2.ReconstructedTypes.var"##PooledArrays.PooledArray{String,UInt32,1,Array{UInt32,1}}#259"(UInt32[0x00000001], ["[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,1,0,0,0,0,3,4,0,1,1,5,16,35,50,55,62,54,45,66,85,112,171,181,270,334,365,403,369,342,318,301,303,353,304,484,714,675,646,552,522,650,551,526,446,412,367,328,274,283,274,260,238,219,251,240,201,171,158,313,312,250,193,135,116,96,146,118,77,80,1589,162,81,20,4,8]"], Dict{String,UInt32}("[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,1,0,0,0,0,3,4,0,1,1,5,16,35,50,55,62,54,45,66,85,112,171,181,270,334,365,403,369,342,318,301,303,353,304,484,714,675,646,552,522,650,551,526,446,412,367,328,274,283,274,260,238,219,251,240,201,171,158,313,312,250,193,135,116,96,146,118,77,80,1589,162,81,20,4,8]" => 0x00000001))), 0x0000000000006d22)
    Will try next loader.

I can keep running my tests by forcing my package to always use PooledArrays 1.1.0 (even though it wasn’t part of my Project.toml), but I don’t know how to migrate the data in the JLD2 file to the new version.

Any ideas?

TIA,
Philip

affans · April 1, 2021, 3:29pm

Is it possible for you to create a new environment with the correct package versions, extract the jld2 file, and save the dataframe as a text file? You can read this dataframe back with the correct package versions and create a new jld2 file.
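
A minimal sketch of the "new environment with the correct package versions" step, assuming Julia 1.5 or later (needed for `Pkg.activate(temp=true)`) and that 1.1.0 is the version that wrote the file, as stated in the first post. The other packages are left to the resolver, so the versions it picks for them are not guaranteed to match the ones originally used.

```julia
using Pkg

# Throwaway environment, so the main project's Manifest is not touched.
Pkg.activate(temp=true)

# Assumption: 1.1.0 is the PooledArrays version that wrote the file.
Pkg.add(name="PooledArrays", version="1.1.0")
# Pin it so that adding further packages cannot upgrade it.
Pkg.pin("PooledArrays")

# Packages needed to read the file; their versions are chosen by the resolver.
Pkg.add(["JLD2", "FileIO", "DataFrames"])
```

With this environment active, `load` should see a PooledArray type whose fields match the saved ones, and the data can then be exported or converted.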

bluesmoon · April 1, 2021, 3:35pm

Unfortunately, converting to text or even JSON loses a lot of type information. I’m going to see if I can just convert it to a standard Array.
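
A small sketch of what converting to a standard Array could look like for a single column, shown on a made-up stand-in column rather than on the data from the file. It only assumes that PooledArrays is installed.

```julia
using PooledArrays

# Stand-in column; the real one would come from the DataFrame in the JLD2 file.
pooled = PooledArray(["a", "b", "a", "c"])

# Copy the values into an ordinary Vector with the same element type.
plain = Vector{eltype(pooled)}(pooled)

@show typeof(pooled)
@show typeof(plain)
@show plain == pooled   # element-wise comparison of the values
```

The element type is kept, so the column still holds the same values after conversion; only the pooled storage is dropped.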

bluesmoon · April 1, 2021, 4:01pm

Ok, so I came up with this solution that gets rid of the PooledArrays dependency altogether:

using FileIO, DataFrames, PooledArrays, SentinelArrays

if isempty(ARGS)
    println("Usage: convert_pooled_arrays.jl <jld file>...")
    exit(-1)
end

for f in ARGS
    # Load every top-level entry of the file into a Dict
    expec = load(f)
    changed = false
    for k in keys(expec)
        df = expec[k]
        # Only DataFrames are inspected; other values are left untouched
        df isa AbstractDataFrame || continue
        for col in names(df)
            column = df[!, col]
            if column isa PooledArrays.PooledArray
                println(f, ": ", k, ".", col)
                # Replace the pooled column with a plain Vector of the same element type
                df[!, col] = Vector{eltype(column)}(column)
                changed = true
            end
        end
    end
    # Rewrite the file in place only if at least one column was converted
    if changed
        save(f, expec; compress=true)
    end
end
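
One way to check the result afterwards is to list any PooledArray columns that remain. The helper below, `pooled_columns`, is hypothetical (it is not part of the thread or of any package) and is shown on a small stand-in Dict of the kind `load(f)` returns.

```julia
using DataFrames, PooledArrays

# Hypothetical helper: return "key.column" for every PooledArray column
# found among the DataFrames in a Dict of loaded values.
function pooled_columns(data::AbstractDict)
    found = String[]
    for (k, v) in data
        v isa AbstractDataFrame || continue
        for col in names(v)
            v[!, col] isa PooledArray && push!(found, string(k, ".", col))
        end
    end
    return found
end

# Stand-in for the Dict returned by `load(f)`.
data = Dict("results" => DataFrame(a = PooledArray(["x", "y"]), b = [1, 2]))
@show pooled_columns(data)

# Same conversion as in the script above, applied to the stand-in column.
data["results"][!, :a] = Vector{String}(data["results"][!, :a])
@show pooled_columns(data)
```

An empty result after conversion suggests the file no longer depends on the PooledArray layout.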
