JLD load error for Dict

I am using dictionaries to store matrices of data under a series of tags such as the following:

"GroupIdentifier"=>Dict{Any,Any}("SubjectNumber"=>Dict{Any,Any}("StimulusType"=>Dict{Any,Any}("RecordingPlace"=>([dat1],[dat2]))))

I can save the full Dict as a .jld file, but when I try to load it, I get the following error message:

ERROR: stored type JLD.AssociativeWrapper{Core.Any,Core.Any,Base.Dict{Core.Any,Core.Any}} does not match currently loaded type

I tried to recreate this error in a small example for you all, but when I made a small, simple Dict, save and load from the JLD package just worked.

As the data are matrices, I don’t think I can use DataFrames or any tables. I also think Arrow won’t work, as it would try to put each value in its own column and row instead of inserting the full matrix into a cell.

Can someone help me decipher what this error message might mean and/or point me towards a better way of either storing or saving my data? Thanks!

Instead of using a nested dict, could you flatten the data structure, e.g.

Dict((GroupID=3, SubjectNumber=201, StimulusType=:green) => (rand(3), rand(3)))

That allows you to store it in a DataFrame and should make the types concrete for easier serialization/deserialization.
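To make the flattening idea concrete, here is a minimal sketch using only Base Julia. The field names (`group`, `subject`, `stim`) and all values are made up for illustration. It assumes each entry holds a tuple of two matrices, and it shows that a flat Dict can still be sliced by any tag with `filter`:

```julia
# Flat Dict keyed by NamedTuples; each value is a tuple of two matrices.
# All names and values below are illustrative, not taken from real data.
data = Dict(
    (group = "1", subject = "001", stim = "green") => (rand(2, 4), rand(2, 4)),
    (group = "1", subject = "002", stim = "green") => (rand(2, 4), rand(2, 4)),
    (group = "2", subject = "001", stim = "blue")  => (rand(2, 4), rand(2, 4)),
)

# "Chunk" the data by any tag: keep only the entries whose key matches.
green  = filter(kv -> kv.first.stim == "green", data)
group1 = filter(kv -> kv.first.group == "1", data)

# Look up one entry directly by its full key.
dat1, dat2 = data[(group = "2", subject = "001", stim = "blue")]
```

Because every key and every value has the same concrete type here, the Dict is no longer a `Dict{Any,Any}`.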

Thanks for your response. I need to be able to chunk the data in different ways for analysis as I go on, though. Isn’t there a way to simply save the Dict as is with JLD, or is there another structure that would work better initially? As far as I can tell, the series of info tags wouldn’t be a problem in a DataFrame, but a dataset that takes more than one column and row would be.

It does seem that JLD2 and FileIO solve the problem though, as the Dict looks the same going in and coming out with

using JLD2, FileIO
save("datafile.jld2","variname",Dictionary)
Variname = load("datafile.jld2")["variname"]

I’m not sure what you mean by “a dataset that takes more than one column and row”. Could you give an example?

I got a similar error yesterday when using JLD to save a DataFrame object. The weird thing is that this used to work just fine a few weeks ago.

Here’s a short reproducible example where I get an error:

using DataFrames
using JLD

ex_df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
save("ex-file.jld", "ex_df", ex_df)

d = load("ex-file.jld")

I get an error message, which seems very similar to the error that @katdeane got regarding a stored type not matching the currently loaded type:

ERROR: stored type DataFrames.DataFrame does not match currently loaded type

What is weird is that I can get this error to happen within one REPL session so I’m not sure how to understand this error. I am not familiar with the inner workings of JLD, but I’m guessing that it must be a problem in a package update? Does anyone with more knowledge of JLD have a sense of what’s going on here? Maybe @tim.holy would have a better sense of this?

I should point out that using JLD2 and FileIO, as @katdeane suggested above, works fine and I do not receive the same error. So, I might just switch over to using JLD2. But it would be good to know where this error is coming from and what to do about it.


I’m coming from Matlab (trying to get away from it) and might be thinking in terms of the structures I used there. I am also not a veteran programmer and use this for neuro data analysis, so I’m sorry for the lack of clarity.

In Matlab I could create a struct where 1 column would hold cells of matrices of data in relation to the other identifier tags. If I try to export that as a csv for example, the formatting takes the matrix in the cell and spreads it out over rows and columns. So if I have Data(1).Stimtype = “green” and Data(1).Dat1 = {20 x 800 double} then changing it to a table or csv format would make Dat1 take 20 rows and 800 columns instead of just 1x1. Let me know if I’m just completely missing something, anything to simplify storing and chunking data is appreciated.
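For comparison, here is a rough Julia analogue of that Matlab struct array, sketched with Base Julia only. The field names `stimtype` and `dat1` and the matrix sizes are illustrative assumptions. Each row is a NamedTuple, and the whole matrix sits in a single field of its row instead of being spread over rows and columns:

```julia
# Each row is a NamedTuple; the `dat1` field holds a whole matrix.
# Field names and sizes are illustrative only.
rows = [
    (stimtype = "green", dat1 = rand(20, 800)),
    (stimtype = "blue",  dat1 = rand(20, 800)),
]

# The matrix stays intact as a single entry of its row.
size(rows[1].dat1)          # (20, 800)

# Select rows by tag, much like indexing a Matlab struct array.
green_rows = filter(r -> r.stimtype == "green", rows)
```

As far as I know, a DataFrame column can also hold arbitrary Julia objects, including matrices, so the same layout should carry over to a DataFrame, though I have not tested that here. The spreading-out only happens when exporting to a flat text format such as CSV.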

I am still not sure I understand the data structure (matrices in nested dictionaries?), but perhaps JSON could work.

Providing an MWE that generates example data would make it easier to suggest a solution.

Of course, my bad. I couldn’t recreate the error with an MWE before, but I was finally able to once I also recreated the full data structure setup:

GroupID = ["1" "2"]
Subject = ["001" "002" "003"]
StimList = ["green" "blue" "red"]

Group = Dict()
for iGr = 1:length(GroupID)
    SubjectNumber = Dict()
    for iSu = 1:length(Subject)
        Stimtype = Dict()
        for iSt = 1:length(StimList)
            dat1 = rand(2,4)
            dat2 = rand(2,4)
            Stimtype[StimList[iSt]] = dat1, dat2
        end
        SubjectNumber[Subject[iSu]] = Stimtype
    end
    Group[GroupID[iGr]] = SubjectNumber
end

using JLD, HDF5
save("Group.jld",Group)
loadGroup = load("Group.jld")

Thanks for the MWE. I found that JLD2 writes it out, but you may want to test it.

For this kind of data, I would consider just using the filesystem + CSV, e.g., saving each table as 1/001/green.csv etc. in some subdirectory.
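A minimal sketch of that layout, assuming the DelimitedFiles standard library is available (it ships with Julia as far as I know) and using a temporary directory as a placeholder root. The tag values are taken from the MWE above, and only one matrix is written here; the second matrix of each entry would need its own file or file-name suffix:

```julia
using DelimitedFiles  # standard library

# Write one matrix per file under root/group/subject/stim.csv.
# `root` and the tag values are placeholders for illustration.
root = mktempdir()
group, subject, stim = "1", "001", "green"
dat = rand(2, 4)

dir = joinpath(root, group, subject)
mkpath(dir)                          # create nested directories as needed
file = joinpath(dir, stim * ".csv")
writedlm(file, dat, ',')             # comma-separated, one matrix row per line

# Read it back as a Float64 matrix.
loaded = readdlm(file, ',', Float64)
```

The directory names then play the role of the nested Dict keys, and each file stays readable from other tools.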


Thanks for the replies, @Tamas_Papp. Yes, it seems that the errors that @katdeane and I are having are fixed with JLD2.

I am still confused about the error message I’m getting from JLD when trying to save and load a DataFrame. See my reply above for a very short MWE. The weird thing is that I can get this error to occur in a single REPL session. Any thoughts on this?

(P.S. sorry if I’m hijacking this thread too much. I can open a new thread or go to the JLD github page if that’s easier!)


No worries @charshaw, I was also saving and loading in the same REPL session and the error message totally confused me. How could it not match a loaded type when it was sitting in my workspace? My issue is solved but I am still curious about how to interpret this.