Hello,
I am trying to load a Python pickle object. I keep getting the error message below and am unable to resolve the issue. The data I am trying to load is the CIFAR-10 dataset. Below is the code I am using to load the datasets.
using PyCall
@pyimport pickle

function load_pickle_data(ROOT)
    datadict = Dict()
    for b = 1:5
        f = joinpath(ROOT, "data_batch_$b")
        fo = open(f, "r")
        datadict = pickle.load(fo)
    end
    datadict
end
ERROR
PyError (ccall(@pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, arg, C_NULL)) <type 'exceptions.TypeError'>
TypeError("unhashable type: 'bytearray'",)
File "/Users/Saran/.julia/v0.6/Conda/deps/usr/lib/python2.7/pickle.py", line 1384, in load
return Unpickler(file).load()
File "/Users/Saran/.julia/v0.6/Conda/deps/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
The cifar-10-batches-py directory has the following files in it:
batches.meta
data_batch_1
data_batch_2
data_batch_3
data_batch_4
data_batch_5
readme.html
test_batch
The cifar-10-batches-py directory and the Julia file I am running are in the same folder. Kindly help me fix this issue.
Thank You
CIFAR-10 also comes in a binary version, which really should be preferred over pickle anyway: pickle is insecure and can run malicious code (e.g. if the author's site had been hacked). The binary version is easily parseable in Julia.
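For the binary batches, something along these lines should work (a rough sketch, assuming the documented layout of the cifar-10-batches-bin files: 10000 records per file, each record being 1 label byte followed by 3072 pixel bytes, 1024 per colour channel):

function load_cifar_binary(path)
    nrec = 10000                     # records per batch file
    labels = Vector{UInt8}(nrec)     # uninitialized buffers (Julia 0.6 constructors)
    data = Array{UInt8}(3072, nrec)
    open(path, "r") do io
        for i in 1:nrec
            labels[i] = read(io, UInt8)   # first byte of each record is the label (0-9)
            data[:, i] = read(io, 3072)   # remaining 3072 bytes are the 32x32x3 image
        end
    end
    labels, data
end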
The problem is that Julia UInt8 arrays are by default converted to Python bytearray, whereas pickle only allows bytes objects for some reason (hence the unhashable type: 'bytearray' error). You can do

datadict = pickle.loads(pybytes(readbytes(fo)))

instead. (Note that your for b=1:5 loop overwrites datadict 5 times, so that you are only returning the last datadict. Maybe you want merge!(datadict, pickle.loads(...)) instead?)

See also https://github.com/JuliaPy/PyCall.jl/pull/388 for how PyCall uses pickle for serialization.
@stevengj Thank you very much for pointing out the mistake; I did need to use merge!. I have made the changes you suggested, but I am still getting an error.
using PyCall
@pyimport pickle

function load_pickle_data(ROOT)
    datadict = Dict()
    for b = 1:5
        f = joinpath(ROOT, "data_batch_$b")
        fo = open(f, "r")
        merge!(datadict, pickle.load(pybytes(readbytes(fo))))
    end
    datadict
end
UndefVarError: readbytes not defined
in load_pickle_data at loadbatchutil.jl:9
I am currently using Julia v0.6.0, so I tried changing readbytes to readbytes!, but I still get an error.

merge!(datadict, pickle.load(pybytes(readbytes!(fo, UInt8))))

Error message:
MethodError: no method matching readbytes!(::IOStream, ::Type{UInt8})
Closest candidates are:
readbytes!(::IOStream, !Matched::Array{UInt8,N} where N) at iostream.jl:278
readbytes!(::IOStream, !Matched::Array{UInt8,N} where N, !Matched::Any; all) at iostream.jl:278
readbytes!(::IO, !Matched::AbstractArray{UInt8,N} where N) at io.jl:503
...
in load_pickle_data at loadbatchutil.jl:9
Please let me know what I am missing here.
Sorry, it is just read(fo) in Julia 0.6.
@stevengj I tried read(fo), but I still get an error message.

merge!(datadict, pickle.load(pybytes(read(fo))))
PyError (ccall(@pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, arg, C_NULL)) <type 'exceptions.AttributeError'>
AttributeError("'str' object has no attribute 'readline'",)
File "/Users/Saran/.julia/v0.6/Conda/deps/usr/lib/python2.7/pickle.py", line 1384, in load
return Unpickler(file).load()
File "/Users/Saran/.julia/v0.6/Conda/deps/usr/lib/python2.7/pickle.py", line 847, in __init__
self.readline = file.readline
You have to use pickle.loads, not pickle.load, to read from a pybytes object. pickle.load only takes an I/O stream.
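In your loop that would be something like:

merge!(datadict, pickle.loads(pybytes(read(fo))))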
@stevengj Thank you very much. pickle.loads sorted out the issue.
"batch_label" β "training batch 5 of 5"
"labels" β Any[10000]
"data" β 10000Γ3072 Array{UInt8,2}:
"filenames" β Any[10000]