Parquet2.jl: ThriftInvalidFinalReadState: while attempting to read Parquet2.Metadata.Statistics

I got this error when trying to open Parquet files that were converted from Arrow and compressed with Snappy by a JS library called parquet-wasm. The same files open fine with pyarrow and R arrow. I'm using Parquet2.jl 0.2.24 (earlier versions fail as well) on Julia 1.10.0.

ThriftInvalidFinalReadState: while attempting to read Parquet2.Metadata.Statistics, final index state was 7, expect 0.
Stacktrace:
  [1] readshort(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.Statistics}}, k::Int64, t::Int64)
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:22
  [2] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.Statistics}})
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:78
  [3] read
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:29 [inlined]
  [4] _readfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:140 [inlined]
  [5] readshortfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:158 [inlined]
  [6] readshort(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.ColumnMetaData}}, k::Int64, t::Int64)
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:15
  [7] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.ColumnMetaData}})
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:78
  [8] read
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:29 [inlined]
  [9] _readfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:140 [inlined]
 [10] readshortfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:158 [inlined]
 [11] readshort(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.Column}}, k::Int64, t::Int64)
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:15
 [12] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.Column}})
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:78
 [13] _read_setlike_elements(p::Thrift2.CompactProtocol{IOBuffer}, s::Int32, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.Column}})
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:44
 [14] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftList{Thrift2.ThriftStruct{Parquet2.Metadata.Column}}})
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:63
 [15] read
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:29 [inlined]
 [16] _readfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:140 [inlined]
 [17] readshortfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:158 [inlined]
 [18] readshort(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.RowGroup}}, k::Int64, t::Int64)
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:15
 [19] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.RowGroup}})
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:78
 [20] _read_setlike_elements(p::Thrift2.CompactProtocol{IOBuffer}, s::UInt8, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.RowGroup}})
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:44
 [21] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftList{Thrift2.ThriftStruct{Parquet2.Metadata.RowGroup}}})
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:63
 [22] read
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:29 [inlined]
 [23] _readfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:140 [inlined]
 [24] readshortfield
    @ Thrift2 C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\read.jl:158 [inlined]
 [25] readshort(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.FileMetaData}}, k::Int64, t::Int64)
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:15
 [26] read(p::Thrift2.CompactProtocol{IOBuffer}, ::Type{Thrift2.ThriftStruct{Parquet2.Metadata.FileMetaData}})
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:78
 [27] read
    @ Parquet2.Metadata C:\Users\pavlo\.julia\packages\Thrift2\HMxFD\src\codegen.jl:80 [inlined]
 [28] readmeta(v::Vector{UInt8}; check::Bool)
    @ Parquet2 C:\Users\pavlo\.julia\packages\Parquet2\JMLqR\src\dataset.jl:289
 [29] readmeta
    @ Parquet2 C:\Users\pavlo\.julia\packages\Parquet2\JMLqR\src\dataset.jl:283 [inlined]
 [30] Dataset(fm::Parquet2.FileManager{FilePathsBase.WindowsPath})
    @ Parquet2 C:\Users\pavlo\.julia\packages\Parquet2\JMLqR\src\dataset.jl:98
 [31] Dataset(p::FilePathsBase.WindowsPath; kw::@Kwargs{})
    @ Parquet2 C:\Users\pavlo\.julia\packages\Parquet2\JMLqR\src\dataset.jl:112
 [32] Dataset(p::FilePathsBase.WindowsPath)
    @ Parquet2 C:\Users\pavlo\.julia\packages\Parquet2\JMLqR\src\dataset.jl:110
 [33] Dataset(p::String; kw::@Kwargs{})
    @ Parquet2 C:\Users\pavlo\.julia\packages\Parquet2\JMLqR\src\dataset.jl:116
 [34] top-level scope

I can share a sample Parquet file, but I can't attach it here.

Tagging @ExpandingMan, who is the maintainer.

If you can open an issue in the repo with the sample, I can take a look.

Thanks, I have opened an issue in the repo.

Thanks. It appears there was an update to the Parquet metadata schema last November that I never included. I'm pretty sure my Thrift implementation is not supposed to break when that happens, so it probably has a bug, but it wasn't clear to me how to fix it and I haven't tried that hard yet. I did, of course, update the Parquet metadata schema, so it's fixed in Parquet2.jl.
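
The point about a Thrift reader not breaking on a schema update can be made concrete. In the Thrift compact protocol, each struct field starts with a header carrying its field id (as a delta from the previous id) and a type code, so a reader can skip a field id it does not recognize as long as it knows how many bytes each type occupies. Below is a minimal Julia sketch for illustration only: every name in it (`read_struct`, `skip_value`, the `CT_*` constants, and so on) is hypothetical, it is not Thrift2.jl's implementation, and it makes no claim about where the actual bug lies. One easy-to-miss detail it shows: inside a struct, a compact-protocol boolean has no payload bytes, because its value is encoded in the type nibble of the field header.

```julia
# Hypothetical, minimal Thrift compact-protocol struct reader (NOT Thrift2.jl's code).
# It only illustrates that a field id the reader does not know can be skipped.

# Compact-protocol type codes, stored in the low nibble of a field header.
const CT_STOP, CT_BOOL_TRUE, CT_BOOL_FALSE, CT_BYTE = 0x00, 0x01, 0x02, 0x03
const CT_I16, CT_I32, CT_I64, CT_DOUBLE = 0x04, 0x05, 0x06, 0x07
const CT_BINARY, CT_LIST, CT_SET, CT_MAP, CT_STRUCT = 0x08, 0x09, 0x0a, 0x0b, 0x0c

# Unsigned LEB128 varint.
function read_uvarint(io::IO)
    result, shift = UInt64(0), 0
    while true
        b = read(io, UInt8)
        result |= UInt64(b & 0x7f) << shift
        (b & 0x80) == 0 && return result
        shift += 7
    end
end

# Zigzag-encoded signed integer (used for i16/i32/i64 and long-form field ids).
function read_zigzag(io::IO)
    n = read_uvarint(io)
    return Int64(n >> 1) ⊻ -Int64(n & 0x01)
end

# Collection elements: a bool takes one byte; other types are skipped as in a struct.
skip_elem(io::IO, t::UInt8) =
    t in (CT_BOOL_TRUE, CT_BOOL_FALSE) ? (read(io, UInt8); nothing) : skip_value(io, t)

# Skip one struct-field value of compact type `t` without decoding it.
function skip_value(io::IO, t::UInt8)
    if t == CT_BOOL_TRUE || t == CT_BOOL_FALSE
        return nothing                     # value lives in the field header itself
    elseif t == CT_BYTE
        read(io, UInt8)
    elseif t in (CT_I16, CT_I32, CT_I64)
        read_uvarint(io)
    elseif t == CT_DOUBLE
        skip(io, 8)                        # 8-byte little-endian double
    elseif t == CT_BINARY
        skip(io, Int(read_uvarint(io)))    # length-prefixed bytes
    elseif t == CT_LIST || t == CT_SET
        h = read(io, UInt8)                # size in high nibble (15 = varint follows)
        n = (h >> 4) == 0x0f ? Int(read_uvarint(io)) : Int(h >> 4)
        for _ in 1:n
            skip_elem(io, h & 0x0f)
        end
    elseif t == CT_MAP
        n = Int(read_uvarint(io))
        if n > 0
            kv = read(io, UInt8)           # key type high nibble, value type low nibble
            for _ in 1:n
                skip_elem(io, kv >> 4)
                skip_elem(io, kv & 0x0f)
            end
        end
    elseif t == CT_STRUCT
        read_struct(io, Dict{Int,Function}())  # decode nothing, skip every field
    else
        error("unsupported compact type code $t")
    end
    return nothing
end

# Read a struct: decode field ids listed in `known` (id => decoder(io, type)),
# skip all others, and return a Dict of field id => decoded value.
function read_struct(io::IO, known::Dict{Int,Function})
    out = Dict{Int,Any}()
    last_id = 0
    while true
        h = read(io, UInt8)
        h == CT_STOP && return out
        t, delta = h & 0x0f, h >> 4
        id = delta == 0 ? Int(read_zigzag(io)) : last_id + Int(delta)
        last_id = id
        if haskey(known, id)
            out[id] = known[id](io, t)
        else
            skip_value(io, t)              # unknown (e.g. newer-schema) field
        end
    end
end

# Usage: the reader knows only fields 3 and 9 (both i64). Fields 7 (bool) and
# 8 (binary) play the role of fields added by a newer writer.
data = UInt8[0x36, 0x0a, 0x41, 0x18, 0x02, 0x61, 0x62, 0x16, 0x54, 0x00]
i64 = (io, t) -> read_zigzag(io)
fields = read_struct(IOBuffer(data), Dict{Int,Function}(3 => i64, 9 => i64))
# fields[3] == 5 and fields[9] == 42
```

Decoding continues correctly past the unfamiliar fields because each skipped value consumes exactly its own bytes. A skip routine that got the size of any type wrong (for example, by reading a payload byte for an in-struct boolean) would fall out of step with the byte stream and fail later in a confusing place.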

Anyway, you probably ran into this early because it sounds like you are using a newer writer that picked up the schema change early. Regardless, anyone else seeing similar issues should find them resolved as soon as 0.2.25 is tagged.