Reading .csv.gz with CSV.jl fails: no method matching readavailable(::GZipStream)

I originally had code like this, which worked nicely:

        using CSV, DataFrames, Dates

        filepaths = [joinpath(root, f)
                     for (root, dirs, files) in walkdir(root)
                     for f in files[occursin.(fnfeature, files) .& occursin.(r"csv$", files)]]
        df = let OptFloat64 = Union{Missing, Float64}, OptInt32 = Union{Missing, Int32}
            reduce(vcat, [CSV.read(fp,
                                   header=[:domain, :host, :feature, :oid, :largeversion, :clientid,
                                           :from, :to, :aggrlevel, :firstocc, :lastocc, :livesuntil,
                                           :ct, :sum, :min, :max, :g_lower, :g_upper, :g_ct, :g_sum],
                                   types=[String, String, String, Int64, String, String,
                                          DateTime, DateTime, Int8, DateTime, DateTime, DateTime,
                                          Int32, Float64, Float64, Float64, OptFloat64, OptFloat64, OptInt32, OptFloat64])
                          for fp in filepaths]) |> DataFrame
        end

To read .csv.gz files instead, I followed kmundnic's suggestion on Stack Overflow and rewrote this (so that I wouldn't have to learn CSVFiles …) as

        using CSV, DataFrames, Dates, GZip

        filepaths = [joinpath(root, f)
                     for (root, dirs, files) in walkdir(root)
                     for f in files[occursin.(fnfeature, files) .& occursin.(r"csv.gz$", files)]]
        df = let OptFloat64 = Union{Missing, Float64}, OptInt32 = Union{Missing, Int32}
            reduce(vcat, [GZip.open(fp, "r") do io
                              CSV.read(io,
                                       header=[:domain, :host, :feature, :oid, :largeversion, :clientid,
                                               :from, :to, :aggrlevel, :firstocc, :lastocc, :livesuntil,
                                               :ct, :sum, :min, :max, :g_lower, :g_upper, :g_ct, :g_sum],
                                       types=[String, String, String, Int64, String, String,
                                              DateTime, DateTime, Int8, DateTime, DateTime, DateTime,
                                              Int32, Float64, Float64, Float64, OptFloat64, OptFloat64, OptInt32, OptFloat64])
                          end
                          for fp in filepaths]) |> DataFrame
        end

However, with this I get

        ERROR: LoadError: MethodError: no method matching readavailable(::GZipStream)
        Closest candidates are:
          readavailable(::Base.Filesystem.File) at filesystem.jl:199
          readavailable(::IOStream) at iostream.jl:396
          readavailable(::Base.AbstractPipe) at io.jl:243
         [1] write(::Base.GenericIOBuffer{Array{UInt8,1}}, ::GZipStream) at .\io.jl:579

What is it that I don't understand? Thanks for any help!

// That OptFloat64 business is unnecessary, isn't it? The DataValues behind a DataFrame handle "empty values" anyway… But so be it, for the moment.

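The problem is that the GZip.jl package doesn't properly implement the IO interface from Base, and it has received very little maintenance over the last few years. Your stack trace shows exactly this: CSV.jl copies the input into an `IOBuffer` via `write(::IOBuffer, ::GZipStream)`, and that generic `write` method calls `readavailable` on the source stream, which `GZipStream` does not define. I'd recommend using CodecZlib.jl instead, which is actively maintained and implements the interfaces CSV.jl needs.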


CodecZlib.jl is also much faster than GZip.jl. I use CodecZlib in production and am very happy with it.
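For reference, here is a minimal sketch of the CodecZlib.jl approach for a single file. It assumes a recent CSV.jl (0.7 or later), where `CSV.read` takes the sink type as its second argument; `read_csv_gz` is a hypothetical helper name, not part of either package. `GzipDecompressorStream` wraps the opened file and decompresses on the fly, so CSV.jl just sees an ordinary `IO`.

```julia
using CSV, DataFrames, CodecZlib

# Hypothetical helper (not part of CSV.jl or CodecZlib.jl): read one .csv.gz file
# into a DataFrame. Assumes CSV.jl >= 0.7, where CSV.read takes the sink type.
function read_csv_gz(path::AbstractString; kwargs...)
    # Wrap the raw file handle; bytes read from `stream` come out decompressed.
    stream = GzipDecompressorStream(open(path))
    try
        # Keyword arguments such as `header` and `types` are passed through to CSV.jl.
        return CSV.read(stream, DataFrame; kwargs...)
    finally
        # Closing the decompressor stream also closes the underlying file.
        close(stream)
    end
end
```

In the loop from the question, the `GZip.open(fp, "r") do io ... end` call could then be replaced by `read_csv_gz(fp; header=..., types=...)` with the same keyword arguments, and `reduce(vcat, ...)` stays unchanged.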


Many thanks - I’ll try it tomorrow!

… and I’m happy now, too. Thanks!