Importing ZipFile.ReadableFile into DataFrame

I have used ZipFile to open a downloaded compound zipfile. I want to import several of the contained files into DataFrames.

ZipFile.Reader for IOStream(<file ./GBIF-Datasets/0333654-200613084148143.zip>) containing 16 files:

uncompressedsize method  mtime            name
----------------------------------------------
        39310607 Deflate 2021-07-29 01-56 occurrence.txt
        28756841 Deflate 2021-07-29 01-56 verbatim.txt
        16349369 Deflate 2021-07-29 01-56 multimedia.txt
            1641 Deflate 2021-07-29 01-56 citations.txt
            2947 Deflate 2021-07-29 01-56 dataset/1bc719fd-c4e1-410f-b8c1-518cc1addcb5.xml
           [...]
            1044 Deflate 2021-07-29 01-56 rights.txt
            3430 Deflate 2021-07-29 01-56 metadata.xml
           36912 Deflate 2021-07-29 01-56 meta.xml
z.files[1]
ZipFile.ReadableFile(name=occurrence.txt, method=Deflate, uncompresssedsize=39310607, compressedsize=4638199, mtime=1.627487774e9)

I can find no examples or documentation on how to do this.

Two questions:

  1. Is ZipFile the right library to use? There seems to be zero documentation.
  2. How do I access the contained files and load them into a DataFrame? I tried IOBuffer but couldn’t find any way to open the ZipFile.ReadableFile.

Thanks…

I found an example using sourcegraph.com. My working example follows; it is the kind of example I had hoped to find in the documentation.
It seems like a good API.

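using CSV, DataFrames, ZipFile

uri = "https://api.gbif.org/v1/occurrence/<>.zip"
f = download(uri)        # saves to a temporary file and returns its path
z = ZipFile.Reader(f)
# index the archive members by name; each member is a ZipFile.ReadableFile
z_by_filename = Dict(file.name => file for file in z.files)
df = CSV.read(z_by_filename["occurrence.txt"], DataFrame)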
names(df)
250-element Vector{String}:
 "gbifID"
 "abstract"
 "accessRights"
 "accrualMethod"
 "accrualPeriodicity"
 "accrualPolicy"
 "alternative"
 "audience"
 "available"
 "bibliographicCitation"
 "conformsTo"
 "contributor"
 "coverage"
 ⋮
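
Since the original goal was to load several of the contained files, the same approach can be wrapped in a loop. The following is only a sketch: `load_tables` is a hypothetical helper name (not part of ZipFile.jl), and it assumes every requested member is a delimited text file that CSV.jl can parse with its default delimiter detection.

```julia
using CSV, DataFrames, ZipFile

# Hypothetical helper: read the members of a zip archive whose names appear
# in `wanted` and return them as a Dict mapping member name => DataFrame.
# Extra keyword arguments are forwarded to CSV.read.
function load_tables(zip_path::AbstractString, wanted; kwargs...)
    z = ZipFile.Reader(zip_path)
    try
        tables = Dict{String,DataFrame}()
        for file in z.files
            if file.name in wanted
                # A ZipFile.ReadableFile is passed to CSV.read directly,
                # exactly as in the single-file example above.
                tables[file.name] = CSV.read(file, DataFrame; kwargs...)
            end
        end
        return tables
    finally
        close(z)  # release the underlying file handle
    end
end
```

With the archive above this might be called as `tables = load_tables(f, ["occurrence.txt", "verbatim.txt", "multimedia.txt"])`, after which `tables["occurrence.txt"]` is an ordinary DataFrame. Closing the reader in a `finally` block is a design choice so that the file handle is released even if one of the members fails to parse.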

In general this tutorial https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/04_loadsave.ipynb tries to cover most of the standard cases of reading/writing data for DataFrames.jl.

Thanks, I am working my way through your tutorials, and enjoying them very much!

I was jumping ahead, as I am moving my software development from Python to Julia, and had some existing Python DataFrame scenarios I was keen to try.

This is week 2 of Julia for me, so I expect to be lost quite a bit. One thing that I have to learn is how to navigate the libraries.

Could I have seen that the sub-files within the ZipFile would plug into CSV.read() if I were competent with the type system?
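
One way to explore that question is to inspect the type from the REPL. This is only a sketch, under the assumption that `ZipFile.ReadableFile` is declared as a subtype of `IO` and that CSV.jl accepts an `IO` as its input source; the variable name `is_io` is purely illustrative.

```julia
using InteractiveUtils  # loaded automatically in the REPL; needed in scripts
using ZipFile

# Walk up the type hierarchy from the zip member type.
println(supertypes(ZipFile.ReadableFile))

# Check whether a zip member can be passed wherever an IO is accepted.
is_io = ZipFile.ReadableFile <: IO
println(is_io)

# List the methods that mention ReadableFile in their signature.
foreach(println, methodswith(ZipFile.ReadableFile))
```

If the subtype check prints `true`, then any function documented as taking an `IO` source is a candidate for reading the member directly, which is consistent with the working `CSV.read` call earlier in this thread.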