Unzipping single file with a shell command

I’m trying to update my package, ClimateDataIO, for Julia 0.7 and beyond, but because of a problem with some function further down the chain in ZipFile/CSV/etc., I’ve had to switch to a shell command to unzip the file I want. The following command works perfectly on the command line, but none of the many forms I’ve tried work in the REPL.

unzip -p /home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data >/tmp/2016-12-11T203000_AIU-1359.data

My last attempt looked like this:

julia> read(`unzip -p '/home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data >/tmp/2016-12-11T203000_AIU-1359.data'`,String)

ERROR: failed process: Process(`unzip -p '/home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data >/tmp/2016-12-11T203000_AIU-1359.data'`, ProcessExited(9)) [9]
 [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at ./error.jl:42
 [2] pipeline_error at ./process.jl:712 [inlined]
 [3] read(::Cmd) at ./process.jl:648
 [4] read(::Cmd, ::Type{String}) at ./process.jl:652
 [5] top-level scope at none:0

run gives the same result.

julia> run(`unzip -p '/home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data >/tmp/2016-12-11T203000_AIU-1359.data'`)
ERROR: failed process: Process(`unzip -p '/home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data >/tmp/2016-12-11T203000_AIU-1359.data'`, ProcessExited(9)) [9]
 [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at ./error.jl:42
 [2] pipeline_error at ./process.jl:712 [inlined]
 [3] #run#509(::Bool, ::Function, ::Cmd) at ./process.jl:670
 [4] run(::Cmd) at ./process.jl:668
 [5] top-level scope at none:0

What am I doing wrong?
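A likely explanation for both failures: Julia does not pass backtick commands through a shell, so there is no `>` redirection, and the single quotes turn everything after `-p` into one argument. unzip then looks for an archive whose name is that entire string; according to unzip's man page, exit status 9 means "the specified zipfiles were not found". The sketch below uses hypothetical placeholder paths and compares the argument vectors of the two forms through the `exec` field of `Cmd`, without running anything:

```julia
# Julia runs backtick commands directly (no shell), so the quoting below
# decides exactly which argv the program receives. Paths are placeholders.
quoted     = `unzip -p '/tmp/archive.ghg member.data >/tmp/out.data'`
split_args = `unzip -p /tmp/archive.ghg member.data`

# The single quotes glue archive, member name and ">/tmp/out.data" into ONE argument.
println(quoted.exec)
# Unquoted, the archive and the member are separate arguments, as unzip expects.
println(split_args.exec)
```

The output redirection then has to be expressed on the Julia side instead of with `>`.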

You are probably looking for pipeline. From the docs (?pipeline in the REPL):

  pipeline(from, to, ...)

  Create a pipeline from a data source to a destination. The source and
  destination can be commands, I/O streams, strings, or results of other
  pipeline calls. At least one argument must be a command. Strings
  refer to filenames. When called with more than two arguments, they
  are chained together from left to right. For example, pipeline(a,b,c)
  is equivalent to pipeline(pipeline(a,b),c). This provides a more
  concise way to specify multi-stage pipelines.


  run(pipeline(`ls`, `grep xyz`))
  run(pipeline(`ls`, "out.txt"))
  run(pipeline("out.txt", `grep xyz`))
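A small runnable sketch of the "strings refer to filenames" rule from the docstring, assuming a Unix-like system with `echo` and `tr` available; the temporary output files are hypothetical. A string placed after a command becomes the file that receives the command's stdout, and three arguments are chained left to right:

```julia
# Temporary files standing in for real output paths.
dir  = mktempdir()
out1 = joinpath(dir, "out1.txt")
out2 = joinpath(dir, "out2.txt")

# Command -> file: stdout of `echo` is written to out1.
run(pipeline(`echo hello`, out1))

# Command -> command -> file: same as pipeline(pipeline(a, b), c).
run(pipeline(`echo hello`, `tr a-z A-Z`, out2))

println(read(out1, String))
println(read(out2, String))
```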

I tried that as well, without any success. It complains that the file does not exist (“No such file or directory”), even though the same command works in the shell.

julia> run(pipeline("/home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data >/tmp/2016-12-11T203000_AIU-1359.data",`unzip -p`))
ERROR: IOError: open: no such file or directory (ENOENT)
 [1] uv_error at ./libuv.jl:85 [inlined]
 [2] open(::String, ::UInt8, ::UInt16) at ./filesystem.jl:81
 [3] setup_stdio(::Base.FileRedirect, ::Bool) at ./process.jl:472
 [4] setup_stdio(::getfield(Base, Symbol("##499#500")){Cmd}, ::Tuple{Base.FileRedirect,RawFD,RawFD}) at ./process.jl:487
 [5] #_spawn#498(::Nothing, ::Function, ::Cmd, ::Tuple{Base.FileRedirect,RawFD,RawFD}) at ./process.jl:511
 [6] (::getfield(Base, Symbol("#kw##_spawn")))(::NamedTuple{(:chain,),Tuple{Nothing}}, ::typeof(Base._spawn), ::Cmd, ::Tuple{Base.FileRedirect,RawFD,RawFD}) at ./none:0
 [7] #_spawn#495(::Nothing, ::Function, ::Base.CmdRedirect, ::Tuple{RawFD,RawFD,RawFD}) at ./process.jl:401
 [8] _spawn at ./process.jl:401 [inlined]
 [9] #run#509(::Bool, ::Function, ::Base.CmdRedirect) at ./process.jl:669
 [10] run(::Base.CmdRedirect) at ./process.jl:668
 [11] top-level scope at none:0

Even if I create the file first with touch("/tmp/2016-12-11T203000_AIU-1359.data"), the run command gives the same error.
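The stack trace hints at why touching the output file does not help: `setup_stdio(::Base.FileRedirect, ...)` fails while opening a file for the command's stdin. When a string is the first argument of `pipeline`, the whole string is treated as the name of an input file, including the spaces and the `>`. A minimal reproduction, using a made-up path that should not exist and `cat` as a stand-in command:

```julia
# The entire string is ONE file name that Julia opens as stdin for `cat`.
src = "/tmp/archive.ghg member.data >/tmp/out.data"

err = try
    run(pipeline(src, `cat`))
    nothing
catch e
    e   # expected here: an error from opening `src`, as in the trace above
end

println(typeof(err))
```

So the string has to be the destination (second argument), and the archive path and member name belong inside the backtick command.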

Did you read the docs I linked? They say:

run(pipeline(`unzip -p /path/to/file`, "path/to/outputfile"))

Can you tell us more about your situation? I cannot reproduce this: e.g. read(`unzip -p '/path/to/file/foo.zip'`, String) works perfectly for me.

On the other hand, read(`unzip -p '/path/to/file/foo.zip /path/to/second/file/foo2.zip'`, String) gives the same ERROR: failed process: Process(`unzip -p '/path/to/file/foo.zip /path/to/second/file/foo2.zip'`, ProcessExited(9)) [9]

So I’d guess that unzip -p does not like multiple files as input (and your command has multiple input files, if I read it correctly). Can’t you just read them one after the other?
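The "one after the other" suggestion could look roughly like this, reusing the placeholder archive paths from the reply above. Interpolating each path on its own gives every command exactly one archive argument (even for paths with spaces); actually reading the archives is left commented out because the files are hypothetical:

```julia
# Placeholder archive paths from the reply above.
zipfiles = ["/path/to/file/foo.zip", "/path/to/second/file/foo2.zip"]

# One command per archive; `$f` is inserted as a single argument.
cmds = [`unzip -p $f` for f in zipfiles]

for c in cmds
    println(c.exec)
end

# With real archives, read them one after the other:
# contents = [read(c, String) for c in cmds]
```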

Edit: OK, my suggestions were not helpful; preserved below.
I’d use e.g.

julia> f=open(`cat ./foo.c`, "r");

julia> s=read(f.out, String)
"#include <immintrin.h>\n..."
julia> close(f)

If you open a command with "r+", you can also write to f.in and close it in order to interact with your command (relevant if you want to pass things into your command’s stdin). Be warned: writes to pipes are unbuffered by default. This is abysmally slow for some workloads (every single write results in a bunch of syscalls; check with strace whether you need buffering tricks).
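A sketch of the "r+" pattern with one simple buffering trick, assuming a Unix-like system with `cat`: small writes are collected in an in-memory `IOBuffer` and sent to the pipe in a single write. This is one possible approach, not the only way to buffer; the output here is small enough that writing everything before reading cannot fill the pipe and block:

```julia
# Open `cat` for reading and writing: p.in is its stdin, p.out its stdout.
p = open(`cat`, "r+")

# Collect many small writes in memory instead of writing each one to the pipe.
buf = IOBuffer()
for i in 1:3
    println(buf, "line ", i)
end
write(p.in, take!(buf))    # a single write to the pipe
close(p.in)                # send EOF so `cat` can finish

out = read(p.out, String)  # everything `cat` echoed back
wait(p)

print(out)
```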

Yup, long before my first post, along with numerous iterations and combinations of run, read, and pipeline. The last variant did work, so thanks for getting me to try it yet again. I really could have sworn I had tried this combination.

🤷 C’est la vie.

This is the working command.

run(pipeline(`unzip -p /home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg 2016-12-11T203000_AIU-1359.data`,"/tmp/2016-12-11T203000_AIU-1359.data"))
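If the paths are computed at runtime (e.g. inside the package's tests), the working command could be wrapped as below. Both helper names are hypothetical, not part of ClimateDataIO or Base; the point is that interpolating the paths keeps each one a single argument even when it contains spaces, and `pipeline` takes over the job of the shell's `>`:

```julia
# Hypothetical helper: build the unzip command; each interpolated path
# becomes exactly one argument, spaces included.
unzip_member_cmd(zipfile::AbstractString, member::AbstractString) =
    `unzip -p $zipfile $member`

# Hypothetical helper: write one archive member to `dest` by redirecting
# the command's stdout to the file named by the string.
function extract_member(zipfile::AbstractString, member::AbstractString,
                        dest::AbstractString)
    run(pipeline(unzip_member_cmd(zipfile, member), dest))
    return dest
end
```

With the paths from the post, this would be called as extract_member("/home/user/.julia/packages/ClimateDataIO/XleN/test/2016-12-11T203000_AIU-1359.ghg", "2016-12-11T203000_AIU-1359.data", "/tmp/2016-12-11T203000_AIU-1359.data").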

DataDeps.jl includes unpack as one of its helpers; see the code in

It should handle all formats and all operating systems.

That was in turn adapted from some code in BinDeps.jl.
