Parallel bzip2 and gzip: pbzip2 and pigz

Do JuMP’s write_to_file and read_from_file use pbzip2 and pigz, parallel implementations of bzip2 and gzip, when writing or reading models stored in .bz2 and .gz compressed files?

https://jump.dev/JuMP.jl/stable/reference/models/#JuMP.write_to_file

https://zlib.net/pigz/

No, JuMP uses CodecZlib.jl (https://github.com/JuliaIO/CodecZlib.jl) and CodecBzip2.jl (https://github.com/JuliaIO/CodecBzip2.jl), codecs for TranscodingStreams.jl, which in turn use the standard zlib and libbzip2 libraries.
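
For reference, here is a minimal sketch of how those codecs are used; the path and payload are illustrative, and it assumes CodecZlib.jl is installed:

using CodecZlib  # provides GzipCompressorStream via TranscodingStreams.jl

# Wrap a plain file handle in a gzip compressor and write through it.
io = open("/data/example.gz", "w")   # hypothetical path
gz = GzipCompressorStream(io)
write(gz, "some text to compress")
close(gz)  # flushes the codec and closes the underlying file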

pbzip2 and pigz ought to be much faster than bzip2 and gzip since they utilize multiple cores through multithreading. Are there plans for TranscodingStreams.jl to use pbzip2 and pigz instead?

No plans. Is the speed of writing out your file a bottleneck?

For what it’s worth, gzip decompression is not very amenable to parallelization. (Compression is a different story.)
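
As a workaround, nothing stops you from shelling out to pigz yourself after writing the uncompressed file. A minimal sketch, assuming pigz is installed; the path and thread count are illustrative:

using JuMP

write_to_file(m, "/data/my.mps")  # write the uncompressed MPS file from JuMP
run(`pigz -p 24 /data/my.mps`)    # compress in parallel; replaces it with /data/my.mps.gz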

My workflow is:

  1. Construct the LP model in JuMP: 37 minutes.
  2. Write the LP model to a MPS file using JuMP.write_to_file: 100 minutes.
  3. Using 24 threads, presolve with PaPILO, solve with PDLP (1e-4 relative tolerance), postsolve with PaPILO: 145 minutes (PDLP takes 140 minutes to solve the presolved model).

Writing the MPS file takes 35% of the overall time. Writing a compressed MPS.GZ file takes about the same amount of time, but the compressed MPS.GZ file (1.2 GB) is 11% the size of the uncompressed MPS file (11 GB).

For another similarly-sized LP model, step 3 takes 50 minutes, so that writing the MPS file takes 53% of the overall time.

Have you profiled where the time is spent in step 2?

The only way to spend 100 minutes writing 11 GB (that’s 1.8 MB per second) is to have a very slow network disk, but in that case writing the compressed file should only require about 11 minutes. And spending 89 minutes compressing 11 GB of data sounds like entirely the wrong order of magnitude, even if it’s single-threaded.
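
The arithmetic, spelled out in Julia:

bytes   = 11e9         # 11 GB uncompressed MPS file
seconds = 100 * 60     # 100 minutes spent in write_to_file
bytes / seconds / 1e6  # ≈ 1.83 MB per second of effective throughput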

My gut feeling is that writing, or compressing and writing, takes up a few minutes and the rest is spent on something else, but I have no insight into the code so I can’t even guess what that might be. Profiling is the only way to find out where the time is truly spent.
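
A minimal profiling sketch using Julia’s built-in Profile standard library (the model m and the path are assumed from above):

using Profile

Profile.clear()
@profile write_to_file(m, "/data/my.mps")
Profile.print(format = :flat, sortedby = :count)  # show where the samples land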

An easy experiment is to write the uncompressed file to disk and then compress it with command line gzip. How much time does the latter step require? It should be in the same ballpark as writing the compressed file from Julia.
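
The experiment could look like this; gzip’s -k flag keeps the uncompressed original so both files remain for comparison (paths are illustrative):

t_write = @elapsed write_to_file(m, "/data/my.mps")  # uncompressed write from JuMP
t_gzip  = @elapsed run(`gzip -k /data/my.mps`)       # compress with command line gzip
@show t_write t_gzip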


I am using a fairly new Lambda workstation. This is how I measure the time of write_to_file:
MPS_fn = "/data/my.mps"  # MPS filename; or, for gzip: MPS_fn = "/data/my.mps.gz"
MPS_time = @elapsed begin
    write_to_file(m, MPS_fn)
end

  1. Construct the LP model in JuMP: 37 minutes.
  2. Write the LP model to a MPS file using JuMP.write_to_file: 100 minutes.

I think we’ve had this conversation a couple of times, but JuMP might not be the best tool for the job. We don’t optimize for writing to a file. Part of the “write” is actually a “copy the entire model in memory at least once”, which is probably part of the issue. I’ll have a think to see if there’s a way we could improve things.

Yeah, that’s because the “write” isn’t timing only the write to file. It also has a bunch of overhead on the JuMP side to turn the problem into something that can be written to an MPS file (which involves a copy of the entire model), to make sure every variable and constraint has a unique name, to order the columns, etc. The issue isn’t the compression.
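
One way to confirm this is to time the model copy separately from the disk write by calling MOI.FileFormats directly. A hedged sketch, assuming m is the JuMP model from above and the path is illustrative:

using JuMP
import MathOptInterface as MOI

dest    = MOI.FileFormats.Model(format = MOI.FileFormats.FORMAT_MPS)
t_copy  = @elapsed MOI.copy_to(dest, backend(m))           # copy the model into the MPS writer
t_write = @elapsed MOI.write_to_file(dest, "/data/my.mps") # serialize it to disk
@show t_copy t_write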


Where is the source code for JuMP.write_to_file?

Why doesn’t JuMP.write_to_file support xz compression, which is available in the TranscodingStreams.jl ecosystem via CodecXz.jl?
https://jump.dev/JuMP.jl/stable/reference/models/#I/O

JuMP.write_to_file is a thin wrapper around MOI.write_to_file. Other compression extensions are possible, but need implementing.

PRs to improve things are welcome.
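
In the meantime, a hedged sketch of producing an xz-compressed MPS file yourself, assuming CodecXz.jl is installed and that the MPS writer can print to an arbitrary IO (which is how MOI implements write_to_file internally); the path is illustrative:

using JuMP, CodecXz
import MathOptInterface as MOI

dest = MOI.FileFormats.Model(format = MOI.FileFormats.FORMAT_MPS)
MOI.copy_to(dest, backend(m))
io = open("/data/my.mps.xz", "w")
xz = XzCompressorStream(io)
write(xz, dest)  # serialize the MPS model through the xz compressor
close(xz)        # flushes the codec and closes the file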

For your case, the better long-term outcome is probably to write a C interface to PDLP. Then you could go straight to the C library without having to read and write files.