Writing to CSV files and plotting get slower with every run

Hi,

I am using an HPC cluster to solve an optimization model. Usually, solving an optimization model takes hours, and writing the results to CSV files takes only minutes. However, it is the reverse in my case. Let me show the time and memory performance of two subsequent runs. In the first run, the summary table is:

[Screenshot: summary table of times and allocations, first run]

Then I retrieved the results from the HPC and deleted all files. Afterwards, I solved the same model again. Here is the summary table:

[Screenshot: summary table of times and allocations, second run]

I will have to solve this model again and again, but the time spent on reporting results seems to be skyrocketing. So, I do not want to submit another job to the HPC before I resolve this issue.

I am closing all figures created with Julia and Python, using close(fig) and plt.close(fig), respectively. I am not sure whether I am doing something wrong. I need your insight.

Thank you in advance for all your help,
Kind regards

I’m not sure how helpful I can be with this, but I think we need more information to diagnose your problem. Do you know if the slow-down is coming more from CSV or from your plotting? What plotting library are you using? Just to make sure, you aren’t re-compiling CSV every time you run something, right? I.e., everything is being run in the same Julia session?

  1. “Do you know if the slow-down is coming more from CSV or from your plotting?”
    I do not know. I also do not know how to obtain this insight.

  2. “What plotting library are you using?”
    I am using PyPlot and PyCall to create my figures. I am calling Python (matplotlib package) to create stack plots.

  3. I am not sure if I understood your question correctly, but every time I run the code, I load all the packages I need with using. To give you more information, here is the code I run every time:

using JuMP, PyPlot, DataFrames, CSV, Gurobi, Missings, PyCall, Statistics, TimerOutputs # Importing required packages

const to = TimerOutput()
@timeit to "define_figcreation" include("CreateBarLine.jl")
@timeit to "define_optimization" include("ExpansionModel.jl")
@timeit to "define_fixmodel" include("FixModel.jl")
@timeit to "define_stackcreation" include("ProcessDispatch.jl")
@timeit to "define_inputs" include("ReadFiles.jl")
@timeit to "define_reporting" include("RecordCSV.jl")
@timeit to "define_plotting" include("RecordPlot.jl")
@timeit to "define_sets" include("SetCreation.jl")

@timeit to "build_model_solve_model_report_results" ExpansionModel()

show(to)

ExpansionModel() looks like this:

function ExpansionModel()

    @timeit to "build_model" begin
              My model here...
    end

    @timeit to "solve_model" begin optimize!(Expansion_Model) end

    @timeit to "report_results" begin
            RecordCSV(some inputs here...)
            RecordPlot(some inputs here...)
    end
end

I hope my answers clarified things a bit.

This is not the most helpful information.

Which of the lines above is causing the slowdown? Can you show us an excerpt of the code where you write to CSV?

In the second code snippet above, RecordCSV() and RecordPlot() are the parts that are too slow.

Let me show you a small part of these:

function RecordCSV(some inputs here...)
    for ...
        if ...
            data = DataFrame()
            CSV.write(path_to_folder, data)  # CSV.write takes the destination and the table
        end
    end
end
function RecordPlot(some inputs here...)
    for ...
        CreateBarLine()
    end

    for ...
        for ...
            CSV.write(path_to_folder)
            scriptdir = @__DIR__
            pushfirst!(PyVector(pyimport("sys")."path"), scriptdir)
            pyimportjulia = pyimport("CreateDisStack")
            pyimportjulia.CreateDisStack(dispatchpath, filename)
        end
    end
end
function CreateBarLine(some inputs here...)
    fig = figure(figsize = (figsize1, figsize2))
    bar()
    fig.savefig(path_to_folder)
    close(fig)
end
def CreateDisStack(rpathname, fname):

    import matplotlib.pyplot as plt
    import csv

    with open(rpathname + fname + '.csv', newline = '') as csvfile:
        data = list(csv.reader(csvfile, delimiter = '\t'))

    fig, ax = plt.subplots(figsize = (figsize1, figsize2))
    ax.stackplot()
    ax.legend()
    ax.set_title()
    ax.set_xlabel()
    ax.set_ylabel()

    fig.savefig(path_to_folder)
    plt.close(fig)

This is all the information I have at hand. Thank you for your answers.

If possible, I would encourage you to try and profile the time with just the CSV writing and the time with both the CSV writing and the plotting. That will help narrow down the issue.
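Something like this, as a minimal sketch (same pseudocode style as your snippets, using your existing TimerOutput to):

@timeit to "report_results" begin
    @timeit to "write_to_CSV" begin RecordCSV(some inputs here...) end
    @timeit to "plotting"     begin RecordPlot(some inputs here...) end
end

The summary table will then show the CSV time and the plotting time as separate rows.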


Okay, I will do that and share the results here.


Could this issue be related to https://github.com/JuliaData/CSV.jl/issues/850 ?


Hi Fredrik @baggepinnen , Fikri @fksayaci

I really appreciate your excellent Julia code and packages, so hopefully, to save some cycles for you :slight_smile: , I will hazard a low-cost SWAG (Scientific/Speculative Wild A*s Guess) on how to gather debug info to narrow down the root cause of the issue.

To get more help, you probably first need a few more debug steps to narrow down the root cause, so maybe try out @pdeffebac 's suggestion above? Namely: “… try and profile the time with just the CSV writing, and the time with both the CSV writing and the plotting.”

But speculating from what’s given, namely that:
– report_results has both A) a write-CSV step and B) a plotting step;
– allocations stay the same at 22.6 GiB (GiB, not GB? and do you know whether that is CPU RAM only?);
– per report_results, Run #1 (including the plotting) took 3854 seconds, while Run #2 took 17882 s.

To narrow down the root cause, consider trying this advice for detecting GPU memory leaks and tracking GPU garbage collection (GC):
https://juliagpu.gitlab.io/CUDA.jl/usage/memory/#Detecting-leaks
NOTE: the “keep track …” feature is only available when running Julia on debug level 2 or higher (i.e., with the -g2 argument). When you do so, the memory_status() function from above will display additional information.

See also: Cuda kernel error - #2 by maleadt
“To see the actual error, and stack trace, you need to follow the suggestion: start Julia on debug level 2, e.g., julia -g2 my_script.jl or just julia -g2 to get a REPL. The reason is that these messages and traces are embedded in the generated code (we don’t have stack unwinding), and thus have a fairly large cost in performance.”

For example:

$ julia-1.4.2/bin/julia -g2

julia> using CUDA
[ Info: Precompiling CUDA [052768ef-…-66c8b64840ba]

julia> CUDA.memory_status()
Downloading artifact: CUDA90
Downloading artifact: CUDNN_CUDA90
Effective GPU memory usage: 98.50% (3.882 GiB/3.941 GiB)

Naturally, this is just a speculative SWAG to gather debug info in case multiple runs of B) the plotting are the issue; of course, A) the CSV writing could still be the issue.
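One more low-cost thing to try while you are at it: in your RecordPlot, the sys.path push and the pyimport happen inside the nested loops, so sys.path grows by one entry on every iteration and the module lookup is repeated. Hoisting them out of the loops is cheap to test; a sketch (untested, with your inputs elided as in your snippets):

function RecordPlot(some inputs here...)
    # Do the Python setup once, not once per loop iteration
    scriptdir = @__DIR__
    pushfirst!(PyVector(pyimport("sys")."path"), scriptdir)
    pyimportjulia = pyimport("CreateDisStack")  # PyCall caches the module after the first import

    for ...
        for ...
            CSV.write(path_to_folder)
            pyimportjulia.CreateDisStack(dispatchpath, filename)
        end
    end
end

I would not expect this alone to explain a slowdown across separate runs, but it rules out one variable.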

Let us know your test results so far!

HTH,
Marc

Also check whether something is happening with the HPC storage. The operation is mostly I/O bound, you said. So it could be that the system is running out of fast storage and needs to move to slower disks? Maybe the HPC throttles writes (that would be weird, but who knows)?

If it is disk-I/O bound, you can monitor that on Linux with iotop (run iotop -h for the options; inside iotop, the left and right arrow keys change the sorting column):

$ sudo iotop -a -d20

Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 2.00 M/s
  TID  PRIO  USER  DISK READ  DISK WRITE  SWAPIN  IO      COMMAND
26448  be/4        0.00 B     12.89 M     0.00 %  0.00 %  firefox [~M Worker]
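If you cannot run sudo iotop on the cluster, you could also measure the raw write throughput of the results folder directly from Julia, to separate the file system from CSV/matplotlib. A minimal sketch (path_to_folder stands for your results directory, as in the snippets above):

# Crude write-throughput probe: time writing ~100 MB to the results folder
testfile = joinpath(path_to_folder, "io_probe.bin")
payload  = rand(UInt8, 100 * 1024 * 1024)
t = @elapsed write(testfile, payload)
println("write throughput: ", round(100 / t; digits = 1), " MB/s")
rm(testfile)  # clean up the probe file

If that number collapses between runs, the storage is the problem, not Julia or Python.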

Sorry for my late reply; it took a bit of time to get a time profile of the code.

First of all, I inserted @timeit around RecordCSV() to get a time profile of the CSV writing. Since we already have the time profile of all of "report_results", we can simply work out how much time is spent in RecordPlot():

function ExpansionModel()

    @timeit to "build_model" begin
              My model here...
    end

    @timeit to "solve_model" begin optimize!(Expansion_Model) end

    @timeit to "report_results" begin
            @timeit to "write_to_CSV" begin RecordCSV(some inputs here...) end
            RecordPlot(some inputs here...)
    end
end

And, I got the following results:

[Screenshot: TimerOutputs summary separating "write_to_CSV" from the rest of "report_results"]

So, plotting takes a huge amount of time. Then I was curious whether the stack plots or the bar/line plots were the problem, because I call Python to create the stack plots. To find out, I updated the time profile as follows:

function RecordPlot(some inputs here...)
    @timeit to "create_Bar_and_Line_plots" begin
        for ...
            CreateBarLine()
        end
    end

    @timeit to "create_Stack_plots" begin
        for ...
            for ...
                CSV.write(path_to_folder)
                scriptdir = @__DIR__
                pushfirst!(PyVector(pyimport("sys")."path"), scriptdir)
                pyimportjulia = pyimport("CreateDisStack")
                pyimportjulia.CreateDisStack(dispatchpath, filename)
            end
        end
    end
end

And, I got the following results:

[Screenshot: TimerOutputs summary separating "create_Bar_and_Line_plots" from "create_Stack_plots"]

However, I also realized that this is not always the case. For example, when I solved this model in one trial, I got the following time profile:

It still does not make sense to me that plotting and writing to CSV files take this much time, but I was surprised to see such a big difference from the previous time profiles. Note that the results are exactly the same: I am not changing the model at all, so every time I solve the model, I get the same optimal solutions, CSV files, and plots. Note also that the total size of these figures and CSV files is at most 9 MB.

I am in contact with the HPC team as well. We are trying to figure out whether this is an issue related to Julia/Python or to the HPC.

I also wanted to share a warning with you. I have no idea whether it is related to the slow processing of outputs, but just in case:

┌ Warning: both Julia and NumPy are linked with MKL, which may cause conflicts and crashes (#433).
└ @ PyCall …\.julia\packages\PyCall\…\src\numpy.jl

Sorry for bombarding you with lots of feedback. I hope it is useful for getting some insights.

Kind regards


For that last screenshot: if you solve the model in one trial, does that mean you are observing CSV compile times rather than run times?

I’m afraid I can’t help you with the PyCall stuff. I don’t know enough about the overhead of Python call times or the I/O-related issues involved in reading and writing such small files.

I am not an expert in programming, but I think the compile time for the CSV writing is captured by “define_reporting” in the code below:

using JuMP, PyPlot, DataFrames, CSV, Gurobi, Missings, PyCall, Statistics, TimerOutputs # Importing required packages

const to = TimerOutput()
@timeit to "define_figcreation" include("CreateBarLine.jl")
@timeit to "define_optimization" include("ExpansionModel.jl")
@timeit to "define_fixmodel" include("FixModel.jl")
@timeit to "define_stackcreation" include("ProcessDispatch.jl")
@timeit to "define_inputs" include("ReadFiles.jl")
@timeit to "define_reporting" include("RecordCSV.jl")
@timeit to "define_plotting" include("RecordPlot.jl")
@timeit to "define_sets" include("SetCreation.jl")

@timeit to "build_model_solve_model_report_results" ExpansionModel()

show(to)

The run time for the CSV writing is captured by “write_to_CSV” in the code below:

function ExpansionModel()

    @timeit to "build_model" begin
              My model here...
    end

    @timeit to "solve_model" begin optimize!(Expansion_Model) end

    @timeit to "report_results" begin
            @timeit to "write_to_CSV" begin RecordCSV(some inputs here...) end
            RecordPlot(some inputs here...)
    end
end

So, in the screen shots I shared, “write_to_CSV” is run time, not compile time.

No, in Julia, compilation happens not when a function is defined, but rather when it is run for the first time. To debug, you could run RecordCSV twice, but only time the second call:

    RecordCSV(some inputs here...) # to compile everything
    @timeit to "report_results" begin
            @timeit to "write_to_CSV" begin RecordCSV(some inputs here...) end
            RecordPlot(some inputs here...)
    end

Yes, you’ll need to confirm this is the issue, but it seems that your warning message points to a known MKL + PyPlot problem:

┌ Warning: both Julia and NumPy are linked with MKL, which may cause conflicts and crashes (#433).
└ @ PyCall …\.julia\packages\PyCall\…\src\numpy.jl

The kernel appears to have died. It will restart automatically.

And the real underlying problem may be that Julia essentially recompiles MKL from source; see this thread:

Is it possible to link julia to mkl without recompiling from source?

NOTE: Julia uses OpenBLAS by default, and MATLAB (& Python?) uses MKL.

So IF you confirm you actually ARE recompiling MKL over and over (ugh :grimacing: ), at a bird’s-eye view your bottom-line options appear to be:
– stick with Julia’s default OpenBLAS, and confirm the reconfiguration has stopped Julia from calling (and recompiling) MKL; or
– add Plots and then using Plots, and ditch PyPlot/PyCall, to stop recompiling MKL (see the sketch below).
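For the second option, a minimal sketch (hypothetical function and argument names; Plots with its default GR backend, so no PyCall/matplotlib is involved):

using Plots
gr()  # GR backend, the Plots.jl default; no Python dependency

# Hypothetical stand-in for the CreateBarLine function from earlier in the thread
function CreateBarLineGR(xs, ys, path_to_folder)
    p = bar(xs, ys, size = (800, 600), legend = false)
    savefig(p, path_to_folder)
end

Whether this is actually faster on your cluster is exactly what the profiling above would tell you.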

Another possibility for your consideration :slight_smile:

I use the PyPlot backend myself, though not as intensively as you do, and the following worked out OK for managing a Python installation:

pkg> add PyPlot

Then I chose:

pkg> add Conda  ## whereupon Julia configured PyCall to use a Julia-specific Python, and Pkg handled everything beautifully

Here is why I selected pkg> add Conda: an alternative is to configure PyCall to use a Julia-specific Python distribution via the Conda.jl package (which installs a private Anaconda Python distribution). This has the advantage that packages can be installed and kept up to date via Julia. As explained in the PyCall documentation,

set ENV["PYTHON"] = ""   ## run ONCE, THEN comment it out
run Pkg.build("PyCall")

Below are just some notes I scratched down, HTH:

(v1.4-EXP3) pkg> add PyCall
  • PyCall v1.92.2

julia> using PyCall
[ Info: Precompiling PyCall

(v1.4-EXP3) pkg> build PyCall
Building Conda ─→ ~/.julia/packages/Conda/tJJuN/deps/build.log
Building PyCall → ~/.julia/packages/PyCall/tqyST/deps/build.log

Then re-launch Julia. To install the matplotlib module, you can use pyimport_conda("matplotlib", PKG), where PKG is the Anaconda package that contains the module matplotlib:

julia> pyimport_conda("matplotlib", "PKG")
[ Info: Installing matplotlib via the Conda PKG package…

Hello all,

I talked to the HPC management at my school. It turns out that I am supposed to run the models on the Lustre file system; otherwise, I/O operations are very, very slow, which is exactly what happened to me.
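For anyone who hits the same problem: in practice this just means writing all outputs under the Lustre scratch area instead of the home directory. A minimal sketch (the SCRATCH environment variable and the fallback path are assumptions; check your own cluster's documentation for the actual Lustre mount point):

# Write results to the Lustre scratch space instead of $HOME
scratch = get(ENV, "SCRATCH", "/lustre/scratch/" * ENV["USER"])
outdir  = joinpath(scratch, "expansion_model_results")
mkpath(outdir)
CSV.write(joinpath(outdir, "results.csv"), results_df)  # results_df: your results DataFrame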

I am now using the Lustre system, and the time profile is as follows:

[Screenshot: time profile after switching to the Lustre file system]

@Marc.Cox, thank you for your suggestions regarding the warning message. I will look at it.

Thank you so much all for your help,
Kind regards
