Savefig() in Plots.jl is slow when I scatter 13056 points

Hi everyone,

I am using Plots with the pyplot() backend. I have a set of ~200 points to scatter, each with its own marker size and color. I also need to plot several lines. After all the plot!() and scatter!() calls, I end up with a fairly complicated plot. But when I want to display or save it, it takes about 10 minutes! What I am doing is not far from the examples given in the documentation.

What is wrong with my code? Below is an example of a plot that took 10 minutes to save.

This seems impossible to tell without a minimal working example - what commands are you actually executing to create your plot?

Since you’re using PyPlot, have you tried creating a similar plot in python? How long does saving take in this case?

Also note that saving as a PNG might be much faster than saving as a PDF.

Also, what version of Julia and plots are you using?

My experience is that many scatter points lead to a huge file when you produce a PDF or SVG file. Saving as PNG reduces the file size a lot.


function plot_bands!(
    BS::BandStructure;
    outfile = "tmp.png",
    COLOR = recom_color,
    SIZE=2.0,
    scale_kpath=1.0,
    settings=Dict(:colors=>["black",],
                  :lw=>2.0,
                  :range=>nothing,
                  :markerstrokealpha=>0.8,
                  :markerstrokewidth=>1.0,
                  :cycle=>true)
    )
    
    plt = plot()

    color_list = get(settings, :colors, "black" )
    line_width = get(settings, :lw    , 1.5 )
    ωrange     = get(settings, :range , nothing)
    #α          = get(settings, :markeralpha , 1.0)
    MSα        = get(settings, :markerstrokealpha , 0.8)
    MSw        = get(settings, :markerstrokewidth , 1.0)
    cycle      = get(settings, :cycle , true)
    Ncolor     = length(color_list)
    Nbands     = num_bands(BS)
    Nop        = num_markers(BS)
    (ωmax,ωmin)= (get_band_max(BS), get_band_min(BS))
    bands_k    = extract_kpath_to_line(BS,scale=scale_kpath)
    KX         = vcat((BS.Bands .|> b->get(bands_k,first(b),1.0))...)
    YLIM       = (ωrange === nothing ? ((ωmin-0.05*abs(ωmin)), (ωmax+0.05*abs(ωmax))) : ωrange)
    if cycle
        KX = vcat(KX,[KX[end]+abs(KX[end]-KX[end-1]),])
    end
    basic_settings = (ylims=YLIM, lw=0.5, grid=false, legend=nothing)

    # ------------ FRAMES -------------
    println("plot_bands!() : plotting frames ...")
    frame_settings = (color="gray", basic_settings...)
    # horizontal line at en = 0
    plt = plot!(KX, 0.0 .* KX; frame_settings...)
    # vertical line at high-symmetry point (beginning of each kpath)
    vbar = ωmin .+ ((ωmax-ωmin)/20).*collect(0:21)
    for band ∈ BS.Bands
        plt = plot!(bands_k[first(band)][1] ⨰ 22, vbar; frame_settings...)
    end
    # right-most vertical border
    plt = plot!(KX[end] ⨰ 22, vbar; frame_settings...)

    # ------------ BANDS  -------------
    println("plot_bands!() : plotting markers ...")
    plt = scatter!( ;
                    markershape = :circle,
                    markercolor = :transparent,
                    legend      = nothing )
    color_func(x,c) = (real(x)>0.001) ? c : "white"
    size_func(x,s)  = (real(x)>0.001) ? s*real(x) : 0.0
    for n ∈ 1:Nbands
        BAND = vcat([map(x->x[2][n],last(band)) for band ∈ BS.Bands]...)
        if cycle
            BAND = vcat(BAND,[BAND[1],])
        end
        for iop in 1:Nop
            markercolor = vcat([ map(x-> color_func(x[2][n,iop], COLOR[iop]), vtkpm)
                                    for (s,vtkpm) ∈ BS.Markers ]...)
            markersize  = vcat([ map(x->  size_func(x[2][n,iop], ((SIZE isa Vector) ? SIZE[iop] : SIZE)), vtkpm)
                                    for (s,vtkpm) ∈ BS.Markers ]...)
            plt = scatter!( KX, BAND,
                            markershape = :circle,
                            markersize  = markersize, ##!!! SIZE MAY BE CLOSE TO ZERO !!!
                            markercolor = :transparent,
                            markerstrokealpha = MSα,
                            markerstrokecolor = markercolor,
                            markerstrokewidth = MSw,
                            legend      = nothing )
        end
    end

    println("plot_bands!() : plotting lines ...")
    BANDS = [vcat([map(x->x[2][n],band) for (s,band) ∈ BS.Bands]...) for n ∈ 1:Nbands]
    for n ∈ 1:Nbands
        BD = BANDS[n]
        if cycle
            BD = vcat(BD,[BD[1],])
        end
        c = Nop==0 ? "black" : color_list[(n-1)%Ncolor+1]
        plt = plot!(KX, BD; color=c, basic_settings...)
    end
    # --------------------------------------

    println("plot_bands!() : saving ...")
    savefig(plt, outfile)
    return outfile
end

I’ve tested that the only slow command is savefig(plt, outfile) at the end. Other plotting commands are fast.

I saved the plot as png. I don’t have time to write the code in python; the plotting code is a bit complicated…

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PKG_SERVER = https://mirrors.tuna.tsinghua.edu.cn/julia

(@v1.5) pkg> st Plots
Status `~/.julia/environments/v1.5/Project.toml`
  [91a5bcdd] Plots v1.9.1

(@v1.5) pkg> st PyPlot
Status `~/.julia/environments/v1.5/Project.toml`
  [d330b81b] PyPlot v2.9.0

(@v1.5) pkg> 

I don’t have PyPlot installed, so the timings below are with the GR backend (not sure how backend-dependent saving a plot is):

julia> using Plots

julia> p = scatter(rand(100_000), rand(100_000));

julia> @time savefig(p, "out.png")
  2.591885 seconds (1.75 M allocations: 71.681 MiB, 0.18% gc time)

julia> @time savefig(p, "out.svg")
  0.753379 seconds (1.75 M allocations: 88.020 MiB, 2.66% gc time)

julia> p = scatter(rand(1_000_000), rand(1_000_000));

julia> @time savefig(p, "out.png")
 25.284884 seconds (17.05 M allocations: 703.395 MiB, 0.38% gc time)

julia> @time savefig(p, "out.svg")
  8.101072 seconds (17.05 M allocations: 869.746 MiB, 1.19% gc time)

So here it seems that time to save scales linearly with the number of points on the scatter plot, at ~2.5 seconds per 100,000 points for a png file and around 1/3 of that for writing an svg file (these timings are from a 10 year old laptop).
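
To make the scaling explicit, the timings above can be turned into a per-point cost. This is only a rough sketch: the numbers come from GR on a different machine, so the extrapolation to the 13056 markers in the thread title is an order-of-magnitude estimate, not a prediction for the pyplot backend.

```julia
# Timings copied from the GR session above: (number of points, seconds).
png_timings = [(100_000, 2.591885), (1_000_000, 25.284884)]
svg_timings = [(100_000, 0.753379), (1_000_000, 8.101072)]

# Seconds per point for each measurement.
per_point(timings) = [t / n for (n, t) in timings]

png_rate = per_point(png_timings)
svg_rate = per_point(svg_timings)

# Ratio of SVG to PNG cost at each size.
svg_over_png = svg_rate ./ png_rate

# Rough extrapolation to the 13056 markers mentioned in the thread title.
n_markers = 13_056
est_png_seconds = n_markers * png_rate[1]

println("PNG seconds per point: ", png_rate)
println("SVG/PNG ratio:         ", svg_over_png)
println("Estimated PNG time for ", n_markers, " points: ", est_png_seconds, " s")
```

On these numbers the estimate comes out well under a second, which suggests the point count alone does not account for a 10-minute save.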

I’m guessing you’re plotting each point separately.
You want to put them in vectors with x and y coordinates (separately) and scatter the whole vector at once.
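
A minimal sketch of what that could look like, using synthetic data rather than the band-structure data from this thread (the variable names and the output file name are made up for illustration). Per-point marker sizes and colors are passed as vectors to a single scatter() call:

```julia
using Plots

# Synthetic stand-in data (hypothetical; not the data from the thread).
n = 1_000
x = rand(n)
y = rand(n)
sizes  = 1 .+ 5 .* rand(n)                 # one marker size per point
colors = rand([:red, :blue, :green], n)    # one marker color per point

# One scatter call for the whole data set, instead of one call per point.
p = scatter(x, y;
            markersize        = sizes,
            markercolor       = colors,
            markerstrokewidth = 0,
            legend            = false)

savefig(p, "scatter_vectorized.png")
```

In plot_bands!() above, scatter!() is already called with vectors, but once per (band, marker type) pair; concatenating those into fewer calls might help, though how much depends on the backend.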

Hi, I suggest changing the title of this conversation to “savefig is slow”.

This is to be expected somewhat, as savefig() to a PNG file does something different from saving to an SVG file. Saving to PNG means actually rendering the data, since it has to be turned into an image, whereas saving to SVG most likely (haven’t checked) just writes each data point as an entry in the file. It’s only when loading and viewing the SVG file that the data is actually rendered into an image.

Yes, which is also why the size of the svg scales with the number of points - I think the 1m point one above turned out to be ~150MB…
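
One way to check this on your own machine is to save the same kind of scatter plot at two sizes and compare the resulting files. The sketch below uses random data and made-up file names, and only Base functions (filesize, countlines) for the comparison; the exact numbers will depend on the backend.

```julia
using Plots

# Save the same kind of scatter plot at two sizes and compare the output files.
for n in (1_000, 10_000)
    p = scatter(rand(n), rand(n); legend = false)
    svg = "scatter_$n.svg"
    png = "scatter_$n.png"
    savefig(p, svg)
    savefig(p, png)
    println(n, " points: SVG ", filesize(svg), " bytes (", countlines(svg),
            " lines), PNG ", filesize(png), " bytes")
end
```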

OMG!!!
I got 191218 lines in the SVG file for the figure I wanted to plot …

Yes, you don’t really “plot” to an SVG file, you merely write out the data in a format that is easy to plot :wink:

Problem solved: I trimmed off the points with small size (here size means a weight, which can be some property of the phonon band). Now only 36.627% of the 13056 markers are scattered, and savefig() is faster. The SVG file now has 84305 lines.
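
For anyone reading later, the trimming can be sketched roughly like this. The data here are hypothetical stand-ins, and the 0.001 threshold simply mirrors the cutoff in color_func/size_func above:

```julia
# Hypothetical stand-in data: positions plus a per-point weight used as marker size.
x       = collect(range(0, 1; length = 10))
y       = x .^ 2
weights = [0.0, 0.5, 0.0005, 0.2, 0.0, 0.9, 0.0001, 0.3, 0.05, 0.0]

# Threshold below which a marker is considered invisible
# (assumed value, mirroring the cutoff in color_func/size_func).
threshold = 0.001

# Boolean mask of the markers worth drawing.
keep = weights .> threshold

x_kept, y_kept, w_kept = x[keep], y[keep], weights[keep]

println("keeping ", count(keep), " of ", length(weights), " markers")

# With Plots loaded, only the surviving points would be passed on, e.g.
# scatter!(x_kept, y_kept; markersize = SIZE .* w_kept)   # SIZE as in plot_bands!()
```

Dropping the points before the scatter!() call, rather than drawing them with size 0.0 and a white color, means they never reach the backend or the output file.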

Thanks to everybody!

This still sounds like an inefficient data representation. If you run into storage or performance problems later on, then saving as PNG or JPG will probably solve those.

Apart from that, it would be a bit of a shame if you end up writing your plots to SVG just because it is faster. The plot still has to be rendered to a real image at some point, which takes time on every viewing of the SVG file. Bitmap files like PNG/JPG/TIF can be loaded almost instantly instead.

Thank you paumelis. I will try other solutions when I have time.

After all the hints I got from here, I realized that my plot is a bit complicated. I’ll try to simplify that first…