Saving VegaLite plots takes a very long time compared to Plots.jl

Hello everyone,
first time poster here.
I’m running into problems when trying to save vegalite plots as pdf, or anything other than html for that matter. The following script exemplifies this quite well:

using VegaLite
using Plots

x = 1:40000
y = sin.(x/1000)

v = @vlplot(:line, x = x, y = y)

println("png vegalite")
@time save("test_vegalite.png", v) #    3.421519 seconds (3.21 M allocations: 85.575 MiB, 0.35% gc time)

println("pdf vegalite")
@time save("test_vegalite.pdf", v) #  3.412176 seconds (3.21 M allocations: 85.880 MiB, 0.12% gc time)

println("html vegalite")
@time save("test_vegalite.html", v) #  0.057578 seconds (320.46 k allocations: 24.777 MiB, 7.92% gc time)

u = plot(x, y)

println("png plots")
@time savefig(u, "test_plots.png") #  0.049850 seconds (9.83 k allocations: 2.033 MiB)

println("pdf plots")
@time savefig(u, "test_plots.pdf") #  0.074213 seconds (9.83 k allocations: 2.205 MiB)

println("html plots")
@time savefig(u, "test_plots.html") #  0.046458 seconds (14.27 k allocations: 2.860 MiB)

It seems that saving VegaLite plots to disk with the FileIO-based save function is between one and two orders of magnitude slower than Plots’ savefig function.
I quite often visualize grids (for instance for cellular automata), which by themselves are not the fastest to save to disk, or produce several dozen plots in a row, so this behavior in essence breaks VegaLite for me.
I cannot remember it behaving like this when I used it back in March or so. While I have no definitive proof, my guess is that this is new behavior.

I already opened an issue in the VegaLite repository about a month ago but have not received an answer yet, and because the issue is somewhat pressing, I’m trying my luck here.

Any hints on where to look for the culprit or what can be done about it are welcome. I really like VegaLite as a package, apart from this one problem.

Thanks in advance

Update:

I decided to try one of the examples

using VegaLite, VegaDatasets, Profile
v = dataset("movies") |>
@vlplot(
    :rect,
    width=300, height=200,
    x={:IMDB_Rating, bin={maxbins=60}},
    y={:Rotten_Tomatoes_Rating, bin={maxbins=40}},
    color="count()",
    config={
        range={
            heatmap={
                scheme="greenblue"
            }
        },
        view={
            stroke="transparent"
        }
    }
)

@time v |> save("test2.pdf") #  38.962789 seconds (4.06 M allocations: 111.058 MiB, 0.06% gc time)

which gave timings consistent with the earlier ones, far in excess of what one might expect, I’d say…
So I’m still wandering around aimlessly here. Just to get this out of the way: the timings were not taken from the first run, so they do not include compile time.
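For anyone who wants to repeat the measurement without compile time, here is a minimal sketch of a warm-up-then-time pattern. The helper name `timed_after_warmup` is made up for illustration, and the workload is a stand-in so the snippet runs without any packages installed.

```julia
# Hypothetical helper: call `f` once so compilation happens up front,
# then record the wall-clock time of each of the following `runs` calls.
function timed_after_warmup(f; runs::Integer = 3)
    f()                                     # warm-up run, result discarded
    return [@elapsed(f()) for _ in 1:runs]  # seconds per timed run
end

# Stand-in workload; replace with e.g. `() -> save("test2.pdf", v)`.
timings = timed_after_warmup(() -> sum(sin, 1:40_000); runs = 3)
println("min: ", minimum(timings), " s, max: ", maximum(timings), " s")
```

Reporting the minimum and maximum over a few runs also shows whether the slow save is consistently slow or only occasionally so.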


I admit it is not the fastest, but still I can’t reproduce your timings:

julia> @time v |> save("test2.pdf")
  3.949619 seconds (16.20 M allocations: 786.147 MiB, 4.79% gc time)
julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 5 3600 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)

For another simpler example:

julia> v = dataset("stocks") |>
       @vlplot(
           :line,
           transform=[
               {filter="datum.symbol=='GOOG'"}
           ],
           x="date:t",
           y=:price
       )

julia> @time v |> save("test.pdf")
  0.241988 seconds (2.95 k allocations: 415.859 KiB)

For a simple line graph such as the one you showed, I’m also getting fairly reasonable timings:

v = dataset("stocks") |>
              @vlplot(
                  :line,
                  transform=[
                      {filter="datum.symbol=='GOOG'"}
                  ],
                  x="date:t",
                  y=:price
              )

 @time v |> save("test.pdf")
  1.008640 seconds (93.04 k allocations: 2.726 MiB)

For the sake of completeness:

versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

I assume your first post refers to my example? If that is the case, I will try out the snippet on my Windows installation and report back. While it’s not obvious to me why this should make a difference, it’s perhaps worth a try.
Thanks for the quick reply.

Both my timings were from the VegaLite examples in the docs. The first is the same example you chose (where you got the 38.962789 seconds).

Your own code with 40000 points for the x-axis didn’t work for me, so I skipped it.

OK, so I installed Julia on my Windows partition, pulled in the necessary packages, et voilà:

@time v |> save("test2.pdf")
  4.678941 seconds (4.06 M allocations: 110.934 MiB, 0.26% gc time)

which is in good agreement with your results. So I think I will try a reinstall on my main macOS partition and report back when done. Other than that, do you have any deeper insight into where to look for differences between Windows and macOS? I’m leaning towards this being a peculiarity of my setup (otherwise it should have come up at some earlier point, I think…), but I’m kind of in the dark here, and anything I do at this point is essentially guesswork…

Maybe it’s just an SSD on Windows and an older HDD on the Mac?
(But the files aren’t so big, so… probably not)

No, I can’t say where the differences between Mac and Windows may be; probably not in the Julia source code. Perhaps in the libraries used? @davidanthoff could give some hints.

No, the machine has only one SSD, and both operating systems run on it. Also, I would expect everything to be much slower in that case, not just VegaLite… Still looking for a software issue.

So, in order to get a better handle on this, I installed the Vega command-line tools as explained here,
and used VegaLite with the above example to create a Vega-Lite specification file:

using VegaLite, VegaDatasets
v = dataset("movies") |>
@vlplot(
    :rect,
    width=300, height=200,
    x={:IMDB_Rating, bin={maxbins=60}},
    y={:Rotten_Tomatoes_Rating, bin={maxbins=40}},
    color="count()",
    config={
        range={
            heatmap={
                scheme="greenblue"
            }
        },
        view={
            stroke="transparent"
        }
    }
) |> 
save("figure.vegalite")

after that, I used the command line to create a pdf from the specification file using vl2pdf, and timed it:

time vl2pdf figure.vegalite test.pdf
vl2pdf figure.vegalite test.pdf  0.44s user 0.09s system 74% cpu 0.710 total

These numbers look more than acceptable to me, as far as PDF creation is concerned.
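To compare this against the in-Julia timings within the same session, the external command can also be timed from Julia. This is only a sketch: `time_command` is a hypothetical helper name, and the command used here is a stand-in so the snippet runs on any machine that has Julia.

```julia
# Hypothetical helper: wall-clock seconds for running an external command.
time_command(cmd::Cmd) = @elapsed run(cmd)

# Stand-in command that exists wherever Julia does; swap in
# `vl2pdf figure.vegalite test.pdf` to time the real conversion.
elapsed = time_command(`$(Base.julia_cmd()) --version`)
println("external command took ", elapsed, " s")
```

If the figure measured this way stays well under a second while save takes several seconds, the extra time is spent somewhere other than in the conversion tool itself.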

Saving the Vegalite specification is also not a problem:

@time v |> save("~/Desktop/figure.vegalite")
  0.074187 seconds (161.09 k allocations: 15.371 MiB)

So I presume that strange things happen somewhere between the creation of the Vega-Lite spec and the actual PDF output… :confounded:
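One way to narrow this down could be to profile the save call with the Profile standard library. The sketch below uses a hypothetical helper, `profile_call`, and a stand-in workload. It also rests on an assumption that is not confirmed in this thread: if the rendering is handed off to an external process, most samples would land in frames that wait on that process, which would point away from the Julia code itself.

```julia
using Profile

# Hypothetical helper: profile a single call of `f` and print a flat report.
function profile_call(f)
    f()                      # warm-up run so compilation is not profiled
    Profile.clear()          # drop samples from any earlier profiling
    @profile f()             # collect samples for one call
    Profile.print(format = :flat, sortedby = :count, mincount = 5)
    return length(Profile.fetch())   # size of the raw sample buffer
end

# Stand-in workload; replace with e.g. `() -> save("test2.pdf", v)`.
nsamples = profile_call(() -> sum(sin, 1:20_000_000))
println("collected ", nsamples, " raw profile entries")
```

The lines with the highest counts at the bottom of the flat report are where the call spends most of its time.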

Finally, when using @vgplot, which is part of the Vega.jl package, instead of @vlplot (it seems @vgplot is no longer officially documented in VegaLite):

w = 
@vgplot(
    height=200,
    width=200,
    padding=5,    
    data=[:source=>dataset("cars")],    
    marks=[{
        name="marks",
        encode={
            update={
                shape={value="circle"},
                x={field="Horsepower", scale="x"},
                y={field="Miles_per_Gallon", scale="y"}
            }
        },
        from={data="source"},
        type="symbol"
    }],
    axes=[
        {
            domain=false,
            tickCount=5,
            grid=true,
            title="Horsepower",
            scale="x",
            orient="bottom"
        },
        {
            domain=false,
            grid=true,
            titlePadding=5,
            title="Miles_per_Gallon",
            scale="y",
            orient="left"
        }
    ],    
    scales=[
        {
            name="x",
            nice=true,
            zero=true,
            range="width",
            domain={data="source",field="Horsepower"},
            type="linear",
            round=true
        },
        {
            name="y",
            nice=true,
            zero=true,
            range="height",
            domain={data="source",field="Miles_per_Gallon"},
            type="linear",
            round=true
        }
    ]
)

@time w |> save("vegatest.pdf") #   2.318985 seconds (237.02 k allocations: 6.747 MiB)

I get much more reasonable results.
This happens on a completely fresh installation. So I’m still confused :confounded: :confused: