What would be the best way / command to benchmark GeoPandas vs VegaLite?

Hi

I have plotted a choropleth map using GeoPandas (Python) and VegaLite.jl (Julia).

I want to benchmark both of them (ease of use, speed, etc).
I am very interested in speed. So maybe @time, time_ns(), or tic()/toc(), etc.

What would you recommend? I have been using a Jupyter notebook.
My code looks like this:

JULIA

##Calling the libraries
using VegaLite, DataFrames, JSON

##Renaming a column
rename!(df_splunk, Symbol("featureId")=> Symbol("id"));

##Parsing the topojson file
disa_ONT_region=JSON.parsefile("/home/juliana/roc/Data/Topojson/disa_ONT_region.topojson")

for key in keys(disa_ONT_region["objects"]["disa_ONT_region"]["geometries"])
	disa_ONT_region["objects"]["disa_ONT_region"]["geometries"][key]["id"]=disa_ONT_region["objects"]["disa_ONT_region"]["geometries"][key]["properties"]["DA_ID"]
end

@vlplot(
    :geoshape,
    width=500, height=300,
    data={
        values=disa_ONT_region,
        format={
            type=:topojson,
            feature=:disa_ONT_region
        }
    },
    transform=[{
        lookup=:id,
        from={
            data=df_splunk,
            key=:id,
            fields=["count"]
        }
    }],
    color={
        "count:q",
        scale={scheme=:reds},
        legend={title="IPs Rate in Ontario - BST"}
    },
    projection={
        type=:albersUsa
    }
)

PYTHON

##GEOPANDAS
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from shapely.geometry import Point, Polygon


## Path to .shp file
sf_path = "/home/juliana/roc/Data/ESRI/disa_ONT_region.shp"
#print(sf_path)


###Read .shp file into a geopandas dataframe and plotting
sf=gpd.read_file(sf_path)
#sf=gpd.read_file(sf_path, encoding='utf-8')
#sf.plot()



## Joining dataframes: df_splunk (2 columns) + sf_Q (3 columns) = new_df
df_splunk=df_splunk.iloc[:, 0:2]
#df_splunk

sf_Q=sf.iloc[:, 0:3]
#sf_Q

new_df=sf_Q.merge(df_splunk, how='left', left_on='DA_ID', right_on='featureId')
new_df


## Plotting the choropleth map
##https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
ax=new_df.plot(figsize=(10,6), column='count', cmap='Blues', k=5, legend=False)
plt.title("Number of IPs by district - Ontario")
ax.set_axis_off()

Thank you

I think it is going to be difficult to pin down any one setup for a performance comparison, at least on the VegaLite.jl side of things. Here are some of the different ways it operates, each one with a different performance profile:

  • if you just plot on the REPL and let the plot open in the web browser, VegaLite.jl in the Julia process doesn’t do any processing; it just writes an HTML file to disc with the data embedded as JSON, and then all processing happens in the browser
  • if you plot with ElectronDisplay.jl, the JSON spec is sent via a named pipe to the electron client window (with an intermediate hop to the main electron process), and then all rendering and processing happens in the electron client process.
  • if you save a plot, the Julia process will spawn a Node process, send the spec via stdout to that process, and that way save the file to disc. All processing is happening in the node process in that case.
  • if you plot in Jupyter, things are most involved: VegaLite.jl will send the plot as a Vega-Lite MIME type and as a PNG MIME type to the Jupyter front-end. The conversion to PNG is done via a Node process.

Many of these conversion processes are not triggered when you call @vlplot, so it might actually be quite tricky to even time them properly…
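
Because the rendering often happens outside the calling process, one option is to time the whole pipeline from the outside, including the start-up of the helper process. Below is a minimal Python sketch of that idea, assuming the plotting work can be launched as a command line. `time_command` is a hypothetical helper, and a child Python interpreter stands in for whatever command actually renders and saves the map.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock seconds of each run.

    Each timing covers process start-up, the work itself, and process exit.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload (an assumption for illustration): a child Python process
# that does a little arithmetic. In a real comparison this would be the
# command that renders the plot and writes it to a file.
demo_cmd = [sys.executable, "-c", "sum(range(100000))"]
demo_timings = time_command(demo_cmd, repeats=3)
print([round(t, 3) for t in demo_timings])
```

Timing from the outside like this measures what a user actually waits for, but it cannot say which part of the pipeline was slow.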


What is a “node process”? Why is it not wise to use it when benchmarking?

Do you think measuring the compilation time would help to do the benchmark properly?

https://discourse.julialang.org/t/what-would-be-the-equivalent-of-this-piece-of-python-code/34827/4

I just generally feel there are so many moving parts, and different moving parts in different scenarios, that it is difficult to actually pin down what you would want to measure…
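
One way to cope with the many moving parts is to time each stage separately rather than the script as a whole. Below is a minimal Python sketch under that assumption. `stage` is a hypothetical helper, and the three stages shown are stand-ins. In the GeoPandas script above they would presumably wrap the `gpd.read_file` call, the `merge`, and the plot call together with saving the figure to a file, so that rendering is actually included.

```python
import time
from contextlib import contextmanager

stage_seconds = {}


@contextmanager
def stage(name):
    """Add the wall-clock duration of the enclosed block to stage_seconds[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        stage_seconds[name] = stage_seconds.get(name, 0.0) + elapsed


# Stand-in stages (assumptions for illustration, not the real workload).
with stage("load"):
    rows = [{"id": i, "count": i % 7} for i in range(50000)]
with stage("join"):
    by_id = {r["id"]: r["count"] for r in rows}
with stage("render"):
    rendered = ",".join(str(by_id[i]) for i in range(50000))

for name, seconds in stage_seconds.items():
    print(f"{name}: {seconds:.4f} s")
```

A per-stage breakdown at least shows whether the time goes into reading the file, joining the data, or rendering, which are handled by different components in each tool.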
