What would be the best way / command to benchmark Geopandas vs VegaLite?

Aizzaac · February 17, 2020, 10:03pm

Hi

I have plotted a choropleth map using GeoPandas (Python) and VegaLite.jl (Julia).

I want to benchmark both of them (ease of use, speed, etc.).
I am most interested in speed, so maybe something like @time, time_ns(), or tic()/toc().

What would you recommend? I am running both in a Jupyter notebook.
My code looks like this:

JULIA

##Calling the libraries
using VegaLite, DataFrames, JSON

##Renaming a column
rename!(df_splunk, Symbol("featureId")=> Symbol("id"));

##Parsing the topojson file
disa_ONT_region=JSON.parsefile("/home/juliana/roc/Data/Topojson/disa_ONT_region.topojson")

##Copying DA_ID into the id of each geometry so the lookup can match on it
geometries = disa_ONT_region["objects"]["disa_ONT_region"]["geometries"]
for key in keys(geometries)
    geometries[key]["id"] = geometries[key]["properties"]["DA_ID"]
end

@vlplot(
    :geoshape,
    width=500, height=300,
    data={
        values=disa_ONT_region,
        format={
            type=:topojson,
            feature=:disa_ONT_region
        }
    },
    transform=[{
        lookup=:id,
        from={
            data=df_splunk,
            key=:id,
            fields=["count"]
        }
    }],
    color={
        "count:q",
        scale={scheme=:reds},
        legend={title="IPs Rate in Ontario - BST"}
    },
    projection={
        type=:albersUsa
    }
)

PYTHON

##GEOPANDAS
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from shapely.geometry import Point, Polygon


## Path to .shp file
sf_path = "/home/juliana/roc/Data/ESRI/disa_ONT_region.shp"
#print(sf_path)


###Read .shp file into a geopandas dataframe and plotting
sf=gpd.read_file(sf_path)
#sf=gpd.read_file(sf_path, encoding='utf-8')
#sf.plot()



## Joining dataframes: df_splunk (2 columns) + sf_Q (3 columns) = new_df
df_splunk=df_splunk.iloc[:, 0:2]
#df_splunk

sf_Q=sf.iloc[:, 0:3]
#sf_Q

new_df=sf_Q.merge(df_splunk, how='left', left_on='DA_ID', right_on='featureId')
new_df


## Plotting the choropleth map
##https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
ax=new_df.plot(figsize=(10,6), column='count', cmap='Blues', k=5, legend=False)
plt.title("Number of IPs by district - Ontario")
ax.set_axis_off()

Thank you

davidanthoff · February 17, 2020, 10:45pm

I think it is going to be difficult to pin down any one setup for a performance comparison, at least on the VegaLite.jl side of things. Here are some of the different ways it operates, each with a different performance profile:

- If you just plot on the REPL and let the plot open in the web browser, VegaLite.jl in the Julia process doesn’t do any processing. It just writes an HTML file to disk with the data embedded as JSON, and all processing then happens in the browser.
- If you plot with ElectronDisplay.jl, the JSON spec is sent via a named pipe to the Electron client window (with an intermediate hop through the main Electron process), and all rendering and processing then happens in the Electron client process.
- If you save a plot, the Julia process spawns a Node process, sends the spec via stdout to that process, and saves the file to disk that way. All processing happens in the Node process in that case.
- If you plot in Jupyter, things are most involved: VegaLite.jl sends the plot both as a Vega-Lite MIME type and as a PNG MIME type to the Jupyter front-end. The conversion to PNG is done via a Node process.

Many of these conversion processes are not triggered when you call @vlplot, so it might actually be quite tricky to even time them properly…
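
Given the point above that the real work often happens outside the call that builds the plot, one option is to time a whole stage (build plus render or save) as a single unit, and to report the first run separately from later runs. The sketch below is only an illustration of that idea using the Python standard library; `time_stage` and `stand_in_workload` are hypothetical names that do not come from either package, and the workload is a placeholder rather than a real plotting call.

```python
import time
from typing import Callable, Dict, List


def time_stage(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call fn() `repeats` times; report the first call apart from the rest.

    The first call often includes one-off costs (imports, caches, and in
    Julia, compilation), so it is kept separate from the warm runs.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "first": durations[0],              # cold run, in seconds
        "best_warm": min(durations[1:]),    # fastest of the remaining runs
    }


def stand_in_workload() -> int:
    # Hypothetical placeholder for a real stage such as
    # "read the file, build the plot, and write it to an image".
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    result = time_stage(stand_in_workload, repeats=5)
    print(f"first run: {result['first']:.4f} s")
    print(f"best warm: {result['best_warm']:.4f} s")
```

On the GeoPandas side, the function passed in would presumably need to include something that forces rendering, for example calling `new_df.plot(...)` and then `ax.figure.savefig(...)` into an in-memory buffer, so that the measured stage is comparable to saving a plot from VegaLite.jl rather than only constructing it.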

Aizzaac · February 18, 2020, 10:08pm

What is a “Node process”? Why is it not wise to use it when benchmarking?

Do you think measuring the compilation time would help to do the benchmark properly?

https://discourse.julialang.org/t/what-would-be-the-equivalent-of-this-piece-of-python-code/34827/4

davidanthoff · February 19, 2020, 1:47am

I just generally feel there are so many moving parts, and different moving parts in different scenarios, that it is difficult to actually pin down what you would want to measure…
