When plotting a choropleth map, how to "read" the appropriate value of the topojson file and the dataframe?

Hi

@oheil
@davidanthoff

I want to plot a choropleth map using a topojson (I converted the .shp file to a topojson) and a dataframe.

  • The topojson has the “polygons”, the “coordinates”, and the names of the districts.

  • The dataframe has many columns. But the useful ones for this case are: “count” (this is the # of IPs) and “featureId” (this has the names of the districts).

What these files have in common is the name of districts.

Here are some images to help you understand:

TOPOJSON FILE

DATAFRAME

This is my code:

@vlplot(width=1000, height=800) +
@vlplot(
    mark={
        :geoshape
    },
    data={
        values=JSON.parsefile("/home/juliana/roc/Data/Topojson/disa_ONT_region.topojson"),
        format={
            type=:topojson,
            feature=:disa_ONT_region
        }
    },
    transform=[{
        lookup=:featureId,
        from={
            data=df_splunk,
            key=:featureId,
            fields=["count"]
        }
    }],
    color={
        "rate:q",
        scale={domain=[0, 0.15], scheme=:reds},
        legend={title="IPs Rate in Ontario - BST"}
    },
    projection={
        type=:albersUsa
    }
)

In the “transform” block, I am calling my dataframe (df_splunk).
I want “VegaLite” to use the column “featureId” (names of districts) and the column “count” (number of IPs) to visualize a choropleth map. But I have not succeeded yet.

I know that in “lookup” I have to put the field to look for in the dataframe. And that “field” is what both files have in common (in this case the names of the districts).

But where am I telling VegaLite to use the district names from the topojson file? I guess it is “properties” or “DA_ID” in the topojson file.

So, what am I doing wrong?

Thank you

Here is some of the documentation and tutorials that I have checked:

https://www.queryverse.org/VegaLite.jl/dev/examples/examples_maps/

https://www.flirtwithjulia.com/2019/02/18/Choropleth-Map-With-VegaLite.jl.html

Could you try the following:

df_splunk2=DataFrame(DA_ID=df_splunk[!,:featureid],count=Int.(df_splunk[!,count]))

@vlplot(
    width=1000,
    height=800,
    :geoshape,
    data={
        values=JSON.parsefile("/home/juliana/roc/Data/Topojson/disa_ONT_region.topojson"),
        format={
            type=:topojson,
            feature=:disa_ONT_region
        }
    },
    transform=[{
        lookup=:DA_ID,
        from={
            data=df_splunk2,
            key=:DA_ID,
            fields=["count"]
        }
    }],
    color={
        "rate:q",
        scale={domain=[0, 0.15], scheme=:reds},
        legend={title="IPs Rate in Ontario - BST"}
    },
    projection={
        type=:albersUsa
    }
)

The idea is to create and use a new DataFrame whose column names match the topojson data and whose count column is an Integer rather than a String.
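A minimal sketch of the string-to-integer step, using plain vectors in place of the DataFrame columns so that it runs without any packages. The vector names and values are made up for illustration and are not taken from df_splunk.

```julia
# Hypothetical stand-in for a "count" column that was read in as text.
count_strings = ["12", "7", "130"]

# parse.(Int, ...) broadcasts parse over the vector and returns a Vector{Int}.
counts = parse.(Int, count_strings)

# tryparse returns nothing instead of throwing when a cell is not a number,
# which makes it easy to locate bad rows before plotting.
bad_rows = findall(isnothing, tryparse.(Int, ["12", "n/a", "130"]))
```

On a real DataFrame the same broadcast call would be applied to the column itself.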

@oheil

When using your code to change the name and convert to INT I get an error. So I modified it a bit:

# Convert column 2 from "string" to "int"
df_splunk[!, 2] .= parse.(Int, df_splunk[!, 2])
eltype.(eachcol(df_splunk))

# Rename column "featureId" to "DA_ID"
rename!(df_splunk, :featureId => :DA_ID)

But the plot is empty. I can only see the colorbar.

Ah, yes, I see my error now. Your code to modify the dataframe is fine.

Have you ever plotted your map data on its own?
(I actually have major problems creating even a simple map from topojson data in a file.)
What do you see if you plot just the map with:

@vlplot(
    width=500, height=300,
    mark={
        :geoshape,
        fill=:lightgray,
        stroke=:white
    },
    data={
        values=JSON.parsefile("/home/juliana/roc/Data/Topojson/disa_ONT_region.topojson"),
        format={
            type=:topojson,
            feature=:disa_ONT_region
        }
    },
    projection={type=:albersUsa}
)

I want to be sure that this part of your plot is working.

And another typo, color must be:

It works. There is no problem with that. Check!


Great, now check for the color line, see my post above

I fixed the typo. I missed that completely!
But the map is just red. I probably have to change the scale of the “count” column (check the image I get when using Python).

Do you know how to apply a log to the whole “count” column?

JULIA

PYTHON

Okay, I do not think it is the scale, because when I move the mouse over the image it always says: 0

What happens if you just say:

color="count:q"

no scale and no title?

I guess it is using a default color.

In Python, DA_ID is of type “object”.

“count” can be a bad name, because it is an aggregate method in VegaLite.
Could you try by renaming “count” to something arbitrary like e.g. “myIPcounts” ?

I did. Nothing happens. But the color bar displays NaN.

Is there some chance that you could provide us with the json file?

The topojson? Yes. I have to compress it. It is big! 492.0kB

Better to use a download service like Dropbox.

Arrived, could you please delete your above post? My email must not be available anymore. Thanks.

Okay. It is deleted.

I didn’t find an elegant transformation for the problem that DA_ID is nested one level down, inside properties. So I manipulated the topojson data:

using VegaLite, DataFrames, JSON

# I renamed the file to .json so that I can open it in a browser and get a formatted, clickable view of the data
#filename="C:\\Users\\oheil\\Desktop\\disa_ONT_region.json"
filename="/home/juliana/roc/Data/Topojson/disa_ONT_region.topojson"
json=JSON.parsefile(filename)

The following code just creates a dummy DataFrame with the counts as Integers up to 1000 and the DA_IDs in a column named id. It stands in for your df_splunk:

DA_IDs=[ json["objects"]["disa_ONT_region"]["geometries"][key]["properties"]["DA_ID"] for key in keys(json["objects"]["disa_ONT_region"]["geometries"]) ]
counts=Int.(floor.(1000*rand(Float64,length(DA_IDs))))
df_splunk=DataFrame(id=DA_IDs,count=counts)

The following code manipulates the already parsed JSON data by adding a field id at the proper level:

for key in keys(json["objects"]["disa_ONT_region"]["geometries"])
	json["objects"]["disa_ONT_region"]["geometries"][key]["id"]=json["objects"]["disa_ONT_region"]["geometries"][key]["properties"]["DA_ID"]
end
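The same promotion of properties.DA_ID to a top-level id can be tried on a toy structure without any packages. This is only a sketch: the object name "toy_region", the DA_ID values, and the lookup_ids vector are invented for illustration. The last lines show one possible way to check that every key on the dataframe side has a matching geometry, since an unmatched key would leave that row without a shape to color.

```julia
# Toy stand-in for the parsed topojson (hypothetical names and values).
topo = Dict(
    "objects" => Dict(
        "toy_region" => Dict(
            "geometries" => [
                Dict("type" => "Polygon", "properties" => Dict("DA_ID" => "A1")),
                Dict("type" => "Polygon", "properties" => Dict("DA_ID" => "B2")),
            ],
        ),
    ),
)

geometries = topo["objects"]["toy_region"]["geometries"]

# Copy the nested DA_ID up one level so that it is reachable as a plain "id".
for geometry in geometries
    geometry["id"] = geometry["properties"]["DA_ID"]
end

# Hypothetical keys from the dataframe side of the lookup.
lookup_ids = ["A1", "B2", "C3"]
map_ids = [g["id"] for g in geometries]

# Keys that exist in the dataframe but not in the map.
unmatched = setdiff(lookup_ids, map_ids)
```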

Now the choropleth is working as expected by connecting the two id fields in the two data sets:

@vlplot(
    :geoshape,
    width=500, height=300,
    data={
        values=json,
        format={
            type=:topojson,
            feature=:disa_ONT_region
        }
    },
    transform=[{
        lookup=:id,
        from={
            data=df_splunk,
            key=:id,
            fields=["count"]
        }
    }],
    color={
        "count:q",
        scale={scheme=:reds},
        legend={title="IPs Rate in Ontario - BST"}
    },
    projection={
        type=:albersUsa
    }
)

I omitted the domain option so that the color range is scaled automatically.
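If an explicit domain is wanted after all, it can be derived from the data instead of being hard-coded. The sketch below uses an invented counts vector (not the real df_splunk values) and also shows one common way to compress heavily skewed counts with a logarithm, in case the automatic scaling leaves most districts in nearly the same shade. Whether that is needed for this dataset is an assumption, not something established in this thread.

```julia
# Hypothetical, heavily skewed counts.
counts = [0, 3, 12, 250, 980]

# Smallest and largest value, usable as an explicit color-scale domain.
domain = extrema(counts)

# log10 of (count + 1): the +1 keeps zero counts defined, since log10(0) is -Inf.
log_counts = log10.(counts .+ 1)
```

The transformed vector could then be stored as an extra column and used as the color field in place of the raw counts.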
