Spatial Kernel in Julia

Hi,
I am really new to Julia,
I would like to compute a spatial kernel with a resolution of 1000x1000 metres and then save the results as a raster file (“+proj=moll +lon_0=0 +x_0=0 +y_0=0”).

I gave it a try with the KernelDensity package and then trying to interpolate the values but I got stuck as I do not understand how to properly use the pdf function.

Can anyone help?

Thank you for your help

This is my code till I got stuck

using CSV, DataFrames, KernelDensity, Proj, ArchGDAL, Shapefile , Colors, Distributions, Interpolations

# Load the CSV file
df = CSV.read("df.csv", DataFrame)

# Read the world shapefile
world_shapefile = Shapefile.read("WorldShape/World.shp")

# Discard the first column
df = select(df, Not(1))

# Filter by year 
df = filter(row -> 1975 <= row.outbreak_start <= 2020, df)

# Extract x and y columns
x_values = df.x
y_values = df.y


# Extract x and y coordinates
coordinates = hcat(x_values, y_values)


# Create a bivariate kernel density estimator
Bkde = kde(coordinates)

Here is where I face issues

# Set resolution
resolution = 1000

# Generate grid coordinates
x_grid = range(minimum(x_values), maximum(x_values), length=resolution)
y_grid = range(minimum(y_values), maximum(y_values), length=resolution)


# Evaluate KDE on the grid
kde_values = pdf(Bkde, hcat(vec(x_grid), vec(y_grid)))

Hello @anjelinejeline I think this is a simple fix, just need to change your last line to
kde_values = pdf(Bkde, x_grid, y_grid)

Thank you @mlkrock it works !!
Do you know how I can turn it into a raster Geotiff and assign the right crs ?

Thank you

You could use the Rasters package to handle the geotiff part. You can have to see whether it is necessary to reverse the y coordinates that would depend on the values in it.

using GeoFormatTypes, Rasters, ArchGDAL
julia> ras = Raster(kde_values, (X(x_grid), Y(reverse(y_grid))), crs=EPSG(4326))
1000×1000 Raster{Float64,2} with dimensions: 
  X Projected{Float64} 1.0:0.009009009009009009:10.0 ForwardOrdered Regular Points crs: EPSG,
  Y Projected{Float64} 30.0:-0.009009009009009009:21.0 ReverseOrdered Regular Points crs: EPSG
extent: Extent(X = (1.0, 10.0), Y = (21.0, 30.0))
missingval: missing
crs: EPSG:4326

julia> write(tempname()*".tif", ras)
┌ Warning: `missing` cant be written with gdal, missinval for `Float64` of `-Inf` used instead
└ @ Rasters ~/.julia/packages/Rasters/PnKS7/src/utils.jl:29
"/tmp/jl_CJkPkl8F4T.tif"

Hi @Fliks
Thank you
I was able to create a kernel and interpolate it on a grid of the World but the resolution of my output does not match the resolution I initially set for the kernel.

Can you help me get a raster of 1kmx1km resolution?

Thank you

bw=30000::Int64
# Create a bivariate kernel density estimator

Bkde = kde(coordinates, bandwidth=(bw,bw))

# Set resolution
resolution = 1000

# Generate grid coordinates considering the bounding of the World ESRI:54009 Mollweide
x_grid = range(-17601618 , 17601617, length=resolution)
y_grid = range(-9018991, 8751339, length=resolution)

#  Interpolate the values on the World grid 
kde_values = pdf(Bkde, x_grid, y_grid)

# Create a raster file
ras=Raster(kde_values, (X(x_grid), Y(reverse(y_grid))), crs=EPSG(54009))

Resolution should be determined by data, not by wish but ofc one can do that.
Hard to advise more without a reproducible example.

Hi @joa-quim joaquin
My objective is to create a kernel density of a set of spatial points.
The coordinates of my points are in ESRI:54009 Mollweide
The bandwidth I want to use is 3000 metres
I want to interpolate the density on a grid of 1x1km resolution covering the entire world

What I have done so far

I am happy to share the df object but I cannot upload it here as csv are not supported

using CSV, DataFrames, KernelDensity, GeoFormatTypes, Rasters, ArchGDAL, Distributions, Interpolations
using Plots, Shapefile

# Load the CSV file
df = CSV.read("df.csv", DataFrame)

# Discard the first column
df = select(df, Not(1))

# Filter by year 
df = filter(row -> 1960 <= row.outbreak_start <= 2020, df)

# Extract x and y columns
x_values = df.x
y_values = df.y

# Extract x and y coordinates
coordinates = hcat(x_values, y_values)

bw=30000::Int64
# Create a bivariate kernel density estimator

Bkde = kde(coordinates, bandwidth=(bw,bw))

# Set resolution
resolution = 1000

# Generate grid coordinates considering the bounding of the World ESRI:54009 Mollweide
x_grid = range(-17601618 , 17601617, length=resolution)
y_grid = range(-9018991, 8751339, length=resolution)

#  Interpolate the values on the World grid 
kde_values = pdf(Bkde, x_grid, y_grid)

# Create a raster file
ras=Raster(kde_values, (X(x_grid), Y(reverse(y_grid))), crs=EPSG(54009))

plot(ras)

#  Upload the World Shapefile 
World=Shapefile.Table("World.shp") |> DataFrame

plot(World.geometry)

#  Crop the raster using the World Shapefile
ras=crop(ras;to=World.geometry)

#  Mask the raster using the World Shapefile 

ras_masked=mask(ras,with=World.geometry)

plot(ras_masked)

#  Define the coords reference system 
crs = ArchGDAL.toWKT(ArchGDAL.importPROJ4("+proj=moll +lon_0=0 +x_0=0 +y_0=0"))
ras_masked=setcrs(ras_masked,crs)

Mind you that this, in a Float32 array is ~3.5 GB

This is not reproducible.

df_file.jl (30.1 KB)
I have used the JLD2 package (Saving data files in Julia) to share the df with you.
I am happy to share the data as csv but I cannot upload this format here

Sorry, can’t use that data. It errors with

df = filter(row -> 1960 <= row.outbreak_start <= 2020, df)
ERROR: ArgumentError: column name :outbreak_start not found in the data frame

But let me show an example on how that works with GMT.jl and an example from the KernelDensity test suit.
You also seem to be doing some clipping inside continents, which GMT can do as well without any further data (the coast module can do land/ocean clipping

using GMT, KernelDensity

# Use test example in the KernelDensity package
r = KernelDensity.kde_range((-2.0,2.0), 128);
k12 = kde([1.0 1.0], (r,r), bandwidth=(1,1));

# Create a GMT grid. I', expanding the  Mollweide because something is not working with "epsg=54009"
G = mat2grid(Float32.(k12.density), x=k12.x, y=k12.y, proj4="moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs");

# Save as a GeoTIFF file
gmtwrite("k12.tiff", G)

# Show it
viz("k12.tiff")

Thanks joaquim could you please help me understand the code?
where did you specify the resolution?
where should the coordinates be inserted?
where the extent of the world should be specified?

I’m not a kde user but I assumed that the k12.x & k12.y contains the coordinates, which were set with the range(-2,2,128). Don’t do that with rx = range(-180,180, 43200) (360 * 60 * 2 = 43200) and same for lat if you don’t have a some 7 GB to spear with Float64 array.

Note, my example is on how to create a GeoTIFF from the kde result. How to create it as you want is another issue.

HI joa-quim
I was also able to create a kernel with the code I posted
My issue is the resolution … I want a geotiff of 1kmx1km
Can you help me with that?
Where did you specify the resolution of your raster in the code?

But I answered that above. If you insist, try

rx = KernelDensity.kde_range((-180.0,180.0), 43200);
ry = KernelDensity.kde_range((-90.0,90.0), 21600);

Sorry, your coordinates are Mollweid meters so should not use -180,180 but that in Mollweid meters.

Sorry it’s still not clear to me …
what does the kde_range define ?
Sorry but I can’t find a proper documentation here Readme · KernelDensity.jl

My coordinates are in metres the bounding of the Mollweide are
x: -17601618 , 17601617
y: -9018991, 8751339

Should I write something like this?

# Load the CSV file
df = CSV.read("df.csv", DataFrame)

# Discard the first column
df = select(df, Not(1))

# Filter by year 
df = filter(row -> 1960 <= row.outbreak_start <= 2020, df)

# Extract x and y columns
x_values = df.x
y_values = df.y

# Extract x and y coordinates
coordinates = hcat(x_values, y_values)


# Bandwidth 
bw=30000::Int64

#  Defining the bounding of the World ESRI:54009 Mollweide 
# and resolution of 1kmx1km (1000m x 1000m)

rx = KernelDensity.kde_range((-17601618 , 17601617), 1000);
ry = KernelDensity.kde_range(( -9018991, 8751339), 1000);

# Create a bivariate kernel density estimator

Bkde = kde(coordinates,(rx,ry), bandwidth=(bw,bw))
rx = KernelDensity.kde_range((-17601618 , 17601617), 1000);

No. length must be

julia> round(Int,(17601617 + 17601618) / 1000)
35203

so

rx = KernelDensity.kde_range((-17601618 , 17601617), 35202);

Do you see now why I say its big

Thank you I will try right away …
I know that it’s big that’s why I am trying to move from R to julia :sweat_smile:
Can you explain me how I can access the documentation of the kde_range function within the KernelDensity package?
I am struggling to understand how to properly find guidance on the use of the functions within each julia package

A further problem is that calculations return a Float64 array, which is probably not needed. Maybe someone knows how to make it in Float32.

Sorry, mentioned above. I’m no KernelDensity user. Just read a bit of the code and guessed the rest.

Thanks anyway
With regard to the package GMT the function mat2grid does not work in my case

G = mat2grid(Float32.(Bkde.density), x=Bkde.x, y=Bkde.y, proj4=“+proj=moll +lon_0=0 +x_0=0 +y_0=0”)

How can I fix it?