Using ArchGDAL

Hi, I wanted,

to give Julia a chance because it said to be rather fast. And some parts of my Python code needs hours (like 3 to 5) and that just to much, to be productively useful. Especial when you still have to debug the code.

I want to do some calculation on some masked Sentinel2 tiles and use ArchGDAL.jl for that.

The main code: is opening two raster files and (one with real data, the other as indicies of the reagions. And than I want to compute the mode of each region and than save the data of the mode into an new file.

The code that is currently running (but I wouldn’t mind, when it be faster) is:

using StatsBase
using ArchGDAL; const AG = ArchGDAL

class_raster = Array{Int64}(ArchGDAL.read(AG.read(prediction_path), 1))
class_raster[class_raster .== 55537] .= 0
class_raster[class_raster .== -9999] .= 0
Array{UInt8}(class_raster)
indices = Array{Int64}(ArchGDAL.read(AG.read(atkis_area_path), 1))

unique_indices = unique(indices)
for _index in unique_indices
       
    coordinates = findall(indices .== _index)
    mode_val = mode(class_raster[findall(index_raster .== _index)])
    for x in coordinates
        local_classes[x] = mode_val
    end
end

But when I want to save the file for instance with:

dataset = AG.read(prediction_path)
ref = AG.getproj(dataset)
geotransform = AG.getgeotransform(dataset)

raster = AG.unsafe_create(
        "test.tif",
        AG.getdriver("GTiff"),
        width = ArchGDAL.width(dataset),
        height = ArchGDAL.height(dataset),
        nbands = 1,
        dtype = UInt8
    )
AG.setgeotransform!(raster, geotransform)
AG.setproj!(raster, ref)

AG.write!(
    raster,
    local_classes,  # image to "burn" into the raster
    1,      # update band 1
    )

I still get an error when calling unsave_create:

MethodError: no method matching unsafe_create(::String, ::ArchGDAL.Driver; width=7328, height=5860, nbands=1, dtype=UInt8)
Closest candidates are:
unsafe_create(::AbstractString; driver, width, height, nbands, dtype, options) at C:\Users\TITAN.julia\packages\ArchGDAL\s5Bwr\src\dataset.jl:191

Stacktrace:
[1] top-level scope at In[28]:1
[2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

I don’t know why! Didn’t I add every parameter needed? Is it a problem with the types? Additionally I totally puzzled when to use AG and when ArchGDAL.

I hope you could help me out. Thanks alot!

If you’re loading non-rotated regular tif (or other kinds of raster) files, GeoData.jl wraps ArchGDAL.jl, and can do this in a lot less code:

using GeoData, ArchGDAL
raster = GDALarray(prediction_path) |> GeoArray
indices = GDALarray(atkis_area_path) |> GeoArray

# Do your things

# Then save
write("output.tif", GDALarray, results)

Its a newly published packages so there might be a few bugs, but I use it every day so it works OK.

All the tif metadata is maintained through any math, array slicing or broadcasts. You can just write the results array as a tif and it should work. Make an issue on GeoData.jl if it doesn’t.

1 Like

You can do either as const AG = ArchGDAL defines those to be the same. But really you should then just use AG, otherwise what’s the point of that definition.

1 Like

If you read this very carefully it is saying that you are passing the driver as a positional argument, whereas it is expecting it as a keyword argument, behind the ;. Where did you base this code on? Would be good to know if something in the docs needs to be updated. There is also no need to use unsafe_create anymore, you can directly use create, also without using a do-block, see the two examples here:

https://yeesian.com/ArchGDAL.jl/stable/reference/#ArchGDAL.create-Tuple{AbstractString}

Hi,
thanks at @Raf. Your answer has been very fast. Still I could not really apply this method, because I would have to write with an GeoArray… and It didn’t know how to create those… so I installed GeoArrays… and it seamed like using the packages GeoData, ArchGDAL and GeoArray lead to many problems.

For the usage of ArchGDAl’s copy: I neither saw the difference, of the ; and the , now would I have known, what it meaned aktough I read I Julia book so far.

My code currently looks like:

using ArchGDAL; const AG = ArchGDAL

prediction_path = “E:\SeH\8_TrainingData\output\Python\results\RDF_Level3_full_Prediction_DLM50_20m_manual_v2_41features.tif”
atkis_area_path = “E:\SeH\COP4EE-2_BB_indexes.tif”

raster = Array{Int64}(ArchGDAL.read(AG.read(prediction_path), 1))
raster[raster .== 55537] .= 0
raster[raster .== -9999] .= 0
raster = Array{UInt8}(raster)
indices = Array{Int64}(ArchGDAL.read(AG.read(atkis_area_path), 1))

unique_indices = unique(indices)

Threads.@threads for _index in unique_indices

coordinates = findall(indices .== _index)
raster[coordinates] .= StatsBase.mode(raster[coordinates])

end

dataset = AG.read(prediction_path)
ref = AG.getproj(dataset)
geotransform = AG.getgeotransform(dataset)

raster_ds = AG.create(
“test.tif”,
driver = AG.getdriver(“GTiff”),
width = ArchGDAL.width(dataset),
height = ArchGDAL.height(dataset),
nbands = 1,
dtype = UInt8
)
AG.setgeotransform!(raster_ds, geotransform)
AG.setproj!(raster_ds, ref)

AG.write!(
raster_ds,
raster, # image to “burn” into the raster
1, # update band 1
)

and it’s really faster than python, because I am usage serval threads for the loop.

The only problem I currently have is, that the file that is created seams to be broken. :frowning:

Does anyone has a clue why?

Thanks alot!

P.S.:

The Output is:

GDAL Dataset (Driver: GTiff/GeoTIFF)
File(s):
test.tif

Dataset (width x height): 7328 x 5860 (pixels)
Number of raster bands: 1
[GA_Update] Band 1 (Gray): 7328 x 5860 (UInt8)

But the file is empty.

That script I posted makes a GeoData.GeoArray

Thats what this does:

using GeoData, ArchGDAL
raster = GDALarray(prediction_path) |> GeoArray

When prediction_path is some String path to a file GDAL can open.

You can save it again with:

write("output.tif", GDALarray, raster)

Or as a Netcdf:

using NCDatasets
write("output.ncd",  NCDarray, raster[Band(1)])

The point is there isn’t anything you have to do.

You can also operate on it in between loading and saving:

raster = GDALarray(prediction_path; usercrs=EPSG(4326)) |> GeoArray
# Slice a subsection and multiply it by 100
raster = raster[Lat(Between(20, 20), Lon(Between(90, 180))] .* 100
# Save the new raster
write("output.tif", GDALarray, raster)

That just works you don’t have to do anything. Most julia array math and methods will just work
as if it’s a normal array, but it will always remain a GeoArray you can plot and save to disk.

It also doesn’t matter what the underlying CRS is you can just index with Between using EPSG:4326 lat/lon coords and it will work if you have set the usercrs as above.

You can also plot it:

using Plots
raster |> plot

@Raf Thanks, for the help. The original code idea is working now. Although, I hoped it would be faster even when using Multithreading. Is there a way to change the datatype of the attribute data in the GeoArray?

The new problem is how to add or remove bands to an GeoArray, or can I merge two GeoArray, so that these combine to an GeoArray of two bands?

1 Like

What is not fast, loading the file or indexing in to it? Maybe post some code here that is slow and we can work through it (even a link to a file too).

The easiest way to change the array element type is to just broadcast:

A = Float32.(A)

Which will give you a new GeoArray holding Float32. If that is what you mean?

Joining two arrays in the Band (or any) dimension you can do:

C = cat(A, B; dims=Band)

But they might both be band 1, so the joined dim index will be [1, 1], which you probably don’t want. I’m going to fix cat a little to make this easier, but for now you can do:

C = cat(A, B; dims=Band)
C = setdims(C, Band(1:2, NoIndex()))

Although there is a better way of doing this coming with set

Edit: It seems to save fine even if the bands are [1, 1], and will load again as 1:2, so it might not matter.

@Raf Thanks for the Tip with the broadcast.

The current speed is limited by the indexing. Especially because it’s over different indices.

It takes about 30 min for this sentinel 2 tile, jupyter using 4 threads on a i7-6700K at 4 GHz.

I splitted the problem in 2 functions:

function apply_multiband_index_filter(prediction_path::String, index_path::String)
    raster = GDALarray(prediction_path) |> GeoArray
    raster[raster .== 55537] .= 0
    raster[raster .== -9999] .= 0
    raster = UInt8.(raster)
        
    indices = GDALarray(index_path) |> GeoArray
    indices[indices .== 55537] .= 0
    indices[indices .== -9999] .= 0
    indices = Array{Int64}(indices)
    
    county_indices = indices[:, :, 1]
    atkis_indices = indices[:, :, 2]
    invekos_indices = indices[:, :, 3]
    
    class_atkis_filtered = apply_multi_band_single_filter(raster, county_indices, atkis_indices)
    class_invekos_filtered = apply_multi_band_single_filter(raster, county_indices, invekos_indices)

    comb_filtered = cat(class_atkis_filtered, class_invekos_filtered; dims=Band)
    comb_filtered = setdims(comb_filtered, Band(1:2))
    return comb_filtered
end

and

function apply_multi_band_single_filter(prediction, county_index, inner_index)
    filtered_prediction = deepcopy(prediction)


    for u_county_idx in unique(county_index)
        county_mask = county_index .== u_county_idx
        coordinates = findall(county_mask)
    
        filter1d_inner_indices = inner_index[coordinates]
        Threads.@threads for u_inner_idx in unique(filter1d_inner_indices)
            tmp_inner_2d_mask = inner_index .== u_inner_idx
            tmp_inner_1d_mask = filter1d_inner_indices .== u_inner_idx

            temp_inner_mode = (u_inner_idx == 0) ? 0 : StatsBase.mode(filtered_prediction[coordinates][findall(tmp_inner_1d_mask)])

            filtered_prediction[findall(@. county_mask & tmp_inner_2d_mask)] .= temp_inner_mode
            end
        end

    return filtered_prediction
end


I split the indexes and calculated them seperately and than I wann to concatinate them again, to save both in one file.

The coordinates are from a common copied GeoArray, nevertheless writing the GeoArray to disc (as tif), results in this error:

MethodError: no method matching indexorder(::AutoOrder)
Closest candidates are:
indexorder(!Matched::Tuple{}) at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\dimension.jl:182
indexorder(!Matched::Ordered) at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\mode.jl:38
indexorder(!Matched::Unordered) at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\mode.jl:56
…

Stacktrace:
[1] indexorder(::Band{UnitRange{Int64},AutoMode{AutoOrder},Nothing}) at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\dimension.jl:167
[2] reorderindex(::GeoArray{UInt8,3,Tuple{Lon{LinRange{Float64},Projected{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Lat{LinRange{Float64},Projected{Ordered{DimensionalData.Reverse,DimensionalData.Reverse,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Band{UnitRange{Int64},AutoMode{AutoOrder},Nothing}},Tuple{},Array{UInt8,3},String,GDALarrayMetadata{String,Any},UInt16}, ::Band{DimensionalData.Forward,AutoMode{AutoOrder},Nothing}, ::DimensionalData.Forward) at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\utils.jl:157
[3] reorderindex(::GeoArray{UInt8,3,Tuple{Lon{LinRange{Float64},Projected{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Lat{LinRange{Float64},Projected{Ordered{DimensionalData.Reverse,DimensionalData.Reverse,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Band{UnitRange{Int64},AutoMode{AutoOrder},Nothing}},Tuple{},Array{UInt8,3},String,GDALarrayMetadata{String,Any},UInt16}, ::Band{DimensionalData.Forward,AutoMode{AutoOrder},Nothing}) at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\utils.jl:148
[4] reorderindex at C:\Users\TITAN.julia\packages\DimensionalData\IHaNW\src\utils.jl:144 [inlined]
[5] (::GeoData.var"#161#163")(::GeoArray{UInt8,3,Tuple{Lon{LinRange{Float64},Projected{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Lat{LinRange{Float64},Projected{Ordered{DimensionalData.Reverse,DimensionalData.Reverse,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Band{UnitRange{Int64},AutoMode{AutoOrder},Nothing}},Tuple{},Array{UInt8,3},String,GDALarrayMetadata{String,Any},UInt16}) at C:\Users\TITAN.julia\packages\GeoData\MVllE\src\sources\gdal.jl:135
[6] |> at .\operators.jl:834 [inlined]
[7] write(::String, ::Type{GDALarray}, ::GeoArray{UInt8,3,Tuple{Lon{LinRange{Float64},Projected{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Lat{LinRange{Float64},Projected{Ordered{DimensionalData.Reverse,DimensionalData.Reverse,DimensionalData.Forward},Regular{Float64},Intervals{Start},WellKnownText{GeoFormatTypes.CRS,String},Nothing},GDALdimMetadata{Any,Any}},Band{UnitRange{Int64},AutoMode{AutoOrder},Nothing}},Tuple{},Array{UInt8,3},String,GDALarrayMetadata{String,Any},UInt16}) at C:\Users\TITAN.julia\packages\GeoData\MVllE\src\sources\gdal.jl:134
[8] top-level scope at In[36]:1
[9] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

Ok this part is an easy fix

You need to use:

setdims(comb_filtered, Band(1:2, NoIndex()))

To set the index type. Its annoying I know.
The new set method fixes having to do any of that, but it isn’t finished yet.

This isn’t the fastest way in julia:

raster[raster .== 55537] .= 0
raster[raster .== -9999] .= 0

You are allocating new geoarrays of booleans to index with. I’ll look into anything makes that especially slow with GeoArray compared to a regular array, but it’s better to do:

replace!(raster, 55537 => 0)
replace!(raster, -9999 => 0)

This just runs a function over every value in the array and updates it in place, so no allocations are made. Avoiding allocations is one of the main strategies for high performance julia.

But I guess the time is probably in those filter calls too.

Things like this are allocating:

filtered_prediction[findall(@. county_mask & tmp_inner_2d_mask)] .= temp_inner_mode
filtered_prediction[coordinates][findall(tmp_inner_1d_mask)]

And your:

tmp_inner_1d_mask
tmp_inner_2d_mask

Could be defined outside the loop and reused (one for each thread) so you don’t keep allocating it.

Thanks, @Raf

I think defining the type in the function with e. g.

local varname::GeoArray{2}

didn’t increase the speed, also it didn’t the accumulation. I am wondering, whether I could rearrange the code, that that I only wouldnneed a 2d or 1d array.

But because auf the speed issue I am reworking the algorithm. Therefore I want to ask, how to deal with the GeoArray, in more detail.

How do I apply a function like unique, maximum, find max for example on a GeoArray with a single Band or Multiple bands?

How can I extract the value of the bands, from the arrays Index, or how can I find the geo position from the index?

Perhaps it would be helpful,to add some usage examples for these on github,etc.
Thanks a lot!

If you want examples checkout the docs for DimensionalData.jl. GeoArray is just a special case of DimArray from DimensionalData.jl.
You can use maximum etc just like you would on a regular array, but you can use Lat/Lon etc as the dims argument if you need to.

I think your main problem for performance is the allocations you are making. GeoArray will be slower than normal array if you allocate or slice because it has the dim index as well. But in high performance code you shouldn’t do that anyway - and if you don’t slice or allocate GeoArray should be identical in performance to a regular array.

You should work out what is actually slow before you do anything else. Use ProfileView.jl to do this, or @profiler in Atom. And use BenchmarkTools.jl to time everything. Try to get you allocations to decrease and time every change.

Things like this don’t do anything - It will be type stable anyway:
local varname::GeoArray{2}

Edit: also remember you can just get the parent Array from a GeoArray with A = parent(geoarray) and work with that, and copy your result back to a GeoArray, or use geoarray = rebuild(geoarray; data=A) to rebuild with new data (the same size of course).