Join dataframe and geotable

Much more likely it is me that doesn’t understand!

I can’t get GMT to work. I assumed this was because I needed to install GMT separately and that GMT.jl used that. Perhaps that is wrong - as I think you imply. In which case:

using GMT
D = gmtread(joinpath("geography-dz2021-esri-shapefile", "DZ2021.shp"))
println(length(D))

generates the following in VSCode:

ERROR: LoadError: InitError: could not load library "C:\Users\TGebbels\.julia\artifacts\83d62d00715073512a2e76bbb95e027bf701a4e3\bin\libgdal-32.dll"
Access is denied. 
Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl .\libdl.jl:117
  [2] dlopen(s::String, flags::UInt32)
    @ Base.Libc.Libdl .\libdl.jl:116
  [3] macro expansion
    @ C:\Users\TGebbels\.julia\packages\JLLWrappers\pG9bm\src\products\library_generators.jl:63 [inlined]
  [4] __init__()
    @ GDAL_jll C:\Users\TGebbels\.julia\packages\GDAL_jll\mKGP0\src\wrappers\x86_64-w64-mingw32-cxx11.jl:41
  [5] run_module_init(mod::Module, i::Int64)
    @ Base .\loading.jl:1134
  [6] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
    @ Base .\loading.jl:1122
  [7] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
    @ Base .\loading.jl:1067
  [8] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
    @ Base .\loading.jl:1581
  [9] _require(pkg::Base.PkgId, env::String)
    @ Base .\loading.jl:1938
 [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1812
 [11] #invoke_in_world#3
    @ .\essentials.jl:926 [inlined]
 [12] invoke_in_world
    @ .\essentials.jl:923 [inlined]
 [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1803
 [14] macro expansion
    @ .\loading.jl:1790 [inlined]
 [15] macro expansion
    @ .\lock.jl:267 [inlined]
 [16] __require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1753
 [17] #invoke_in_world#3
    @ .\essentials.jl:926 [inlined]
 [18] invoke_in_world
    @ .\essentials.jl:923 [inlined]
 [19] require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1746
 [20] include(mod::Module, _path::String)
    @ Base .\Base.jl:495
 [21] top-level scope
    @ C:\Users\TGebbels\.julia\packages\JLLWrappers\pG9bm\src\toplevel_generators.jl:192
 [22] include
    @ .\Base.jl:495 [inlined]
 [23] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
    @ Base .\loading.jl:2222
 [24] top-level scope
    @ stdin:3
during initialization of module GDAL_jll
in expression starting at C:\Users\TGebbels\.julia\packages\GMT_jll\XFwud\src\wrappers\x86_64-w64-mingw32.jl:7
in expression starting at C:\Users\TGebbels\.julia\packages\GMT_jll\XFwud\src\GMT_jll.jl:2
in expression starting at stdin:3
ERROR: LoadError: Failed to precompile GMT_jll [b68b8c3f-ed99-5bef-9675-4739d9426b26] to "C:\\Users\\TGebbels\\.julia\\compiled\\v1.10\\GMT_jll\\jl_58DC.tmp".
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
    @ Base .\loading.jl:2468
  [3] compilecache
    @ .\loading.jl:2340 [inlined]
  [4] (::Base.var"#968#969"{Base.PkgId})()
    @ Base .\loading.jl:1974
  [5] mkpidlock(f::Base.var"#968#969"{Base.PkgId}, at::String, pid::Int32; kwopts::@Kwargs{stale_age::Int64, wait::Bool})
    @ FileWatching.Pidfile C:\Users\TGebbels\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\FileWatching\src\pidfile.jl:93
  [6] #mkpidlock#6
    @ C:\Users\TGebbels\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\FileWatching\src\pidfile.jl:88 [inlined]
  [7] trymkpidlock(::Function, ::Vararg{Any}; kwargs::@Kwargs{stale_age::Int64})
    @ FileWatching.Pidfile C:\Users\TGebbels\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\FileWatching\src\pidfile.jl:111
  [8] #invokelatest#2
    @ .\essentials.jl:894 [inlined]
  [9] invokelatest
    @ .\essentials.jl:889 [inlined]
 [10] maybe_cachefile_lock(f::Base.var"#968#969"{Base.PkgId}, pkg::Base.PkgId, srcpath::String; stale_age::Int64)
    @ Base .\loading.jl:2983
 [11] maybe_cachefile_lock
    @ .\loading.jl:2980 [inlined]
 [12] _require(pkg::Base.PkgId, env::String)
    @ Base .\loading.jl:1970
 [13] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1812
 [14] #invoke_in_world#3
    @ .\essentials.jl:926 [inlined]
 [15] invoke_in_world
    @ .\essentials.jl:923 [inlined]
 [16] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1803
 [17] macro expansion
    @ .\loading.jl:1790 [inlined]
 [18] macro expansion
    @ .\lock.jl:267 [inlined]
 [19] __require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1753
 [20] #invoke_in_world#3
    @ .\essentials.jl:926 [inlined]
 [21] invoke_in_world
    @ .\essentials.jl:923 [inlined]
 [22] require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1746
 [23] top-level scope
    @ C:\Users\TGebbels\.julia\packages\GMT\vHy9m\src\GMT.jl:47
 [24] include
    @ .\Base.jl:495 [inlined]
 [25] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
    @ Base .\loading.jl:2222
 [26] top-level scope
    @ stdin:3
in expression starting at C:\Users\TGebbels\.julia\packages\GMT\vHy9m\src\GMT.jl:1
in expression starting at stdin:3
ERROR: LoadError: Failed to precompile GMT [5752ebe1-31b9-557e-87aa-f909b540aa54] to "C:\\Users\\TGebbels\\.julia\\compiled\\v1.10\\GMT\\jl_4804.tmp".
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
    @ Base .\loading.jl:2468
  [3] compilecache
    @ .\loading.jl:2340 [inlined]
  [4] (::Base.var"#968#969"{Base.PkgId})()
    @ Base .\loading.jl:1974
  [5] mkpidlock(f::Base.var"#968#969"{Base.PkgId}, at::String, pid::Int32; kwopts::@Kwargs{stale_age::Int64, wait::Bool})
    @ FileWatching.Pidfile C:\Users\TGebbels\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\FileWatching\src\pidfile.jl:93
  [6] #mkpidlock#6
    @ C:\Users\TGebbels\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\FileWatching\src\pidfile.jl:88 [inlined]
  [7] trymkpidlock(::Function, ::Vararg{Any}; kwargs::@Kwargs{stale_age::Int64})
    @ FileWatching.Pidfile C:\Users\TGebbels\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\FileWatching\src\pidfile.jl:111
  [8] #invokelatest#2
    @ .\essentials.jl:894 [inlined]
  [9] invokelatest
    @ .\essentials.jl:889 [inlined]
 [10] maybe_cachefile_lock(f::Base.var"#968#969"{Base.PkgId}, pkg::Base.PkgId, srcpath::String; stale_age::Int64)
    @ Base .\loading.jl:2983
 [11] maybe_cachefile_lock
    @ .\loading.jl:2980 [inlined]
 [12] _require(pkg::Base.PkgId, env::String)
    @ Base .\loading.jl:1970
 [13] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1812
 [14] #invoke_in_world#3
    @ .\essentials.jl:926 [inlined]
 [15] invoke_in_world
    @ .\essentials.jl:923 [inlined]
 [16] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1803
 [17] macro expansion
    @ .\loading.jl:1790 [inlined]
 [18] macro expansion
    @ .\lock.jl:267 [inlined]
 [19] __require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1753
 [20] #invoke_in_world#3
    @ .\essentials.jl:926 [inlined]
 [21] invoke_in_world
    @ .\essentials.jl:923 [inlined]
 [22] require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1746
in expression starting at c:\Users\TGebbels\...\Documents\Julia Experimenting\shapes.jl:4

Maybe these errors are caused by an IT policy or maybe I can resolve them…

Tim

Don’t understand this error, but it’s not the first time we have users reports of this kind of error. Another strange thing is that on Windows Pkg was supposed to have installed the MSVC build of GMT where that error would not have occurred. Is this your home computer?

No, this is work. I tried to install GMT as a package in the usual way. When it failed, I went to IT for GMT (stand alone) and “request denied”.

I did install GMT (stand alone) on my home computer and then installed GMT.jl using Pkg in Julia. I’ve had no problems there.

Work: Win 11
Home: Win 10

@joa-quim , I believe that the thread is now polluted with GMT issues that have nothing to do with the original discussion. Please try to contact the OP directly next time to share off-topic content.

To clarify: this thread is about joining a geotable with a dataframe. In your comment, you shared a code snippet to generate a visualization of the data.

It was intended to be a single thread. Not a discussion that I will not continue.

Agree with Julio best to start a new thread about the GMT issues, but there have been problems with installing artifacts under Windows recently - see here: Permissions issues with artifacts on Windows in 1.10-rc1 (and betas) - Access is denied · Issue #52272 · JuliaLang/julia · GitHub

Just to add that while leftjoin() doesn’t retain the original order of the left-hand dataframe, so adding an index before and sorting after the join is necessary, doing an in-place join using leftjoin!() does preserve the order.

As long as the left-hand dataframe is values(GeoTable), the GeoTable order will be preserved and the resultant dataframe can then be joined with the geometry with georef() without the need to sort.

Something like this:

new_gt = georef(leftjoin!(values(gt) |> DataFrame, df, on=:Col, matchmissing=:equal), gt.geometry)
1 Like

You can do the same with leftjoin if you pass the right kwarg:

order : if :undefined (the default) the order of rows in the result is undefined and may change in future releases. If :left then the order of rows from the left data frame is retained. If :right then the order of rows from the right data frame is retained (non-matching rows are put at the end).

1 Like

@pdeffebach suggested this, too. However, it will not work in the general case, because

non-matching rows are put at the end

The dataset I am joining only contains a subset of rows compared with the GeoTable. As soon as some rows get put to the end, the order no longer matches the geometry column.

I did try doing this but it didn’t work.

a naive alternative to the leftjoin!() solution

df1_in_df2 = filter(!isnothing,indexin(df1.ID1, df2.ID2))
df1.col=Vector{Union{eltype(df2.col), Missing}}(missing,length(df1.ID1))
df1.col[df1_in_df2 ].= df2.col[df1_in_df2 ]

A left join includes all rows from df1 and leaves all rows and columns from df1 untouched. Note that each row in df1 must have at most one match in df2. Otherwise, this function would not be able to execute the join in-place since new rows would need to be added to df1.

Why would someone use this instead of leftjoin!? This solution doesn’t seem to add much to solving the main problems.

I’m not sure I understand the meaning of the question correctly. That is, whether it is a rhetorical question, probable, or not.
If we were in a corporate context it would clearly make little sense to propose various “non-standard” solutions.
But in an “inclusive” context like this (at least as I understand it) it might be interesting (if only for a passing reader) to see different ways to the same problem. Even just to show Julia’s versatility.
It might be more useful to have a comment that, by comparing the different proposals, shows the differences in terms of performance or generality or something else or simply the fact that it does not give a correct answer to the OP’s request.

@TimG we’ve added a new function called tablejoin to join a geotable with a normal table as in your example. Please update to GeoTables.jl v1.20, and let us know if it works as expected.

1 Like

Yes, this now works. Thanks!

Out of curiosity, I tried the following:

println(tablejoin(LAs, combo, on=:LADcode))
361×16 GeoTable over 361 GeometrySet{2,Float64}

@btime tablejoin($LAs, $combo, on=:LADcode)
  6.766 ms (345676 allocations: 17.47 MiB)

println(georef(leftjoin!(values(LAs) |> DataFrame, combo, on=:LADcode, matchmissing=:equal), LAs.geometry))
361×16 GeoTable over 361 GeometrySet{2,Float64}

@btime georef(leftjoin!(values($LAs) |> DataFrame, $combo, on=:LADcode, matchmissing=:equal), $LAs.geometry)
  60.400 μs (186 allocations: 94.52 KiB)
1 Like

I believe that the difference in performance has to do with the fact that our tablejoin performs aggregation when there are multiple matches, whereas leftjoin doesn’t. We will investigate if there is any other potential performance gap to fix. Thanks for reporting :100: