Select a dictionary in a vector based on a specific entry value

Hi everyone,

I’m looking for an easy way to select a Dict from a vector of Dicts based on the value stored under one of its keys.

In the example below, I need to find the title of a data item, which is given by the first Dict of the vector (but it is not always the first…).
To find the right Dict, I need to check the value of the key “propertyUri”, sometimes in combination with the key “lang”:

v = [
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://nakala.fr/terms#title", "lang" => "fr", "value" => "Z1J255"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://nakala.fr/terms#created", "lang" => nothing, "value" => "2024-02-06"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://nakala.fr/terms#license", "lang" => nothing, "value" => "PDM"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#anyURI", "propertyUri" => "http://nakala.fr/terms#type", "lang" => nothing, "value" => "http://purl.org/coar/resource_type/c_c513"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://purl.org/dc/terms/subject", "lang" => "fr", "value" => "maitrises"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://purl.org/dc/terms/subject", "lang" => "fr", "value" => "experts"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://purl.org/dc/terms/subject", "lang" => "fr", "value" => "Paris"),
  Dict("typeUri" => "http://www.w3.org/2001/XMLSchema#string", "propertyUri" => "http://purl.org/dc/terms/subject", "lang" => "fr", "value" => "prosopographie"),
]

For now, I’m doing it like this:

label = Vector()
for meta in v
  if get(meta, "propertyUri", "") === "http://nakala.fr/terms#title"
    push!(label, get(meta, "value", "unknown"))
  end
end

label[1]

It works, but isn’t there an easier and more efficient way? Maybe with a predicate, as in XPath, something like v[:propertyUri === "value"]?

Best,
Josselin

Well, there is filter, with which you could do:

filter(x->get(x, "propertyUri", "") == "http://nakala.fr/terms#title", v)

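(Note: don’t use === for string comparison. === checks for identity, not equality; use == to compare values. For example, a SubString and a String with the same content are == but not ===.)
You can of course wrap it in a function for convenience: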

function select(column, value, list)
    return filter(x -> haskey(x, column) && x[column] == value, list)
end
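
If you only need the single value (for example the title) rather than the list of matching Dicts, a small helper along these lines might be enough. This is only a sketch: the name first_value and its lang and default keywords are made up for illustration, and it assumes every entry is a Dict with string keys like the ones in the question.

```julia
# Hypothetical helper (not part of any library): return the "value" of the
# first entry whose "propertyUri" matches, optionally also requiring a "lang".
function first_value(entries, property; lang=nothing, default=nothing)
    for entry in entries
        # isequal never returns `missing`, so it is safe to use with `||`.
        isequal(get(entry, "propertyUri", nothing), property) || continue
        # Only check "lang" when the caller asked for a specific one.
        lang === nothing || isequal(get(entry, "lang", nothing), lang) || continue
        return get(entry, "value", default)
    end
    return default  # no entry matched
end

# Small sample in the shape of the question's data (the title is not first).
metas = [
    Dict("propertyUri" => "http://nakala.fr/terms#created", "lang" => nothing, "value" => "2024-02-06"),
    Dict("propertyUri" => "http://nakala.fr/terms#title", "lang" => "fr", "value" => "Z1J255"),
]

title = first_value(metas, "http://nakala.fr/terms#title"; lang="fr")
```

Unlike filter, this stops at the first match and hands back a default when nothing matches, so there is no need to index into a possibly empty result.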

But to me it looks like you have rows of a table, so I’d recommend using a library for tabular data, such as DataFrames.jl. There is also a very handy add-on package called DataFramesMeta.jl that defines a couple of convenience macros, so you can write almost SQL-like queries to analyze your data.

Thank you very much for your answer @abraemer!

I’d spotted filter(), but I got an error when I tried to use it… Maybe it’s because I sometimes use single quotes instead of double quotes. It’s a mistake I often make with Julia… the habit of XQuery, I guess!

Unfortunately, these aren’t quite rows (I’d have liked that!). I only took the first metadata entries for the example, and other dictionaries in my vector don’t have the same keys at all… But maybe that’s not a problem; maybe I’ll just get missing values where the columns aren’t there? I’ll take a look at it!

> (Note: don’t use === for string comparison. === checks for identity not equality)

Thank you for pointing that out

Best,
Josselin.

This looks a bit like schema-less data à la JSON, which makes it a bit harder to work with. In any case, Query.jl, which is basically LINQ for Julia, could be useful:

using Query

@from x in v begin
  @where haskey(x, "propertyUri") && x["propertyUri"] == "http://nakala.fr/terms#title"
  @select get(x, "value", missing)
end
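
For comparison, roughly the same selection can be sketched in Base Julia with an array comprehension and no extra package. The variable names metas, subject and subjects below are just for illustration, and the sample data is a cut-down version of the vector in the question.

```julia
# Cut-down sample in the shape of the question's data.
metas = [
    Dict("propertyUri" => "http://nakala.fr/terms#title", "lang" => "fr", "value" => "Z1J255"),
    Dict("propertyUri" => "http://purl.org/dc/terms/subject", "lang" => "fr", "value" => "maitrises"),
    Dict("propertyUri" => "http://purl.org/dc/terms/subject", "lang" => "fr", "value" => "Paris"),
]

subject = "http://purl.org/dc/terms/subject"

# Keep the "value" of every entry whose "propertyUri" matches; entries
# without a "propertyUri" key are skipped thanks to the "" default.
subjects = [get(x, "value", missing) for x in metas if get(x, "propertyUri", "") == subject]
```

The comprehension filters and projects in one pass, much like the @where and @select pair above, and it preserves the order of the original vector.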

@bertschi, yes, these are large JSON files that are sometimes quite verbose. I’ll have a look at Query.jl; it looks like it could definitely help me.
Thanks!
Josselin