How To Quickly Iterate over a Collection of Dictionaries?

Hi folks! :wave:

I have a rather large JSON file. The good news is that the JSON is not heavily nested (only 2 levels deep). What is the most efficient way to pick out the parts of the JSON structure I need based on a condition? What I was imagining was something like this:

# Collect the matching items into a new vector
filtered_data = []
for item in json_data
    if item.condition == "Condition"
        push!(filtered_data, item)
    end
end

Is there a better way to do this? I read in the JSON data with JSON3.jl so really, I am just iterating over a ton of dictionaries. Any suggestions?


Another way to phrase this question:

I have a collection of dictionaries and I need to iterate over them quickly to pick out the dictionaries I want based on a key-value pair within each dictionary. What is the quickest way to iterate over all these dictionaries in Julia?

Yours,

~ tcp :deciduous_tree:

What does the JSON structure look like? If it’s something DuckDB can handle, it can be efficient to delegate reading and filtering to it:

using SQLCollections
using QuackIO
using Accessors

read_json(SQLCollection, "https://duckdb.org/data/json/todos.json") |>
	filter(@o _.completed && startswith(_.title, "a")) |>
	collect


Here, everything before collect (loading and filtering) happens lazily in DuckDB. Only the final result is materialized.


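It might help to parallelize:

using OhMyThreads

# Evaluate the predicate on multiple threads (start Julia with e.g. `julia -t auto`);
# the result is a Bool mask with one entry per dictionary
ixs = OhMyThreads.tmap(dicts) do d
  d.condition == "c"
end
dicts[ixs]  # logical indexing keeps only the matching entries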
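
As a single-threaded baseline to compare against, the loop from the question can also be written with Base's `filter`. This is only a minimal sketch: the sample data, the `matches` helper, and the `"condition"` key are made up for illustration, and it assumes plain `Dict`s with string keys rather than JSON3 objects.

```julia
# Hypothetical sample data standing in for the parsed JSON
dicts = [
    Dict{String,Any}("id" => 1, "condition" => "c"),
    Dict{String,Any}("id" => 2, "condition" => "other"),
    Dict{String,Any}("id" => 3, "condition" => "c"),
    Dict{String,Any}("id" => 4),  # no "condition" key at all
]

# Hypothetical predicate; get(d, key, default) avoids a KeyError
# when a dictionary lacks the key
matches(d) = get(d, "condition", nothing) == "c"

# filter allocates a new vector holding only the matching dictionaries
filtered = filter(matches, dicts)

# Iterators.filter is lazy: nothing is scanned until it is iterated,
# which helps if only the first few matches are needed
lazy = Iterators.filter(matches, dicts)
first_match = first(lazy)
```

If the predicate is cheap, this kind of scan is often limited by memory access rather than compute, so it is worth benchmarking before reaching for threads.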