What's your preference for creating literal data-structures?

Having worked fairly extensively with Clojure, I came to appreciate the approach of building systems focused around data vs code. For example, if you are working on a business rules engine, rather than writing a complex collection of functions and structs that describe and execute the rules, it’s better to use a simple collection of composable functions that can operate on a data structure and let the data structure describe the rules. The major advantage of this approach is that there are many more tools to work with vanilla data types (composing, splitting, iterating, etc.) than there are with a custom collection of functions & types.
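To make the idea concrete, here is a minimal Julia sketch of rules expressed as data. The rule fields (`field`, `op`, `value`) and the helpers `passes` and `passes_all` are made up for illustration and are not taken from any particular rules engine.

```julia
# Hypothetical rules: each rule is plain data, not a custom type.
rules = [
    (field = :age,    op = (>=), value = 18),
    (field = :income, op = (>),  value = 30_000),
]

# A single small function interprets any rule against a record.
passes(rule, record) = rule.op(record[rule.field], rule.value)
passes_all(rules, record) = all(rule -> passes(rule, record), rules)

applicant = Dict(:age => 42, :income => 50_000)
minor     = Dict(:age => 16, :income => 50_000)

ok_applicant = passes_all(rules, applicant)
ok_minor     = passes_all(rules, minor)

# Because the rules are ordinary data, ordinary collection tools apply to them.
age_rules = filter(rule -> rule.field == :age, rules)
```

The point of the sketch is only that `filter`, `map`, concatenation and so on work on the rule set for free, because it is just a vector.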

The only problem I’m coming across with Julia (as compared to Clojure) is that I haven’t been able to settle on a nice, clean, ergonomic approach for constructing large, complex data literals. For example, the following code in Clojure constructs a top-level dictionary with nested dictionaries that have a mix of vectors, strings, and number literals for values:

{ :foo { :bar [1 2 3]
         :baz "Hello, world"
         :qux 9000 }
  :bohica { :bar [4 5 6]
            :baz "Goodnight, moon"
            :qux 2000 } }

So, what’s the best way to do the same in Julia? One option:

Dict(:foo => Dict(:bar => [1, 2, 3],
                  :baz => "Hello, world",
                  :qux => 9000),
     :bohica => Dict(:bar => [4, 5, 6],
                     :baz => "Goodnight, moon",
                     :qux => 2000))

feels just a bit too verbose (especially if, as I’m looking to do, one wants to nest Dicts within Dicts within Dicts). An alternative I’ve explored is to, instead, construct the data as a matrix and then later (via function or macro, doesn’t particularly matter) convert anything with eltype of Pair into a dict:

[:foo => [:bar => [1 2 3]
          :baz => "Hello, world"
          :qux => 9000]
 :bohica => [:bar => [4 5 6]
             :baz => "Goodnight, moon"
             :qux => 2000]]

This is only just a bit more verbose than Clojure, which I like, but I wonder if using data literals like this is too alien to the Julia community. Thoughts?
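For what it's worth, the conversion step can be quite small. Below is a minimal sketch, assuming a hypothetical helper named `todict`. It uses comma-separated vectors for the containers so each one is an ordinary `Vector` of `Pair`s, and it checks the elements with `isa Pair` rather than relying on the `eltype`, which is my own choice for robustness when the value types are mixed.

```julia
# Hypothetical helper: recursively convert any array made up entirely of Pairs into a Dict.
todict(x) = x                                   # leaves are returned unchanged
todict(p::Pair) = first(p) => todict(last(p))   # convert the value side of each pair
function todict(xs::AbstractArray)
    if !isempty(xs) && all(x -> x isa Pair, xs)
        return Dict(todict(p) for p in xs)
    end
    return xs                                   # ordinary arrays such as [1 2 3] stay as they are
end

data = [:foo => [:bar => [1 2 3],
                 :baz => "Hello, world",
                 :qux => 9000],
        :bohica => [:bar => [4 5 6],
                    :baz => "Goodnight, moon",
                    :qux => 2000]]

d = todict(data)
```

After this, `d[:foo][:bar]` is the original `[1 2 3]` row matrix and the nested containers are `Dict`s.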

Have you thought about named tuples?

(; foo = (; bar=[1,2,3], baz="Hello, world", qux=9000),
   bohica = (; bar=[4,5,6], baz="Goodnight, moon", qux=2000))

is even more compact than your Clojure example (and doesn’t need : quoting).

Depends on what you want to do with it and whether you require mutability, of course. But since this is valid Julia syntax, you could easily have a @dict macro that constructs nested Dicts from the same syntax.

You can get this even a little more compact by omitting the not-strictly-needed ;:

(foo = (bar=[1,2,3], baz="Hello, world", qux=9000),
 bohica = (bar=[4,5,6], baz="Goodnight, moon", qux=2000))
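One caveat worth knowing when dropping the `;`: with a single field, the parentheses alone are just grouping around an assignment, so either the `;` or a trailing comma is needed. A small illustration (the variable names are placeholders):

```julia
with_semicolon = (; bar = [1, 2, 3])   # NamedTuple with one field
with_comma     = (bar = [1, 2, 3],)    # same NamedTuple, trailing comma instead of `;`
not_a_tuple    = (tmp = 1)             # plain assignment to `tmp`; evaluates to the integer 1
```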

FWIW, I make heavy use of NamedTuples and NamedTuple-like structs for this kind of thing all the time. It’s a different tool than a dictionary, but for the kinds of problems where I need nested literal data like this, it’s almost always the nicer tool (and not just because of the syntax).
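One difference that matters in practice: a NamedTuple cannot be modified in place, but `merge` builds an updated copy. A minimal sketch, reusing the data from the example above (the variable names are mine):

```julia
data = (foo = (bar=[1,2,3], baz="Hello, world", qux=9000),
        bohica = (bar=[4,5,6], baz="Goodnight, moon", qux=2000))

# Fields cannot be reassigned; `data.foo = ...` would throw an error.
# `merge` returns a new NamedTuple with the selected fields replaced.
new_foo = merge(data.foo, (qux = 9001,))
updated = merge(data, (foo = new_foo,))

# `data` itself is untouched, and values that were not replaced
# (such as the `bar` vector) are shared between the two, not copied.
```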

If you do find yourself wanting a nice syntax for dictionaries though, you can always write a macro:

macro d(ex)
    @assert ex.head == :braces
    dargs = map(ex.args) do arg::Expr
        @assert arg.head == :call
        @assert arg.args[1] == :(:)
        @assert length(arg.args) == 3
        lhs = esc(arg.args[2])
        rhs = esc(arg.args[3])
        :($lhs => $rhs)
    end
    :(Dict($(dargs...)))
end
julia> @d{:foo : @d{:bar : [1 2 3],
                    :baz : "Hello, world",
                    :qux : 9000},
          :bohica : @d{:bar : [4 5 6],
                       :baz : "Goodnite, moon",
                       :qux : 2000}}
Dict{Symbol, Dict{Symbol, Any}} with 2 entries:
  :bohica => Dict(:baz=>"Goodnite, moon", :bar=>[4 5 6], :qux=>2000)
  :foo    => Dict(:baz=>"Hello, world", :bar=>[1 2 3], :qux=>9000)

Is there a way to only require the macro at the outermost level?

For example, with

_dict(x) = x
function _dict(ex::Expr)
    if Meta.isexpr(ex, :tuple) && !isempty(ex.args) && (Meta.isexpr(ex.args[1], :parameters) || Meta.isexpr(ex.args[1], :(=)))
        kws = Meta.isexpr(ex.args[1], :parameters) ? ex.args[1].args : ex.args
        Expr(:call, :Dict, map(kws) do arg
            :($(QuoteNode(arg.args[1])) => $(_dict(arg.args[2])))
        end...)
    else
        return ex
    end
end
macro dict(expr)
    return esc(_dict(expr))
end

You get:

julia> @dict (foo = (bar=[1,2,3], baz="Hello, world", qux=9000),
              bohica = (bar=[4,5,6], baz="Goodnight, moon", qux=2000))
Dict{Symbol, Dict{Symbol, Any}} with 2 entries:
  :bohica => Dict(:baz=>"Goodnight, moon", :bar=>[4, 5, 6], :qux=>2000)
  :foo    => Dict(:baz=>"Hello, world", :bar=>[1, 2, 3], :qux=>9000)
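In case it helps to see why `_dict` checks for both `:parameters` and `:(=)`: the two NamedTuple spellings parse to slightly different expressions. A quick way to check this yourself (a sketch, the variable names are mine):

```julia
plain     = :((foo = 1, bar = 2))      # no leading semicolon
semicolon = :((; foo = 1, bar = 2))    # leading semicolon

# Both are :tuple expressions...
both_tuples = Meta.isexpr(plain, :tuple) && Meta.isexpr(semicolon, :tuple)

# ...but the fields live in different places.
plain_fields     = plain.args               # each field is an Expr with head :(=)
semicolon_fields = semicolon.args[1].args   # wrapped in one :parameters Expr; each field has head :kw

# Either way, args[1] is the field name and args[2] is the value expression.
names_plain     = [f.args[1] for f in plain_fields]
names_semicolon = [f.args[1] for f in semicolon_fields]
```

That shared `args[1]`/`args[2]` layout is what lets the same `map` body handle both forms.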

(You might want to use PropDicts.jl for this instead of Dict)

(Yes, see above for example.)

Ah, yes, I had actually looked at named tuples, but from a little bit of benchmarking I was concerned by access speed. Constructing the tuples is way faster than constructing the Dicts, but access to nested members is 2x slower:

julia> a = @btime Dict(:foo => Dict(:bar => [1, 2, 3], :baz => "Hello, world", :qux => 9000), :bohica => Dict(:bar => [4, 5, 6], :baz => "Goodnight, moon", :qux => 2000))
  678.212 ns (35 allocations: 2.78 KiB)
Dict{Symbol, Dict{Symbol, Any}} with 2 entries:
  :bohica => Dict(:baz=>"Goodnight, moon", :bar=>[4, 5, 6], :qux=>2000)
  :foo    => Dict(:baz=>"Hello, world", :bar=>[1, 2, 3], :qux=>9000)
julia> @btime a[:foo][:bar]
  41.901 ns (0 allocations: 0 bytes)
julia> b = @btime (foo=(bar=[1, 2, 3], baz="Hello, world", qux=9000), bohica=(bar=[4, 5, 6], baz="Goodnight, moon", qux=2000))
  37.588 ns (4 allocations: 160 bytes)
(foo = (bar = [1, 2, 3], baz = "Hello, world", qux = 9000), bohica = (bar = [4, 5, 6], baz = "Goodnight, moon", qux = 2000))
julia> @btime b.foo.bar
  83.505 ns (2 allocations: 64 bytes)

For what I’m working on, I do not need the data structures to be mutable, so named tuples would work in that regard, but I will be spending a lot more runtime accessing the data than constructing it, so the 2x factor scared me off a bit. It may be that this is premature optimization, or it may be that I’m doing something wrong with the benchmark (for example, I don’t understand why named tuple access should be causing 2 allocations)…

I am starting to think more seriously about a macro based approach, though. It’s funny…Clojure, with its Lisp heritage, has one of the most powerful macro systems (after Scheme and Racket) and yet it’s very rarely used (and actively recommended against by much of the community).

Careful about how you benchmark this, the results very much depend on whether the particular structure is known at compile time, see this example:

julia> using Chairmarks

julia> ld = Dict(:foo => Dict(:bar => [1, 2, 3],
                         :baz => "Hello, world",
                         :qux => 9000),
            :bohica => Dict(:bar => [4, 5, 6],
                            :baz => "Goodnight, moon",
                            :qux => 2000))
Dict{Symbol, Dict{Symbol, Any}} with 2 entries:
  :bohica => Dict(:baz=>"Goodnight, moon", :bar=>[4, 5, 6], :qux=>2000)
  :foo    => Dict(:baz=>"Hello, world", :bar=>[1, 2, 3], :qux=>9000)

julia> lt = (; foo = (; bar=[1,2,3], baz="Hello, world", qux=9000),
          bohica = (; bar=[4,5,6], baz="Goodnight, moon", qux=2000))
(foo = (bar = [1, 2, 3], baz = "Hello, world", qux = 9000), bohica = (bar = [4, 5, 6], baz = "Goodnight, moon", qux = 2000))

julia> @b ld[:foo][:bar]
28.184 ns

julia> @b $ld[:foo][:bar]
9.576 ns

julia> @b lt.foo.bar
75.581 ns (2 allocs: 64 bytes)

julia> @b $lt.foo.bar
2.359 ns

Using $ interpolates the value of ld/lt into the benchmark, so the compiler sees a concrete type instead of an untyped global. That is the situation you would be in if you used lt.foo.bar inside a function, or passed lt into a function as an argument.
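The same effect can be seen without any benchmarking package by asking the compiler what it can infer once the container is passed as a function argument. This is a minimal sketch: `getbar_nt` and `getbar_dict` are names I made up, and `Base.return_types` is an unexported reflection helper, so treat it as a diagnostic rather than stable API.

```julia
ld = Dict(:foo => Dict(:bar => [1, 2, 3], :baz => "Hello, world", :qux => 9000),
          :bohica => Dict(:bar => [4, 5, 6], :baz => "Goodnight, moon", :qux => 2000))
lt = (; foo = (; bar=[1,2,3], baz="Hello, world", qux=9000),
        bohica = (; bar=[4,5,6], baz="Goodnight, moon", qux=2000))

# Inside a function, the argument's concrete type is known to the compiler.
getbar_nt(nt)  = nt.foo.bar
getbar_dict(d) = d[:foo][:bar]

# Ask inference for the return type, given the concrete argument types.
nt_result   = only(Base.return_types(getbar_nt,   (typeof(lt),)))
dict_result = only(Base.return_types(getbar_dict, (typeof(ld),)))
```

With the NamedTuple, the field names and field types are part of the type, so `nt_result` should come back as `Vector{Int}`. The inner `Dict{Symbol, Any}` can only promise `Any`, so `dict_result` should be `Any` and whatever uses the value afterwards has to dispatch at runtime.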

I’ll add NamedTuple-syntax construction to PropDicts - anyone want to chime in, please post at

Well, shoot…I thought we were getting more type info at the REPL top-level so I didn’t have to worry about such side-effects when benchmarking. Guess not!

I’m having serious second thoughts about dismissing named tuples so quickly, especially since I realized that they can be splatted as keyword arguments into a method, saving a step in the code to pull out the relevant pieces of the data structure, e.g.:

julia> function findfoo(; foo=nothing, others...)
         if foo !== nothing
           println("Found foo! It's: $(foo)")
         else
           for (_, v) ∈ others
             findfoo(;v...)
           end
         end
       end
findfoo (generic function with 1 method)

julia> a=(;b=(;c=(;d=(;e=(;foo="Hello, world!")))))
(b = (c = (d = (e = (foo = "Hello, world!",),),),),)

julia> findfoo(;a...)
Found foo! It's: Hello, world!

I just had the greatest idea for a visitor pattern implementation!
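For what it's worth, here is one way such a traversal could be sketched with multiple dispatch instead of keyword splatting, so that leaves which are not named tuples (vectors, strings, numbers) are handled as well. `visit` is a hypothetical helper, not something from this thread or from a package.

```julia
# Leaf case: anything that is not a NamedTuple is handed to the callback with its path.
visit(f, x, path = ()) = f(path, x)

# Branch case: recurse into each field, extending the path with the field name.
function visit(f, nt::NamedTuple, path = ())
    for (name, value) in pairs(nt)
        visit(f, value, (path..., name))
    end
    return nothing
end

data = (foo = (bar=[1,2,3], baz="Hello, world", qux=9000),
        bohica = (bar=[4,5,6], baz="Goodnight, moon", qux=2000))

# Collect every leaf together with the tuple of field names leading to it.
leaves = Pair[]
visit(data) do path, value
    push!(leaves, path => value)
end
```

Each entry of `leaves` pairs a path such as `(:foo, :bar)` with the value stored there, in field order.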