Does Julia have an equivalent of Python's DF.column.map(dict)?

The best I could find is map(x -> dict[x], DF.column).
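One difference matters when translating pandas code literally. Series.map(dict) returns NaN for values that are not keys of the dict, but dict[x] in Julia throws a KeyError. The sketch below approximates the pandas behaviour by calling get with a `missing` default. It uses toy data in place of DF.column, and the choice of `missing` as the stand-in for NaN is an assumption.

```julia
# Toy values standing in for DF.column; "van" is deliberately not a key of dict
column = ["compact", "suv", "pickup", "van"]
dict = Dict("compact" => 1, "suv" => 2, "pickup" => 3)

# get(dict, x, missing) returns `missing` for unknown keys instead of throwing,
# roughly like the NaN that pandas inserts
mapped = map(x -> get(dict, x, missing), column)

# The plain lookup from the question throws a KeyError for "van"
threw = try
    map(x -> dict[x], column)
    false
catch e
    e isa KeyError
end
```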

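Your solution is perfectly fine.
The main reason Python needs a dedicated method for this is performance: pandas' map with a dict is much faster than, e.g., applying an anonymous Python function element by element. Julia compiles anonymous functions like any other function, so it has no comparable per-element penalty.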
As an alternative to an anonymous function you could use broadcasting:

getindex.(Ref(dict), DF.column)

Wrapping the dictionary in Ref makes broadcasting treat it as a single scalar value, so the iteration happens only over the array (the 2nd argument).
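Here is a small self-contained comparison that uses a plain Vector in place of DF.column. It also shows a one-element tuple `(d,)`, which is another common way to shield an argument from broadcasting. Without either wrapper, Julia refuses to broadcast over a Dict.

```julia
col = ["a", "b", "a", "c"]                  # stands in for DF.column
d = Dict("a" => 10, "b" => 20, "c" => 30)

# Ref(d) is a zero-dimensional container, so d is treated as a scalar
with_ref = getindex.(Ref(d), col)

# A one-element tuple has the same effect
with_tuple = getindex.((d,), col)

# The anonymous-function version from the question gives the same result
with_map = map(x -> d[x], col)

# Without a wrapper, broadcasting over a Dict raises an ArgumentError
err = try
    getindex.(d, col)
    nothing
catch e
    e
end
```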

Thanks, lungben. I didn’t know about the Ref syntax. That’s useful.

So dictionary lookup can’t be broadcast directly, i.e. dict.[DF.column] doesn’t work?

Unfortunately not, but I agree that it would be nice to have.
Maybe it is worth filing an issue to introduce a syntax like this?

d.[A] 
# should be equivalent to
getindex.(Ref(d), A) 

Or is there a fundamental reason against it?

That would be great.

The experts may chime in soon with a concise explanation, but in the meantime if you’d like to dig into the reasoning that led to the current design, check out issues #18618 and #25904.
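In the meantime, the proposed equivalence can be wrapped in a small function. This is only a sketch: `lookup` is a hypothetical name, not part of Base. Because it broadcasts, the result has the same shape as the keys, including matrices.

```julia
# Hypothetical helper, not part of Base: mirrors the proposed d.[A] syntax
lookup(d::AbstractDict, A) = getindex.(Ref(d), A)

d = Dict(1 => "A", 2 => "B", 3 => "C")

lookup(d, [3, 1, 2, 1])   # ["C", "A", "B", "A"]
lookup(d, [1 2; 3 1])     # broadcasting keeps the shape: a 2×2 Matrix{String}
```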


For me at least, using ggplot2’s mpg dataset:

idx_map = Dict(key => idx for (idx, key) in enumerate(unique(df.class)))

julia> @time map(x -> idx_map[x], df.class)
  0.060553 seconds (124.05 k allocations: 8.437 MiB, 99.45% compilation time)

julia> @time getindex.(Ref(idx_map), df.class)
  0.000025 seconds (4 allocations: 2.062 KiB)

Huge win for getindex.!
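The idx_map line gives each distinct class an integer code, numbered in order of first appearance, because unique keeps that order. The sketch below runs the same construction on a short made-up vector in place of the mpg data, so the result is easy to check by hand.

```julia
# Made-up values standing in for df.class
class = ["compact", "midsize", "suv", "compact", "suv", "pickup"]

# Same construction as in the post: value => position in unique(class)
idx_map = Dict(key => idx for (idx, key) in enumerate(unique(class)))

codes = getindex.(Ref(idx_map), class)
println(codes)  # [1, 2, 3, 1, 3, 4]
```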

Maybe. But @time is not suitable for microbenchmarks. Can you try the same using BenchmarkTools? And remember to interpolate global variables with $.
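The suggested comparison might look like this with BenchmarkTools.jl, which is assumed to be installed. The data mirrors the example used later in the thread. No timings are shown because they depend on the machine.

```julia
using BenchmarkTools  # assumption: the BenchmarkTools.jl package is installed

x = rand(1:5, 100)
d = Dict(1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E")

# `$` splices the current values of the globals d and x into the benchmark,
# so the measurement does not include untyped global-variable lookups.
# @btime runs each expression many times and reports the minimum time,
# so one-off compilation does not dominate the result as it can with @time.
@btime getindex.(Ref($d), $x);
@btime map(xi -> $d[xi], $x);
```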

This is a compilation-time effect. Creating a new anonymous function has a fixed compilation cost, so map(t -> ..., x) is slow each time it is evaluated at the REPL.
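The reason for the repeated cost is that each anonymous-function literal evaluated at top level gets its own closure type, even if its text is identical. map is specialized on the type of its function argument, so every new type triggers fresh compilation. A quick check:

```julia
d = Dict(1 => "A", 2 => "B")

# Two textually identical lambdas evaluated at top level
f1 = xi -> d[xi]
f2 = xi -> d[xi]

# They behave the same but are distinct types, so code compiled for one
# cannot be reused for the other
println(typeof(f1) == typeof(f2))  # false
```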

julia> x = rand(1:5, 100);

julia> d = Dict(1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E");

julia> @time getindex.(Ref(d), x);
  0.144137 seconds (172.92 k allocations: 9.074 MiB, 42.83% gc time)

julia> @time getindex.(Ref(d), x);
  0.000016 seconds (4 allocations: 976 bytes)

julia> @time map(xi -> d[xi], x);
  0.081783 seconds (98.49 k allocations: 5.256 MiB)

julia> @time map(xi -> d[xi], x);
  0.078627 seconds (55.19 k allocations: 2.922 MiB)

julia> get_from_d = let d = d
       xi -> d[xi]
       end;

julia> @time map(get_from_d, x);
  0.043167 seconds (43.72 k allocations: 2.272 MiB)

julia> @time map(get_from_d, x);
  0.000024 seconds (2 allocations: 928 bytes)
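Another way to get a reusable function object without a let block, not used in the thread, is Base.Fix1. It is not exported, so it is written with the Base. prefix. `Base.Fix1(getindex, d)` behaves like `xi -> getindex(d, xi)`. Its type depends only on the types of getindex and d, so constructing it again yields the same type and map's compiled code can be reused. A minimal sketch:

```julia
x = rand(1:5, 100)
d = Dict(1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E")

# Base.Fix1(f, a) is a callable with a(y) == f(a, y); here it fixes d as
# the first argument of getindex
lookup_d = Base.Fix1(getindex, d)

result = map(lookup_d, x)
```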

This re-compilation problem only shows up in global scope, though.

julia> function get_from_d_wrapper(d, x)
       map(xi -> d[xi], x)
       end;

julia> @time get_from_d_wrapper(d, x);
  0.037174 seconds (46.42 k allocations: 2.426 MiB)

julia> @time get_from_d_wrapper(d, x);
  0.000006 seconds (1 allocation: 896 bytes)
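The wrapper helps because the lambda literal inside a function is lowered once, when the function is defined. Every call then produces a closure of the same type, as long as the captured variables have the same types. The sketch below uses a hypothetical `make_getter` function that only returns the closure, which makes the type identity easy to check.

```julia
# Hypothetical function that returns the same anonymous-function literal each call
make_getter(d) = xi -> d[xi]

d = Dict(1 => "A", 2 => "B")

g1 = make_getter(d)
g2 = make_getter(d)

# Same closure type both times, so compiled specializations are reused
println(typeof(g1) == typeof(g2))  # true
```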