Applying Dict to DataFrame column


#1

I figured out a way to use a Dict to remap the Strings in a DataFrame column, but I suspect there’s a better way to do it.

shell> cat benchmarks.csv
c,iteration_mandelbrot,0.263798
c,recursion_fibonacci,0.024076
julia,iteration_mandelbrot,0.167139
julia,recursion_fibonacci,0.040278
lua,iteration_mandelbrot,0.104
lua,recursion_fibonacci,0.026

julia> using DataFrames

julia> benchmarks = readtable("benchmarks.csv", header=false, names=[:language, :benchmark, :time])
6×3 DataFrames.DataFrame
│ Row │ language │ benchmark              │ time     │
├─────┼──────────┼────────────────────────┼──────────┤
│ 1   │ "c"      │ "iteration_mandelbrot" │ 0.263798 │
│ 2   │ "c"      │ "recursion_fibonacci"  │ 0.024076 │
│ 3   │ "julia"  │ "iteration_mandelbrot" │ 0.167139 │
│ 4   │ "julia"  │ "recursion_fibonacci"  │ 0.040278 │
│ 5   │ "lua"    │ "iteration_mandelbrot" │ 0.104    │
│ 6   │ "lua"    │ "recursion_fibonacci"  │ 0.026    │

julia> dict = Dict("c"=>"C", "julia"=>"Julia", "lua"=>"LuaJIT");

julia> for i in 1:size(benchmarks)[1]
           benchmarks[i,1] = dict[benchmarks[i,1]]
       end

julia> benchmarks
6×3 DataFrames.DataFrame
│ Row │ language │ benchmark              │ time     │
├─────┼──────────┼────────────────────────┼──────────┤
│ 1   │ "C"      │ "iteration_mandelbrot" │ 0.263798 │
│ 2   │ "C"      │ "recursion_fibonacci"  │ 0.024076 │
│ 3   │ "Julia"  │ "iteration_mandelbrot" │ 0.167139 │
│ 4   │ "Julia"  │ "recursion_fibonacci"  │ 0.040278 │
│ 5   │ "LuaJIT" │ "iteration_mandelbrot" │ 0.104    │
│ 6   │ "LuaJIT" │ "recursion_fibonacci"  │ 0.026    │

Any suggestions to replace the for loop?


#2

Something like benchmarks[:language] = [dict[lang] for lang in benchmarks[:language]]?
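
For comparison, here is a minimal sketch of a few loop-free ways to apply the Dict. It assumes a recent Julia (1.x) and uses a plain Vector named languages as a hypothetical stand-in for the column, so it runs without DataFrames; how each result behaves when assigned back into a DataFrame column depends on the DataFrames version and is not shown here.

```julia
# Hypothetical stand-in for benchmarks[:language]; a plain Vector keeps the sketch package-free.
languages = ["c", "c", "julia", "julia", "lua", "lua"]
dict = Dict("c" => "C", "julia" => "Julia", "lua" => "LuaJIT")

# Comprehension, as in the suggestion above; throws a KeyError if a value is not a key of dict.
renamed = [dict[lang] for lang in languages]

# Broadcast `get` with a fallback, so unknown values are left unchanged.
# Ref(dict) makes broadcasting treat the Dict as a single value.
mixed = ["c", "fortran"]
with_fallback = get.(Ref(dict), mixed, mixed)

# `replace` accepts old => new pairs, so the Dict can be splatted into it.
replaced = replace(languages, dict...)
```

The comprehension is the strictest of the three, which may be what you want if an unexpected language name should be an error rather than silently passed through.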


#3

Thank you! I had actually come up with something close to the right-hand side of that expression, but I found that its type was Array{String,1} and didn’t have faith that it would play nicely with the DataArrays.DataArray{String,1} on the left-hand side.
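
If keeping the existing container type matters, one option is to write into the column in place rather than rebinding it. The sketch below only illustrates the idea in plain Julia (1.x assumed): the Union{Missing,String} vector is a hypothetical stand-in for a column that allows missing values, not a DataArray, and nothing here is specific to DataFrames.

```julia
dict = Dict("c" => "C", "julia" => "Julia", "lua" => "LuaJIT")

# Hypothetical stand-in for a column whose element type allows missing values.
column = Union{Missing,String}["c", "julia", "lua"]
original_type = typeof(column)

# Broadcast assignment (.=) copies the new values into the existing container
# instead of rebinding the name to the array built by the comprehension.
column .= [dict[lang] for lang in column]

# The values changed, but the container and its type were reused.
same_type = typeof(column) == original_type
```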