Transpose dictionary

Hello,

I need to transpose a dictionary. My current idea is to use a temporary dictionary X and copy the elements into it in transposed order. Is there something better for a performance-critical section where a dictionary has to be transposed? Here is an example of the actual code:

using DataFrames
using Dates

df_regional_daily = DataFrame(
        DateTime = Date[],
        MonthSin = Float32[],
        MonthCos = Float32[],
        DaySin = Float32[],
        DayCos = Float32[],
        Name = String[],
        Latitude = Float32[],
        Longitude = Float32[]
)
for j in 1:16
    df_regional_daily[!, "Value$(j)"] = Float32[]
end


X = Dict{String, Vector{Float32}}()    # temporary; Vector{Float32} is concrete, Array{Float32} is not
for f in 1:length(data_regional["features"])
    irradiance = data_regional["features"][f]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]
    for (t, value) in irradiance
        if haskey(X, t)
            push!(X[t], value)        # copying
        else
            X[t] = [value]            # creating
        end
    end
end

for (t, value) in X
    t = Date(t, "yyyymmdd")
    push!(df_regional_daily, [
        t,
        sinpi(month(t) / MONTH_PERIOD * 2),
        cospi(month(t) / MONTH_PERIOD * 2),
        sinpi(dayofyear(t) / DAY_PERIOD * 2),
        cospi(dayofyear(t) / DAY_PERIOD * 2),
        location_name,
        location[1],
        location[2],
        value...
    ])
end

In the worst case, this code is inefficient in both memory and computation, and runs in O(N^2)…
Thanks.

Thanks, the i index was my mistake. But it doesn’t affect the logic of the code.

The full code is here.

Right now I’m using JSON.jl, which gives me a nested dictionary that I process manually to create a filtered DataFrame. I’m looking for a better solution that does not rely on non-standard libraries (JSON2.jl or LazyJSON.jl).

What is the relation to the other thread? Sorry, I don’t see it.

What does this mean? For a matrix, transposition swaps A[i,j] and A[j,i]. For a dictionary, it means…?

Do you mean swapping keys with values, i.e. going from d[k] == v to d′[v] == k, i.e. inverting the dictionary?

Or do you mean swapping d[i][j] with d[j][i] in a multilevel dictionary, similar to this Python package?
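To make the two readings concrete, here is a minimal sketch on toy data. The dictionaries `d` and `nested` and their contents are made up for illustration and are not from the original post:

```julia
# Reading 1: invert a flat dictionary, d[k] == v  ->  inverted[v] == k.
# This is only well defined when the values are unique.
d = Dict("a" => 1, "b" => 2)
inverted = Dict(v => k for (k, v) in d)

# Reading 2: swap the two levels of a nested dictionary,
# nested[i][j]  ->  swapped[j][i].
nested = Dict("x" => Dict("p" => 1.0, "q" => 2.0),
              "y" => Dict("p" => 3.0))
swapped = Dict{String, Dict{String, Float64}}()
for (i, inner) in nested
    for (j, v) in inner
        # get! creates the inner dictionary the first time key j is seen
        get!(() -> Dict{String, Float64}(), swapped, j)[i] = v
    end
end
```

Because the inner level is itself a dictionary, the second form also handles sparse data, where some `(i, j)` pairs are missing.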


Yes, like the Python package. On the other hand, typeof(data_regional["features"]) is Vector{Any}. I mean swapping data_regional["features"][f]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"][t] to d′[t][f], where d′ is a Dict{String, Vector{Any}}.

Sounds like just:

d′ = Dict()
features = data_regional["features"]
for f in keys(features)
    tdict = features[f]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]
    for t in keys(tdict)
        d′[t, f] = tdict[t]
    end
end

Instead of for f in keys(features), use for f in 1:length(features), because features is of type Vector{Any} (a vector of dictionaries).

["ALLSKY_SFC_SW_DWN"] is a Dict{String, Float32}.
The String key is a date in the format "yyyy-mm-dd".
The Float32 is the final value of this nested dictionary.

And instead of:

for t in keys(tdict)
    d′[t, f] = tdict[t]
end

I have:

for (t, value) in tdict
    if haskey(d′, t)
        push!(d′[t], value)     # copying
    else
        d′[t] = [value]         # creating Vector{Float32}
    end
end

This example uses an untyped d′ = Dict(), which is not memory efficient.
It also uses two for loops and copies every item in the dict, which is not efficient either.

It is like two glasses of different shapes and some water. If the water is in the first glass and you want it in a different shape, you have to pour it into the other glass. And this operation happens per water molecule… it is so hard…

It seems like what you want, by definition, is a copy of every element of dict? It’s just a question of what data structure you want to store the copy into.

I used d′[t, f], i.e. a tuple-keyed dictionary, because that’s what you said you wanted. Instead, it seems you want d′[t][f], i.e. a dictionary of vectors? (This has the same asymptotic complexity, but of course the constant factors and the memory layout are different.)

“Transposing” to d′[t][f], where d′[t] is a vector, is not possible unless your dictionary is non-sparse, i.e. if a given key t is defined for every index f (whereas d′[t, f] is more general). If that’s the case, your solution seems fine, although you could save a bit of time by preallocating the arrays with size length(features) rather than doing push! one element at a time.
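A minimal sketch of the preallocation idea, assuming the data is dense (every date key appears in every feature). The toy `features` vector stands in for `data_regional["features"]`, and the helper `irradiance` is a hypothetical name introduced only for this example:

```julia
# Toy stand-in for data_regional["features"]; the real data comes from JSON.jl.
features = Any[
    Dict("properties" => Dict("parameter" => Dict("ALLSKY_SFC_SW_DWN" =>
        Dict("20200101" => 1.0, "20200102" => 2.0)))),
    Dict("properties" => Dict("parameter" => Dict("ALLSKY_SFC_SW_DWN" =>
        Dict("20200101" => 3.0, "20200102" => 4.0)))),
]

# Hypothetical helper: the date => value dictionary of one feature.
irradiance(feature) = feature["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]

# Allocate every vector once, at full size, using the keys of the first feature.
n = length(features)
d′ = Dict{String, Vector{Float32}}()
for t in keys(irradiance(features[1]))
    d′[t] = Vector{Float32}(undef, n)
end

# Write by index instead of growing the vectors with push!.
for f in eachindex(features)
    for (t, value) in irradiance(features[f])
        d′[t][f] = value
    end
end
```

If a date were missing from some feature, the corresponding slot would keep its uninitialized value, so this only makes sense under the density assumption.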

Note that keys works for arrays too.
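A small illustration of that point on toy data (the contents of `features` and `tdict` here are made up):

```julia
features = ["first", "second", "third"]   # toy vector

# For a Vector, keys returns its valid indices, so the same
# `for f in keys(...)` loop header works for dictionaries and arrays alike.
idx = collect(keys(features))

tdict = Dict("a" => 1, "b" => 2)
ks = sort(collect(keys(tdict)))           # Dict key order is unspecified, hence sort
```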


Sorry, yes, I want d′[t][f].

For every f there exists a Dict{String, Vector{Float32}}. I also think it’s dense.

Could the @threads for macro help?

It must be a vector, because features is a vector too. Or could it be a dict? Would that be better, or make no difference?

You would have to be careful about race conditions. You definitely can’t use the push! solution with threads, but preallocating the d′[t] arrays and then writing into them in parallel (parallelizing over the for f in … loop) could work, I guess.

You’re unlikely to get much speedup using threads unless your dictionary is truly huge, though.
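A rough sketch of the preallocate-then-write-in-parallel approach, again assuming dense data. The toy `tdicts` vector (one date => value dictionary per feature) is made up for illustration:

```julia
using Base.Threads: @threads

# Toy stand-in: one date => value dictionary per feature.
tdicts = [Dict("20200101" => Float32(f), "20200102" => Float32(10 * f)) for f in 1:4]

# Preallocate serially: inserting keys into a Dict is not thread-safe.
d′ = Dict{String, Vector{Float32}}(
    t => Vector{Float32}(undef, length(tdicts)) for t in keys(tdicts[1])
)

# Each iteration writes only to slot f of each vector, so no two tasks
# write to the same location; the Dict itself is only read in the loop.
@threads for f in eachindex(tdicts)
    for (t, value) in tdicts[f]
        d′[t][f] = value
    end
end
```

Julia must be started with more than one thread (for example `julia -t 4`) for the loop to actually run in parallel; with a single thread the result is the same, just computed serially.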


I’m already using @threads for in another context. Can I put a @threads for inside another @threads for, i.e. nest them?

A few million values.

Yes.
