I need to transpose a dictionary. My idea is to use a temporary dictionary `X` and copy the elements into it in transposed order. Is there something better for a performance-critical section where a dictionary has to be transposed? Here is an example of my actual code:
```julia
using DataFrames
using Dates

df_regional_daily = DataFrame(
    DateTime = Date[],
    MonthSin = Float32[],
    MonthCos = Float32[],
    DaySin = Float32[],
    DayCos = Float32[],
    Name = String[],
    Latitude = Float32[],
    Longitude = Float32[]
)
for j in 1:16
    df_regional_daily[!, "Value$(j)"] = Float32[]
end
```
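```julia
# Temporary dictionary: date string => one Float32 value per feature.
# Vector{Float32} is concrete; the original Array{Float32} leaves the dimension unspecified.
X = Dict{String, Vector{Float32}}()
for f in 1:length(data_regional["features"])
    irradiance = data_regional["features"][f]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]
    for (t, value) in irradiance
        if haskey(X, t)
            push!(X[t], value)  # copy the value into the existing vector
        else
            X[t] = [value]      # first value for this date: create the vector
        end
    end
end

# MONTH_PERIOD, DAY_PERIOD, location_name and location are defined elsewhere in the program.
for (t, value) in X
    t = Date(t, "yyyymmdd")
    push!(df_regional_daily, [
        t,
        sinpi(month(t) / MONTH_PERIOD * 2),
        cospi(month(t) / MONTH_PERIOD * 2),
        sinpi(dayofyear(t) / DAY_PERIOD * 2),
        cospi(dayofyear(t) / DAY_PERIOD * 2),
        location_name,
        location[1],
        location[2],
        value...  # splat the per-feature values into the Value1..Value16 columns
    ])
end
```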
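The accumulation loop above can also be written with `get!`, which returns `X[t]` and stores an empty vector first only when `t` is missing, so each element costs one hash lookup instead of the `haskey`-then-index pair. A minimal sketch; the sample `features` vector and the helper name `accumulate_by_date` are hypothetical stand-ins for `data_regional["features"]` and the original loop.

```julia
# Hypothetical stand-in for data_regional["features"]: a Vector{Any} of nested Dicts
# shaped like the JSON.jl output described in this thread.
features = Any[
    Dict("properties" => Dict("parameter" => Dict("ALLSKY_SFC_SW_DWN" =>
        Dict("20210101" => 1.5f0, "20210102" => 2.5f0)))),
    Dict("properties" => Dict("parameter" => Dict("ALLSKY_SFC_SW_DWN" =>
        Dict("20210101" => 3.0f0, "20210102" => 4.0f0)))),
]

# Hypothetical helper: accumulate one vector of values per date key.
function accumulate_by_date(features)
    X = Dict{String, Vector{Float32}}()
    for feature in features
        irradiance = feature["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]
        for (t, value) in irradiance
            # get! does a single lookup: it returns X[t], inserting Float32[] first if absent.
            push!(get!(() -> Float32[], X, t), value)
        end
    end
    return X
end

X = accumulate_by_date(features)
```

The asymptotic cost is unchanged (every value is still visited once); only the constant factor per element drops.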
In the worst case, this code is neither memory- nor compute-efficient, and it runs in O(N^2)…
Thanks.
Currently, I'm using JSON.jl, which gives me a nested dictionary that I process manually to create a filtered DataFrame. I'm looking for a better solution that doesn't rely on non-standard libraries (such as JSON2.jl or LazyJSON.jl).
What is the relation to the other thread? Sorry, I did not see it.
Yes, like the Python package. On the other hand, `typeof(data_regional["features"])` is `Vector{Any}`. What I mean is swapping `data_regional["features"][f]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"][t]` into `d′[t][f]`, where `d′ = Dict{String, Vector{Any}}()`.
```julia
d′ = Dict()
features = data_regional["features"]
for f in keys(features)
    tdict = features[f]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]
    for t in keys(tdict)
        d′[t, f] = tdict[t]
    end
end
```
Instead of `for f in keys(features)`, I use `for f in 1:length(features)`, because `features` is of type `Vector{Any}` (a vector of dictionaries).
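For reference, calling `keys` on a `Vector` returns its index range, so `for f in keys(features)` and `for f in 1:length(features)` visit the same integer indices (`eachindex` is the usual idiom). A small check with a hypothetical vector standing in for `features`:

```julia
# Hypothetical vector standing in for data_regional["features"] (a Vector{Any}).
features = Any[Dict("id" => 1), Dict("id" => 2), Dict("id" => 3)]

# For a Vector, keys returns the index range Base.OneTo(length(features)).
ks = collect(keys(features))
same_as_range = keys(features) == 1:length(features)
same_as_eachindex = collect(eachindex(features)) == ks

for f in keys(features)
    # f is an integer index, so features[f] is the f-th dictionary.
    println(f, " => ", features[f]["id"])
end
```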
`["ALLSKY_SFC_SW_DWN"]` is a `Dict{String, Float32}`, where:
- the `String` key is a date in the format "yyyy-mm-dd";
- the `Float32` value is the final (leaf) value of this nested dictionary.
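To make the shape concrete, here is a hypothetical miniature of `data_regional` as described: a `Vector{Any}` of feature dictionaries, each ending in a `Dict{String, Float32}` of date => value. The dates and values are made up; the "yyyy-mm-dd" key format follows the description above.

```julia
using Dates

# Hypothetical miniature of data_regional; the real data comes from JSON.jl.
data_regional = Dict{String, Any}(
    "features" => Any[
        Dict("properties" => Dict("parameter" => Dict(
            "ALLSKY_SFC_SW_DWN" => Dict("2021-01-01" => 1.5f0, "2021-01-02" => 2.5f0)))),
        Dict("properties" => Dict("parameter" => Dict(
            "ALLSKY_SFC_SW_DWN" => Dict("2021-01-01" => 3.0f0, "2021-01-02" => 4.0f0)))),
    ],
)

features = data_regional["features"]                                 # Vector{Any}
tdict = features[2]["properties"]["parameter"]["ALLSKY_SFC_SW_DWN"]  # Dict{String, Float32}

# The key is a date string; the value is the Float32 leaf.
value = tdict["2021-01-02"]
parsed_date = Date("2021-01-02", dateformat"yyyy-mm-dd")
```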
And instead of:
```julia
for t in keys(tdict)
    d′[t, f] = tdict[t]
end
```
I have:
```julia
for (t, value) in tdict
    if haskey(d′, t)
        push!(d′[t], value) # copying
    else
        d′[t] = [value] # creating Vector{Float32}
    end
end
```
This example uses `d′ = Dict()`, which is not memory-efficient.
It also uses two nested `for` loops and copies every item in the dictionary, which is not efficient either.
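A quick check of the types involved: `Dict()` with no type parameters is a `Dict{Any, Any}`, so its values are stored as boxed `Any` references. A sketch of a typed alternative, assuming string date keys and `Float32` values as described in this thread; `collect_typed` is a hypothetical helper name.

```julia
# Untyped constructor: both keys and values are Any.
untyped = Dict()

# Hypothetical helper: builds a typed date => values dictionary from (date, value) tuples.
function collect_typed(entries)
    d = Dict{String, Vector{Float32}}()
    for (t, v) in entries
        # Inside a function the compiler can infer that d[t] is a Vector{Float32};
        # with Dict{Any, Any}, every lookup would come back as Any.
        push!(get!(() -> Float32[], d, t), v)
    end
    return d
end

typed = collect_typed([("2021-01-01", 1.5f0), ("2021-01-01", 3.0f0), ("2021-01-02", 2.5f0)])
```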
It is like two glasses of different shapes holding water: if the water is in the first glass and you want it in a different shape, you must pour it into the other glass. And here this operation happens per water molecule… it is so hard…
It seems like what you want is, by definition, a copy of every element of the dict? It's just a question of what data structure you want to store the copy in.
I used d′[t, f], i.e. a tuple-keyed dictionary, because that’s what you said you wanted. Instead, it seems you want d′[t][f], i.e. a dictionary of vectors? (This has the same asymptotic complexity, but of course the constant factors and the memory layout are different.)
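To make the two layouts concrete, here is a sketch that builds both a tuple-keyed dictionary and a dictionary of vectors from the same hypothetical input, then checks that `d′[t, f]` and `d′[t][f]` agree. The concrete key and value types are assumptions based on the types described earlier in the thread.

```julia
# Hypothetical input: features[f] maps date string => Float32 value.
features = [
    Dict("2021-01-01" => 1.5f0, "2021-01-02" => 2.5f0),
    Dict("2021-01-01" => 3.0f0, "2021-01-02" => 4.0f0),
]

# Layout 1: tuple-keyed, d_tuple[t, f]; works even if some (t, f) pairs are missing.
d_tuple = Dict{Tuple{String, Int}, Float32}()
for f in eachindex(features), (t, value) in features[f]
    d_tuple[t, f] = value   # same as d_tuple[(t, f)] = value
end

# Layout 2: dictionary of vectors, d_vec[t][f]; position f is only correct
# if every date t is present in every feature (the non-sparse case).
d_vec = Dict{String, Vector{Float32}}()
for f in eachindex(features), (t, value) in features[f]
    push!(get!(() -> Float32[], d_vec, t), value)
end

agree = all(d_tuple[t, f] == d_vec[t][f] for (t, f) in keys(d_tuple))
```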
"Transposing" to d′[t][f], where d′[t] is a vector, is not possible unless your dictionary is non-sparse, i.e., unless every key t is defined for every index f (whereas d′[t, f] is more general). If that's the case, your solution seems fine, although you could save a bit of time by preallocating the arrays with size length(features) rather than doing push! one element at a time.
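A sketch of the preallocation suggestion: each output vector is allocated once with `length(features)` slots and filled by index instead of by `push!`. Filling with `NaN32` is an assumed choice of placeholder, so that a slot no feature fills (the sparse case) stays detectable; `transpose_prealloc` is a hypothetical name.

```julia
# Hypothetical helper: preallocate one Vector{Float32} per date key, then fill by index.
function transpose_prealloc(features::AbstractVector)
    n = length(features)
    d = Dict{String, Vector{Float32}}()
    for (f, feature) in enumerate(features)
        for (t, value) in feature
            # Allocate the full-length vector the first time key t is seen.
            v = get!(() -> fill(NaN32, n), d, t)
            v[f] = value
        end
    end
    return d
end

features = [
    Dict("2021-01-01" => 1.5f0, "2021-01-02" => 2.5f0),
    Dict("2021-01-01" => 3.0f0),                 # "2021-01-02" missing in this feature
]
d = transpose_prealloc(features)
```

Unlike the `push!` version, position `f` in each vector always corresponds to feature `f`, even when some dates are missing.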
You would have to be careful about race conditions. You definitely can’t use the push! solution with threads, but preallocating the d′[t] arrays and then writing into them in parallel (parallelizing over the for f in … loop) could work, I guess.
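One way to read the threading suggestion, as a sketch: create every output vector serially first, so the `Dict` is never mutated from several threads, then let each iteration of the threaded loop write only slot `f` of those vectors. Different `f` values touch different elements, so the writes do not conflict. The helper name is hypothetical, the sketch assumes a 1-based `Vector` of features, and any speedup depends on data size and on starting Julia with several threads (e.g. `julia -t 4`).

```julia
using Base.Threads

# Hypothetical helper; assumes features is a 1-based Vector of date => value Dicts.
function transpose_threaded(features::AbstractVector)
    n = length(features)
    d = Dict{String, Vector{Float32}}()
    # Serial pass: create every output vector before any thread starts,
    # because a Dict is not safe to mutate concurrently.
    for feature in features, t in keys(feature)
        get!(() -> fill(NaN32, n), d, t)
    end
    # Parallel pass: only reads d (no insertions) and writes slot f,
    # which belongs to exactly one loop iteration.
    @threads for f in 1:n
        for (t, value) in features[f]
            d[t][f] = value
        end
    end
    return d
end

features = [Dict("2021-01-01" => Float32(f), "2021-01-02" => Float32(10 * f)) for f in 1:8]
d = transpose_threaded(features)
```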
You’re unlikely to get much speedup using threads unless your dictionary is truly huge, though.