I am taking in simulation output from another process written in existing software. It produces a JSON package containing a few tables (each stored as an array of objects). I then run a JuMP optimization based on the simulation. The optimization runs in a few seconds, but reading the JSON and converting it into a DataFrame takes a long time; specifically, converting the large simulation table into a DataFrame is the slow part.
I'm hoping I am doing something inefficient that can be improved.
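For reference, each table in the file is an array of objects, one object per row. A rough sketch of the layout (the field names here are invented; the real simstates objects have 6 fields each):

```julia
using JSON

# Hypothetical sketch of the input layout; field names are made up.
json_str = """
{
  "simstates": [
    {"sim": 1, "t": 1, "x": 0.42},
    {"sim": 1, "t": 2, "x": 0.37}
  ]
}
"""

parsed = JSON.parse(json_str)
parsed["simstates"]   # Vector of Dict{String,Any}, one Dict per row
```

Here is the code I use to read the file and build the DataFrame: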
```julia
using JSON, DataFrames

function read_json(file)
    # open returns the value of the do-block, so no global is needed
    open(file, "r") do f
        JSON.parse(f)
    end
end

inDict = read_json(file)

println("Creating Data Frames")
# Each object in inDict["simstates"] becomes a one-row DataFrame; vcat stacks them
simstates = vcat(DataFrame.(inDict["simstates"])...)
```
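To illustrate what that last line does: JSON.parse returns each object as a Dict, `DataFrame.` turns every Dict into a one-row DataFrame, and the splatted `vcat` stacks them. A toy version with two made-up rows:

```julia
using DataFrames

# Two hypothetical rows, as JSON.parse would return them (one Dict per object)
rows = [Dict("sim" => 1, "t" => 1, "x" => 0.42),
        Dict("sim" => 1, "t" => 2, "x" => 0.37)]

# In the real data this builds 765000 one-row DataFrames and concatenates them
toy = vcat(DataFrame.(rows)...)   # 2×3 DataFrame
```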
After calling the function, I ran a benchmark on the read:
```julia
julia> @benchmark inDict = read_json(file)
BenchmarkTools.Trial:
  memory estimate:  1.79 GiB
  allocs estimate:  29547484
  minimum time:     13.738 s (8.85% GC)
  median time:      13.738 s (8.85% GC)
  mean time:        13.738 s (8.85% GC)
  maximum time:     13.738 s (8.85% GC)
  samples:          1
  evals/sample:     1
```
Then a benchmark on the conversion to a DataFrame:
```julia
julia> @benchmark simstates = vcat(DataFrame.(inDict["simstates"])...)
BenchmarkTools.Trial:
  memory estimate:  3.95 GiB
  allocs estimate:  65015493
  minimum time:     26.722 s (8.33% GC)
  median time:      26.722 s (8.33% GC)
  mean time:        26.722 s (8.33% GC)
  maximum time:     26.722 s (8.33% GC)
  samples:          1
  evals/sample:     1
```
So it takes about 40 seconds to read in the simulation data; after that, the optimization itself takes about 5 seconds.
simstates is a 765000×6 DataFrame.
This is actually a small-ish simulation. I expect a production run to be a multiple of this.
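In case it helps to reproduce the timing without the real file, here is a sketch that builds a synthetic table of the same shape (765000 rows, 6 fields per object); the field names and values are made up:

```julia
using DataFrames, BenchmarkTools

# Hypothetical stand-in for inDict["simstates"]: 765000 objects with 6 made-up
# fields each, mimicking what JSON.parse returns for an array of objects.
fake_simstates = [Dict("sim" => i, "t" => i % 100, "a" => rand(), "b" => rand(),
                       "c" => rand(), "x" => rand()) for i in 1:765_000]

# Time only the conversion step
@benchmark vcat(DataFrame.($fake_simstates)...)
```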