I am ingesting simulation output from another process (existing third-party software). It writes a JSON file containing a few tables, each stored as an array of objects. I am running an optimization in JuMP based on the simulation. The optimization itself runs in a few seconds, but reading the JSON and converting it into a DataFrame takes a long time. Specifically, converting the large simulation table into a DataFrame is the bottleneck.
I’m hoping that I am doing something inefficiently that can be improved.
```julia
function read_json(file)
    open(file, "r") do f
        JSON.parse(f)
    end
end

inDict = read_json(file)
println("Creating Data Frames")
simstates = vcat(DataFrame.(inDict["simstates"])...)
```
After calling the function, I ran a benchmark on the read:
```
julia> @benchmark inDict = read_json(file)
  memory estimate:  1.79 GiB
  allocs estimate:  29547484
  minimum time:     13.738 s (8.85% GC)
  median time:      13.738 s (8.85% GC)
  mean time:        13.738 s (8.85% GC)
  maximum time:     13.738 s (8.85% GC)
```
Then a benchmark on the conversion to a DataFrame:
```
julia> @benchmark simstates = vcat(DataFrame.(inDict["simstates"])...)
  memory estimate:  3.95 GiB
  allocs estimate:  65015493
  minimum time:     26.722 s (8.33% GC)
  median time:      26.722 s (8.33% GC)
  mean time:        26.722 s (8.33% GC)
  maximum time:     26.722 s (8.33% GC)
```
So it takes about 40 seconds in total to read in the simulation data, versus about 5 seconds for the optimization itself.
This is actually a small-ish simulation. I expect a production run to be a multiple of this.
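For reference, one alternative I have been considering is building the column vectors directly from the array of objects, rather than constructing one DataFrame per object and `vcat`-ing them all. This is only a sketch; the toy `rows` below stands in for `inDict["simstates"]`, and the field names are made up:

```julia
# Sketch: build column vectors directly from an array-of-objects.
# `rows` is toy data standing in for inDict["simstates"];
# the real data would come from JSON.parse as a Vector of Dicts.
rows = [Dict("t" => 1, "x" => 0.5),
        Dict("t" => 2, "x" => 0.7)]

# One pass per column instead of one DataFrame per row.
cols = Dict{String,Vector}()
for k in keys(first(rows))
    cols[k] = [r[k] for r in rows]
end

# With DataFrames loaded, this would then be a single construction:
# simstates = DataFrame(cols)
cols["t"]  # == [1, 2]
```

I don't know yet how much this saves on the real data, but it avoids allocating tens of millions of tiny intermediate DataFrames.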