Hello,
I’d appreciate any thoughts or feedback on how I could implement the following.
I have a long-running simulation (several days) that I run on my university’s HPC. Until now, I have been saving the results at the end of each run in .jld2 files using JLD2.jl.
It works, but I am also still developing and troubleshooting certain aspects of the simulation code. Sometimes my changes cause the run to time out on the cluster or crash at some intermediate stage. It would be nice if I could save intermediate snapshots of the results, say every 1,000 iterations out of 10,000 total. Then I could inspect the results afterwards to see what the calculations were doing, and I could use that data to restart the run if needed.
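To make the idea concrete, here is roughly the checkpoint/restart loop I have in mind. It is only a sketch: `SimState`, `step!`, `save_checkpoint`, `load_or_init`, and `run_sim` are hypothetical stand-ins for my real code, and it uses Julia's built-in Serialization standard library in place of JLD2 so it runs without extra packages (note that the Serialization format is not guaranteed to stay readable across Julia versions, so JLD2 is the better choice for real data).

```julia
using Serialization  # stdlib; stands in for JLD2 in this sketch

# Hypothetical simulation state and update step.
mutable struct SimState
    iter::Int
    x::Vector{Float64}
end

step!(s::SimState) = (s.x .+= 1.0; s.iter += 1; s)

# Write to a temporary file first, then move it into place, so an
# interruption during the write never leaves a half-written file at `path`.
function save_checkpoint(path::AbstractString, s::SimState)
    tmp = path * ".tmp"
    serialize(tmp, s)
    mv(tmp, path; force = true)
    return path
end

# Resume from `path` if a checkpoint exists, otherwise start fresh.
load_or_init(path, n) = isfile(path) ? deserialize(path)::SimState : SimState(0, zeros(n))

function run_sim(path; total = 10_000, every = 1_000, n = 3)
    s = load_or_init(path, n)
    while s.iter < total
        step!(s)
        s.iter % every == 0 && save_checkpoint(path, s)  # snapshot every `every` steps
    end
    return s
end
```

If a run dies at, say, iteration 2,500, calling `run_sim` again with the same `path` picks up from the iteration-2,000 snapshot instead of starting over.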
In the current implementation, my run script looks essentially like this (I am using DrWatson):
```julia
using DrWatson  # provides safesave

function main()
    params = parse_commandline_args()
    input_data = prepare_input_data(params)
    result = solve(input_data, params)
    safesave("path_to_my_results", result)  # only reached if solve finishes
end

main()
```
What I would like to do is somehow open a stream to a file, pass that stream to `solve`, and then use it inside `solve` to write to the file. If `solve` crashes or times out, the stream would clean itself up and close the file.
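Here is a minimal sketch of the stream idea using only Base's `open`, with a hypothetical `solve(io::IO)` that writes one line every 1,000 iterations. The `open(path, "w") do io ... end` form closes the file even if `solve` throws an exception. It cannot help if the scheduler kills the process outright at the time limit, which is why the sketch calls `flush` after each write so that already-written snapshots reach the disk. As far as I know, JLD2's `jldopen` supports the same do-block form.

```julia
# Hypothetical `solve` that receives an already-open IO stream.
function solve(io::IO; niter = 10_000, every = 1_000)
    x = 0.0
    for i in 1:niter
        x += 1.0                      # stand-in for the real update
        if i % every == 0
            println(io, i, '\t', x)   # write a snapshot line
            flush(io)                 # push it to disk now, not only at close
        end
    end
    return x
end

# The do-block form wraps the body in try/finally, so the file is closed
# whether `solve` returns normally or throws.
function run_to_file(path::AbstractString)
    open(path, "w") do io
        solve(io)
    end
end
```

Holding one handle open for days is the fragile part here; reopening the file for each snapshot might be the safer design if hard kills are common.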
Maybe this is a stupid or impossible idea; I have no idea. I'm hoping others might have some insight. If possible, I’d like to avoid adding DrWatson as a dependency to my package code (which is where `solve` is defined).
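One way I could imagine keeping DrWatson out of the package is to have `solve` accept a callback and let the run script decide how to save. In this sketch the `checkpoint` keyword, its `(iteration, state)` signature, and `run_script` are all hypothetical; the callback here only collects snapshots in memory, and in the real script it would call `safesave` instead.

```julia
# Package side: no DrWatson or JLD2 dependency. The caller may pass a
# `checkpoint` function taking (iteration, state); the default does nothing.
function solve(input::Vector{Float64}; niter = 10_000, every = 1_000,
               checkpoint = (i, state) -> nothing)
    state = copy(input)
    for i in 1:niter
        state .+= 1.0                                  # stand-in for the real update
        i % every == 0 && checkpoint(i, copy(state))   # hand a copy to the caller
    end
    return state
end

# Script side: this is where saving would happen. In a DrWatson script it
# might be, e.g. (directory name hypothetical):
#   checkpoint = (i, s) -> safesave(datadir("snapshots", "iter_$i.jld2"), Dict("state" => s))
function run_script()
    snapshots = Pair{Int,Vector{Float64}}[]
    result = solve(zeros(2); checkpoint = (i, s) -> push!(snapshots, i => s))
    return result, snapshots
end
```

With this shape, only the run script needs to know about file formats; the package just calls whatever function it is given.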