Is it possible to save a compiled GradientTape in ReverseDiff?

Hi,

I was wondering if it would be possible to save the compiled GradientTape that is used in ReverseDiff.jl.

Using the gradient example from the package, I’d like to save `compiled_f_tape`:

```julia
using ReverseDiff: GradientTape, compile

# some objective function to work with
f(a, b) = sum(a' * b + a * b')

# pre-record a GradientTape for `f` using inputs of shape 100x100 with Float64 elements
const f_tape = GradientTape(f, (rand(100, 100), rand(100, 100)))

# compile `f_tape` into a more optimized representation
const compiled_f_tape = compile(f_tape)
```

Thank you for your help!

What do you mean by “save”? That when you start a new Julia session you can load it from a file and not have to recompute?

Hi odow,

Yes, that’s exactly what I meant. Thanks!

You can do this via Serialization.jl:

```
julia> import ReverseDiff

julia> import Serialization

julia> f(a, b) = sum(a' * b + a * b')
f (generic function with 1 method)

julia> f_tape = ReverseDiff.GradientTape(f, (rand(100, 100), rand(100, 100)))
typename(ReverseDiff.GradientTape)(f)

julia> compiled_f_tape = ReverseDiff.compile(f_tape)
typename(ReverseDiff.CompiledTape)(f)

julia> Serialization.serialize("my_tape", compiled_f_tape)

julia> a, b = rand(100, 100), rand(100, 100);

julia> inputs = (a, b);

julia> results = (similar(a), similar(b));

julia> ReverseDiff.gradient!(results, compiled_f_tape, inputs)
([96.80321764574695 95.9322689982503 … 93.98743219966727 96.87930289033304; 102.27639868310555 101.40545003560888 … 99.46061323702587 102.35248392769164; … ; 103.1272722419636 102.25632359446693 … 100.31148679588392 103.20335748654969; 100.43875314510339 99.56780449760672 … 97.6229676990237 100.51483838968947], [102.13773483040966 106.64872883076143 … 99.98463180500985 103.49920032293873; 102.67477116869888 107.18576516905065 … 100.52166814329908 104.03623666122796; … ; 104.45166188878761 108.9626558891394 … 102.29855886338783 105.81312738131669; 99.2233011486461 103.73429514899789 … 97.07019812324631 100.58476664117518])

julia> exit()
(base) oscar@Oscars-MBP /tmp % julia --project=jmp
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.7 (2022-07-19)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> import ReverseDiff

julia> import Serialization

julia> f(a, b) = sum(a' * b + a * b')
f (generic function with 1 method)

julia> compiled_f_tape = Serialization.deserialize("my_tape")
typename(ReverseDiff.CompiledTape)(f)

julia> a, b = rand(100, 100), rand(100, 100);

julia> inputs = (a, b);

julia> results = (similar(a), similar(b));

julia> ReverseDiff.gradient!(results, compiled_f_tape, inputs)
([97.33637076539904 102.28140685295034 … 102.70766708523875 99.48674356695675; 97.34029413495566 102.28533022250696 … 102.71159045479537 99.49066693651336; … ; 97.93156767732057 102.87660376487189 … 103.3028639971603 100.0819404788783; 90.74360862299932 95.68864471055062 … 96.11490494283903 92.89398142455703], [98.27267578994649 104.14102267355793 … 103.90332569214415 102.6386153051418; 100.12223668702103 105.99058357063247 … 105.75288658921869 104.48817620221632; … ; 91.52719382178103 97.39554070539248 … 97.15784372397869 95.89313333697633; 98.46380577270513 104.33215265631657 … 104.09445567490279 102.82974528790044])
```

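But just because you can doesn’t mean you should. The serialization format is not guaranteed to be stable across Julia versions, so it should not be relied upon as a file format. The reading session also has to load the same packages (here, ReverseDiff) before calling `deserialize`, and there’s some overhead to writing to and reading from the file.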
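If you do cache the tape anyway, one way to soften the version problem is to store the Julia `VERSION` next to the cached object and rebuild whenever it no longer matches. Below is a minimal sketch; `load_or_build` is a hypothetical helper, not part of ReverseDiff or Serialization. It only checks the Julia version, so a changed ReverseDiff version could still break the cached file; an unreadable file is treated as a cache miss by the `try`/`catch`.

```julia
using Serialization

# Hypothetical helper: return the cached value if it was written by the same
# Julia version; otherwise call `build()`, cache its result, and return it.
function load_or_build(build, path::AbstractString)
    if isfile(path)
        cached = try
            deserialize(path)
        catch
            nothing  # unreadable or incompatible file: fall through and rebuild
        end
        if cached isa NamedTuple && get(cached, :julia_version, nothing) == VERSION
            return cached.value
        end
    end
    value = build()
    # Store the Julia version alongside the value so stale caches can be detected.
    serialize(path, (julia_version = VERSION, value = value))
    return value
end
```

In the new session (after `import ReverseDiff` and defining `f`), this would be called as `compiled_f_tape = load_or_build(() -> ReverseDiff.compile(ReverseDiff.GradientTape(f, (rand(100, 100), rand(100, 100)))), "my_tape")`.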

The better question to ask is: what are you trying to achieve? Why is it necessary to save the compiled tape? Why not just regenerate it in a new session? Julia is best suited to long-running sessions, not short one-off sessions.

Hi odow,

Thank you very much for the suggestions!

In my particular case, I was trying to optimize the problem (using the Optim.jl package with the LBFGS method) from different starting points. Recording and compiling the tape takes about 10 hours for my problem. I thought saving the compiled tape beforehand might be a good idea so that I could reuse it, but I’m not sure this is the best way to approach the problem. Would you mind sharing some advice on that?
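Within a single long-running session, the expensive step only has to happen once: record and compile the tape, then pass the same compiled tape to every LBFGS run. A minimal sketch, assuming a made-up stand-in `objective` (the real one isn't shown in the thread) and Optim.jl's `optimize(f, g!, x0, LBFGS())` form, where `g!(G, x)` fills `G` in place. A compiled tape is only valid for inputs of the shape and element type it was recorded with, and it replays the branches taken during recording, so it does not suit objectives with value-dependent control flow.

```julia
using ReverseDiff, Optim

# Hypothetical stand-in objective with its minimum at x = ones(n).
function objective(x)
    d = x .- 1
    return sum(d .* d)
end

n = 10

# Record and compile once; valid only for Vector{Float64} inputs of length n.
const tape = ReverseDiff.compile(ReverseDiff.GradientTape(objective, rand(n)))

# In-place gradient in the form Optim expects: g!(G, x) overwrites G.
g!(G, x) = ReverseDiff.gradient!(G, tape, x)

# Reuse the same compiled tape for every starting point.
results = [optimize(objective, g!, randn(n), LBFGS()) for _ in 1:5]
best = results[argmin(Optim.minimum.(results))]
```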

This suggests that ReverseDiff might not be the best tool for the job.

How big is the problem you are trying to optimize?

The number of parameters is around 50,000, and the size of the data is about 20M.

I would take the time to make a minimal working example of what you’re trying to optimize. Don’t give the full 50,000 parameters, and don’t give the full data. Ideally, make something parameterized with random data, so that I can change an input number and scale it to different sizes.
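As an illustration of that kind of MWE, here is a hypothetical generator: `make_problem` and its least-squares objective are placeholders, not the poster’s model. The sizes are arguments and the data are seeded random numbers, so the same script can be run with 10 parameters or 50,000.

```julia
using Random

# Hypothetical MWE template: random data whose size is set by the arguments.
# The least-squares form is a placeholder for the real objective.
function make_problem(n_params::Int, n_obs::Int; seed::Int = 1234)
    rng = MersenneTwister(seed)
    X = randn(rng, n_obs, n_params)  # random design matrix
    y = randn(rng, n_obs)            # random targets
    objective(x) = sum(abs2, X * x .- y) / n_obs
    return objective, zeros(n_params)
end

# Start small, then scale up by changing the arguments.
f_small, x0 = make_problem(10, 1_000)
f_small(x0)
```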

We’re more interested in the functional form. Are there any constraints?

Got it; I will try that and post it once I have a minimal working example. The problem doesn’t have constraints.