On a different machine I tried Julia 1.5.2 and got very good run times with 3 residual blocks:
julia> include("/home/ulg/gher/abarth/Julia/share/test_zygote_perf.jl")
# forward pass (1st and 2nd call)
19.787238 seconds (51.32 M allocations: 2.547 GiB, 4.63% gc time)
0.492617 seconds (439.07 k allocations: 21.413 MiB)
# backward pass (1st and 2nd call)
28.111849 seconds (51.57 M allocations: 2.611 GiB, 3.74% gc time)
0.069456 seconds (50.17 k allocations: 2.577 MiB)
That would be a factor of ~50,000 (Julia 1.7.0-rc1/1.6.1 versus Julia 1.5.2) for the 2nd call of the backward pass!
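In case it helps anyone reproduce the 1st-call versus 2nd-call comparison on their own machine, here is a minimal sketch of the measurement pattern using only Base. The function `work` is a hypothetical stand-in, not the code in `test_zygote_perf.jl`, and the array size is arbitrary. The first call includes JIT compilation, so only the second call reflects steady-state run time.

```julia
# Hypothetical stand-in for the forward/backward pass; NOT the actual test script.
work(x) = sum(abs2, x .* 2 .+ 1)

x = rand(Float32, 1000)

# 1st call: includes JIT compilation of `work` for this argument type.
t_first = @elapsed work(x)

# 2nd call: reuses the compiled code, so this is the steady-state run time.
t_second = @elapsed work(x)

println("1st call: ", t_first, " s")
println("2nd call: ", t_second, " s")
```

Comparing the second-call numbers across Julia versions, as above, separates a genuine run-time regression from a change in compile time.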
(Flux-0.12) pkg> st --manifest
Status `/home/users/a/b/abarth/.julia/environments/Flux-0.12/Manifest.toml`
[621f4979] AbstractFFTs v1.0.1
[1520ce14] AbstractTrees v0.3.4
[79e6a3ab] Adapt v3.3.1
[56f22d72] Artifacts v1.3.0
[ab4f0b2a] BFloat16s v0.1.0
[fa961155] CEnum v0.4.1
[052768ef] CUDA v2.4.3
[082447d4] ChainRules v0.7.70
[d360d2e6] ChainRulesCore v0.9.45
[944b1d66] CodecZlib v0.7.0
[3da002f7] ColorTypes v0.11.0
[5ae59095] Colors v0.12.8
[bbf7d656] CommonSubexpressions v0.3.0
[34da2185] Compat v3.37.0
[e66e0078] CompilerSupportLibraries_jll v0.3.4+0
[9a962f9c] DataAPI v1.9.0
[864edb3b] DataStructures v0.18.10
[163ba53b] DiffResults v1.0.3
[b552c78f] DiffRules v1.3.1
[ffbed154] DocStringExtensions v0.8.5
[e2ba6199] ExprTools v0.1.6
[1a297f60] FillArrays v0.11.9
[53c48c17] FixedPointNumbers v0.8.4
[587475ba] Flux v0.12.1
[f6369f11] ForwardDiff v0.10.19
[d9f16b24] Functors v0.2.5
[0c68f7d7] GPUArrays v6.4.1
[61eb1bfa] GPUCompiler v0.8.3
[7869d1d1] IRTools v0.4.3
[92d709cd] IrrationalConstants v0.1.0
[692b3bcd] JLLWrappers v1.3.0
[e5e0dc1b] Juno v0.8.4
[929cbde3] LLVM v3.9.0
[2ab3a3ac] LogExpFunctions v0.3.0
[1914dd2f] MacroTools v0.5.8
[e89f7d12] Media v0.5.0
[e1d29d7a] Missings v1.0.2
[872c559c] NNlib v0.7.19
[77ba4419] NaNMath v0.3.5
[05823500] OpenLibm_jll v0.7.1+0
[efe28fd5] OpenSpecFun_jll v0.5.3+4
[bac558e1] OrderedCollections v1.4.1
[21216c6a] Preferences v1.2.2
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.1.3
[6c6a2e73] Scratch v1.1.0
[a2af1166] SortingAlgorithms v1.0.1
[276daf66] SpecialFunctions v1.6.2
[90137ffa] StaticArrays v1.2.12
[82ae8749] StatsAPI v1.0.0
[2913bbd2] StatsBase v0.33.10
[fa267f1f] TOML v1.0.3
[a759f4b9] TimerOutputs v0.5.12
[3bb67fe8] TranscodingStreams v0.9.6
[a5390f91] ZipFile v0.9.4
[83775a58] Zlib_jll v1.2.11+18
[e88e6eb3] Zygote v0.6.12
[700de1a5] ZygoteRules v0.2.1
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[44cfe95a] Pkg
[de0858da] Printf
[9abbd945] Profile
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
JULIA_REVISE_POLL = 1