Hi @tomerarnon,
thanks for the prompt clarification!
As I mentioned in the question, the issue relates to #18877, where the proposed solution “recognizes” the parameters only in layers D and E, and not in A*, B*, and C*. From their discussion, it seems that this wasn’t the intended behavior, since the other layers should probably also be trainable.