Flux upgrade to v0.10 seems buggy

I have Flux code that worked fine until I upgraded to the latest version; the same code no longer works after the ‘upgrade’.

I understand there has been innovation using Sparse Arrays, but that seems buggy, as now I am getting the error:

Need an adjoint for constructor SparseMatrixCSC{Float64,Int64}. Gradient is of type Adjoint{Float64,Array{Float64,2}}

Please note I am specifying my own loss function, but it is not complicated.

Thanks for any help!!

PS: In another post, I report another change: BSON model saving/retrieving has broken, and now I also can’t run any of my previously saved models. I understand there has been innovation moving away from Tracker, but the new version of Flux does not seem to be backwards-compatible!

Alternatively, could someone please tell me how to downgrade Flux to the previous version (0.9, I think)?

That is true

You can either do add Flux@0.9, or add

[compat]
Flux = "0.9"

to your Project.toml and then run resolve (in the Pkg REPL mode).


Thanks for that. I will re-load.

**UPDATE:** I ran Pkg.add("Flux@0.9") and got the error:

Flux@0.9 is not a valid package name.

Also, I do think the error message I quoted above represents a bug in the current version, not just a compatibility thing, as my script is not doing anything out of the ordinary. Of course, I will run version 0.9 (thanks) because it works, but I do wonder if the current one might be fixed?
One idea is that a ' was accidentally used for transpose somewhere in the code, inadvertently invoking Adjoint, where this operation might not be defined for the matrix type SparseMatrixCSC? [Just a thought]

I forget the syntax for Pkg.add(...), but press ] to get the pkg> prompt, and then type add Flux@0.9.

Am pretty sure that the adjoint mentioned in the error is in the sense of Zygote.@adjoint, something like a gradient, not in the sense of conjugate-transpose.

Hi Improb, Thanks, but for various reasons I am constrained to Jupyter at the moment, and ‘]’ doesn’t work.

Also, is there anything you can think of that I could do that would address the adjoint issue?

ok! Looks like you need PackageSpec with version=, according to the help:

(v1.3) pkg> ?add
  add pkg[=uuid] [@version] [#rev] ...

  Add package pkg to the current project file. If pkg could refer to multiple different packages,
  specifying uuid allows you to disambiguate. @version optionally allows specifying which versions
  of packages to add. Version specifications are of the form @1, @1.2 or @1.2.3, allowing any
  version with a prefix that matches, or ranges thereof, such as @1.2-3.4.5. A git revision can be
  specified by #branch or #commit.

  If a local path is used as an argument to add, the path needs to be a git repository. The
  project will then track that git repository just like it would track a remote repository online.

  Examples

  pkg> add Example
  pkg> add Example@0.5
  pkg> add Example#master
  pkg> add Example#c37b675
  pkg> add https://github.com/JuliaLang/Example.jl#master
  pkg> add git@github.com:JuliaLang/Example.jl.git
  pkg> add Example=7876af07-990d-54b4-ab0e-23690620f79a

julia> using Pkg

help?> Pkg.add
  Pkg.add(pkg::Union{String, Vector{String}})
  Pkg.add(pkg::Union{PackageSpec, Vector{PackageSpec}})

  Add a package to the current project. This package will be available by using the import and
  using keywords in the Julia REPL, and if the current project is a package, also inside that
  package.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  Pkg.add("Example") # Add a package from registry
  Pkg.add(PackageSpec(name="Example", version="0.3")) # Specify version; latest release in the 0.3 series
  Pkg.add(PackageSpec(name="Example", version="0.3.1")) # Specify version; exact release
  Pkg.add(PackageSpec(url="https://github.com/JuliaLang/Example.jl", rev="master")) # From url to remote gitrepo
  Pkg.add(PackageSpec(url="/remote/mycompany/juliapackages/OurPackage")) # From path to local gitrepo

  See also PackageSpec.
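Putting that together for this case: a sketch of pinning Flux from a script or notebook cell, without the Pkg REPL, following the PackageSpec pattern in the help above:

```julia
using Pkg

# Pin Flux to the latest release in the 0.9 series
# (same pattern as the PackageSpec example in the help above).
Pkg.add(PackageSpec(name="Flux", version="0.9"))
```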

Re the constructor SparseMatrixCSC error, I don’t know, but if you can isolate what’s causing it, then I’m sure it would be worth opening an issue.

Thanks. That worked in terms of loading the earlier version, but the error is still there, which is weird, because I haven’t changed the script [I will have to double-check this] from what it was before upgrading to v0.10, and it worked before(!)

Might have to try the other Neural Network package.

It does actually work if you just type ] followed directly by the package command, e.g.

] add Flux

So you concede that the new version of Flux is not backwards compatible?

That means months of struggling to learn Flux have been wasted because the code I finally (just about) got going no longer works with the new version?? Is that how Flux expects to get more users(?!!) Is there anywhere that explains how programming Flux has changed with the new version? I tried my script side-by-side on another of my machines (with the previous version of Flux) and it ran just fine, but with the new version it falls over in a way that I wouldn’t have a clue how to fix.
The same with saving models. Is there to be no enduring way to save a model?
Really!??

So you concede that the new version of Flux is not backwards compatible?

Correct, it’s no secret.
It isn’t backwards compatible.
Otherwise it would be 0.9.1.
That’s how Julia-style SemVer version numbers work.

That means months of struggling to learn Flux have been wasted because the code I finally (just about) got going no longer works with the new version??

Not wasted, no.
Almost all the skills you have learned will still carry over.
Almost the whole user-facing interface is the same.
A few changes to functions like testmode.
You can find the full list of changes in the release notes.

Is that how Flux expects to get more users(?!!)

No? Why would you think that Flux expects to get more users from a breaking change? That’s an odd expectation.

In the longer term these changes were necessary to continue advancing the framework to live up to its full potential, and to allow users to continue to push out on the frontiers of differentiable programming and ML (especially scientific ML).
It’s much more flexible now than before.
These changes, for example, have made DiffEqFlux.jl basically just work: the slightly hacky boilerplate is completely gone.

Is there anywhere that explains how programming Flux has changed with the new version?

The list I posted above in the release notes is a good place to look.
But really, not much user-facing has changed.
The big thrust of this release was to switch to a new AD package, Zygote, which is what gives the increase in flexibility and power.

However, as always with big internal overhauls like this there is a chance of bugs. (Really an inevitability in a change this big).
You have run into one such bug.

I tried my script side-by-side on another of my machines (with the previous version of Flux) and it ran just fine,

Correct, this is as expected. It should work on a machine running an older version of Flux.
That’s what using versions does.
When a new version breaks something one has the ability to go back to the old version.
The fantastic innovation of versioned releases.
It’s really useful, right?

but with the new version it falls over in a way that I wouldn’t have a clue how to fix.

Not a problem, just ask for help.
Either here, or on the Julia Slack, or on StackOverflow.
In this case, I can help you now.
You have run into a bug.
You should open an issue at https://github.com/FluxML/Zygote.jl/issues
You should pin your version of Flux to v0.9 until it is fixed.

The same with saving models. Is there to be no enduring way to save a model?
Really!??

In the case of this release,
it is likely possible to load up models saved on the older version.
This of course isn’t always going to be possible, since non-backwards-compatible changes are not backwards compatible.
But in this case it should be.
Especially if you save using the weights-only procedure, which removes tracking.
A key change in Flux 0.10 was to remove the need for tracking.

(That’s Tracker.data.(params(model)))
https://fluxml.ai/Flux.jl/stable/saving/#Saving-Model-Weights-1

Best is to load up your model in Flux 0.9,
then save the weights without tracking.
Then you should be able to load them up in Flux 0.10 per the docs (best to wait until your bug is fixed first).
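A sketch of that procedure (the Chain here is a made-up example model, and the two halves run under different Flux versions — 0.9 to save, 0.10 to load — so this is illustrative rather than something to paste in as-is):

```julia
using Flux, BSON

model = Chain(Dense(10, 5, relu), Dense(5, 2))  # example architecture

# --- Under Flux 0.9 (Tracker-based) ---
# Flux.Tracker.data strips the tracking wrapper, leaving plain arrays.
weights = Flux.Tracker.data.(params(model))
BSON.@save "mymodel.bson" weights

# --- Under Flux 0.10 (Zygote-based) ---
# Rebuild the same architecture, then load the raw weights into it.
BSON.@load "mymodel.bson" weights
Flux.loadparams!(model, weights)
```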


Thanks so much, especially for your patience with my frustration.

When I added the earlier version of Flux before, it didn’t fix the problem.

Do I need to refer to the specific version when I invoke ‘using’?
[considering both packages are on my system]

Thanks again.

Just guessing here, but did you exit Julia (restart the Jupyter kernel) in between?
If you already ran using Flux, the version that was installed before will stay loaded.

If you did restart, can you run

using Pkg
Pkg.pkg"st"

And post the results?

(Pkg.pkg"Foo the bar" is equivalent to ] Foo the bar in the REPL)

[considering both packages are on my system]

Only one can be in your current environment. (Which I am guessing is the default environment)
The above command will tell us which.
st is short for status.

Thanks. I didn’t restart. I will this time.

**UPDATE:**

Yeah, that worked - it runs again now (thanks!)


For interest, and not something you should action:

adjoint in this context means “method to propagate the derivative backwards” (in this case, through the constructor).
It’s the pullback, or vector-transpose-Jacobian product, v’Jp (sometimes written vJp and called the vector-Jacobian product; or J’vp, the Jacobian-transpose vector product).

It’s actually closely related to the adjoint of a matrix.
Let’s see if I can describe this right:

  • If one linearizes the computer program at a point, then the program becomes a linear map,
    • linearizing the program means finding a program that approximates the real program’s output using the tangent plane
    • it estimates how much a change in the input would change the output
    • forward-mode AD does this linearizing op
  • linear maps can also be represented as matrices
    • that matrix would be the Jacobian.
    • so linearized functions are the function form of the Jacobian matrix.
  • Now, if one takes the transpose or adjoint of the Jacobian, what does that do?
    • (there is an ever-present debate as to whether one should take the conjugate or not. Complex-number AD is a contentious subfield.)
    • it would take in a change in output and (under a linear approximation) estimate what change in input would cause it.
    • This is what reverse-mode AD does too
  • Thus forward-mode AD gives a function form of the Jacobian; reverse mode gives its transpose (or adjoint)
    • thus reverse mode is finding the (linear) adjoint program.
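To make the bullets above concrete, here is a toy in plain Julia (no AD package; the function, point, and vectors are invented for illustration): the hand-written Jacobian J of f at x0 plays the role of the linearized program, J * dx is what forward mode computes, and J' * v is what reverse mode (the adjoint program) computes.

```julia
using LinearAlgebra

# Invented example function R^2 -> R^2
f(x) = [x[1]^2 + x[2], sin(x[1])]

x0 = [1.0, 2.0]          # point at which we linearize

# Hand-written Jacobian of f at x0 (the linear map)
J = [2x0[1]     1.0;
     cos(x0[1]) 0.0]

dx = [0.1, 0.0]          # a small change in the input
v  = [1.0, 0.0]          # a sensitivity with respect to the output

J * dx    # forward mode: the (approximate) change in output caused by dx
J' * v    # reverse mode: which input change would produce v (the adjoint)
```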

And Flux is complaining that, for the operation constructor SparseMatrixCSC{Float64,Int64},
Zygote doesn’t know how to find the adjoint program.

Basically, Zygote has trouble autodiffing through constructors and needs to be told explicitly how to do it. (I’d like to look into why that is; I don’t myself know, as I haven’t looked at that part of the source.)
It needs a custom rule written; it already has many custom rules, but that one, I guess, was missed.
The old AD system, Tracker, also had trouble with this, but for very different reasons, and it already had the fix in place. (And the fix looked very different because it came from different reasons, so I see why it was missed.)

That particular one is odd, though.


Thanks for that. Just to let you know: in this case, I am writing my own prediction function for the output, but it is simple, just a softmax, though across rows rather than within (as in ‘onehot’); otherwise quite standard.

[just to clarify above… my case is single output probability, where there is only one ‘1’ in the first 10 rows of outcome, say, and there is a softmax over these rows, etc.]
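Roughly, in plain Julia (leaving Flux out of it; the function name and shapes here are just my own illustration), what I mean is something like:

```julia
# One class per row, one sample per column; softmax down each
# column so the 10 rows form a probability distribution per sample.
function softmax_over_rows(A::AbstractMatrix)
    E = exp.(A .- maximum(A; dims=1))  # subtract the column max for stability
    return E ./ sum(E; dims=1)
end

scores = randn(10, 4)              # 10 outcome rows, 4 samples
probs  = softmax_over_rows(scores)
sum(probs; dims=1)                 # each column sums to 1
```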

Is there a way I could specify the gradient (and hessian) that would help?

Thanks very much for having a look. Unfortunately, that takes me into territory which is totally unfamiliar (Zygote, and adjoints which are not even the traditional mathematical ones?) and where I don’t even want to go. I have no idea what your example of code does, for instance. When I decided to invest the time to get into Flux, it was because it seemed to be elegant, high-level, and user-friendly, where things like gradients just worked (as in the previous version, before the upgrade). It was a narrow choice over PyTorch, but Flux seemed the winner. But now it appears that Flux is moving in a different direction, away from user-friendliness, towards a small population of very advanced, expert (and patient) users, which I think is a shame. Could not a ‘Tracker mode’ be maintained for backwards compatibility and for less advanced users? Or could the Zygote approach be made as flexible as the Tracker mode was?

You asked to do an advanced thing, so the answer was advanced.

That is why I didn’t post telling you to do this in the original comment I made:
because you are not expected to do this.
But you asked, so I told you.

A custom adjoint is how you tell Zygote how to calculate a gradient.
It is the instruction on how to apply the chain rule.
It aligns with the mathematics per the explanation above.
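For example (a toy sketch: square and its rule are invented here for illustration, and are not the missing SparseMatrixCSC constructor rule itself):

```julia
using Zygote
using Zygote: @adjoint

square(x) = x^2

# The custom rule returns the primal value plus a pullback that maps
# the output sensitivity Δ back to the input sensitivity 2x*Δ.
@adjoint square(x) = square(x), Δ -> (2x * Δ,)

Zygote.gradient(square, 3.0)   # uses the rule above: (6.0,)
```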

These are bugs. The intent is that they will be fixed,
and that Zygote will be just as easy as (easier, in fact, than) Tracker.


Hi Ox, Thanks for that. I was hoping that if I simply specified a derivative, that might help, but I guess not.

In any case, I would be interested in any news on whether they have isolated the aforementioned bugs. Do they need my help in recreating something, for instance?