100X tensor algebra speedup!?!

Found a very interesting article:
System for performing ‘tensor algebra’ offers 100-fold speedups over previous software packages

I am a newbie in linear algebra, but this sounds like something that would be nice to implement in Julia. Just curious to hear others’ thoughts on this.

2 Likes

Here’s a search that may get you started.

https://github.com/JuliaLang/julia/search?q=eachstored&type=Issues&utf8=✓

1 Like

Very interesting! Is there any difference in what is being implemented here:
https://github.com/JuliaLang/julia/issues/7010

…from what’s reportedly being done in the article I mentioned? My knowledge is very limited, but the only difference I can see is nonzero values being “compiled” vs. being iterated. Is that correct? Would there be a significant speed difference for compiled vs. iterated?

Thanks for your thoughts.

Julia already compiles implementations specialized to its argument types. The only “trick” here is switching the algorithm from an O(all_elements) to an O(nonzeros) implementation, which is precisely what eachstored (or some such) could enable. The 100x claim is meaningless on its own: since you’re changing the complexity, the speedup depends entirely upon how many elements you’re working with.
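To make that concrete, here is a toy sketch using plain SparseArrays (the helpers `sum_all` and `sum_stored` are just invented for the example), showing the same reduction done over every element versus only the stored entries:

```julia
using SparseArrays

# Toy illustration of the complexity switch: summing a sparse matrix by
# visiting every element (O(m*n)) versus visiting only the stored entries
# (O(nnz)). The helper names are made up for this example.
sum_all(A)    = sum(A[i, j] for i in 1:size(A, 1), j in 1:size(A, 2))
sum_stored(A) = sum(nonzeros(A))

A = sprand(10_000, 10_000, 0.001)   # roughly 0.1% of entries are stored
sum_all(A) ≈ sum_stored(A)          # same result, wildly different amounts of work
```

For a reduction like this, the speedup is roughly the ratio of total elements to stored elements, which is exactly why a single headline number depends on the inputs.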

1 Like

Yes, numbers like that usually tend to be generalizations (for several reasons), despite being numbers. I was more interested in whether there is anything in the idea that could have an application in Julia. Any speedup is good, but from appearances we already have something similar.

Thanks for explaining it to me :grin: Understanding this may help me to write better code when I do work with sparse data.

Ah, glad to see taco getting a mention here!

The taco team is actually right down the hall from the Julia Lab at MIT. They’re a great bunch of engineers and, from what I’ve seen, taco is a really cool bit of tech. Folks in the Julia Lab are actively experimenting with integration/use of taco with/from Julia, and I’m personally quite excited to see what will come out of it.

14 Likes

Very cool! I should have clicked through the “amazing, switching algorithmic complexities results in huge speedups!” to see the actual work they’re doing. Can’t you get them to start writing it in Julia directly? :slight_smile:

3 Likes

Hi there! I’m Peter, a graduate student down the hall from the TACO team, quietly working on integrating TACO into Julia. Indeed, TACO does a few things which we can’t do with Julia’s sparse matrix implementations. TACO would allow us to unify the CSR, CSC, DCSR, and DCSC sparse matrix implementations and the sparse vector implementation into a generalized representation of sparse arrays which extends to higher-order tensors. (Also, of course, we can use TACO to write very cool optimized sparse tensor kernels.)
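To sketch what I mean by a generalized representation (this is just a rough illustration I’m making up here, not TACO’s or Tortilla’s actual data structures): each dimension is stored either densely or compressed, and the combination of per-dimension formats recovers the familiar matrix formats.

```julia
# Rough per-dimension "level format" sketch; all of these names are made up.
struct DenseLevel
    size::Int                  # iterate every coordinate 1:size
end

struct CompressedLevel
    pos::Vector{Int}           # pos[k]:(pos[k+1]-1) delimits the entries under parent k
    idx::Vector{Int}           # the stored coordinates in this dimension
end

struct LevelTensor{T, Levels<:Tuple}
    levels::Levels             # one level descriptor per dimension
    vals::Vector{T}            # the stored (structurally nonzero) values
end

# CSC is then (columns: DenseLevel, rows: CompressedLevel), CSR swaps the two,
# DCSC/DCSR compress the outer dimension as well, and a sparse vector is a
# single CompressedLevel; adding more levels extends the scheme to higher orders.
```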

It would be really fun to write TACO in Julia, but since I am only a mortal graduate student with a lot on my plate, the implementation of TACO will remain in C++ (so that the original TACO team can continue to maintain it). My plan for integrating TACO into Julia has two phases. Phase 1 is implementing an interface which calls TACO through Cxx.jl and executes the C++ kernels. Phase 2 is extending TACO to use a Julia backend, so that we can use the C++ tensor compiler to emit Julia code which gets returned as the thunk of a generated function. Phase 2 will allow us to use custom datatypes and reductions in our generated kernels.
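As a minimal sketch of the Phase 2 mechanism (the names below are stand-ins, not the real Tortilla API): a function that produces a Julia expression plays the role of the C++ tensor compiler, and its output is returned as the body of a generated function so that the kernel specializes on the argument types.

```julia
# Stand-in for the tensor compiler: given the argument types, emit a Julia kernel.
compiler_stub(::Type{<:AbstractArray}, ::Type{<:AbstractArray}) = :(A .+ B)

@generated function generated_add(A, B)
    # Inside a generated function, A and B are bound to the argument *types*;
    # the returned expression is compiled as the method body for those types.
    return compiler_stub(A, B)
end

generated_add([1, 2, 3], [4.0, 5.0, 6.0])   # == [5.0, 7.0, 9.0]
```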

40 Likes

Hi! I’m Fred, one of the taco developers. Also glad taco is mentioned here! I love Julia, and am really excited by the work Peter is doing.

The 100x number is taken from our academic paper and is the speedup we got over the MATLAB Tensor Toolbox (which, as far as we know, is the only other system that supports the full tensor algebra with sparse tensors) for several kernels. We also compare against many kernels that have been hand-optimized, e.g. in Intel MKL, and for those our performance is comparable to the hand-optimized versions.

Our edge is that we compile down each expression to a custom kernel optimized for that particular expression. We also do this for compound expressions with many sub-expressions. So for the expressions nobody has hand-coded yet, you can still get hand-coded performance using taco.
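To give a feel for what “a custom kernel for that particular expression” means, here is a hand-written Julia analogue (an illustration only, not actual taco output) of a fused kernel for the compound expression y[i] = A[i,j]*x[j] + z[i] over a CSC matrix:

```julia
using SparseArrays

# Hand-written analogue of a fused kernel for y[i] = A[i,j]*x[j] + z[i]:
# one pass over the stored entries of A, no temporary A*x vector.
function fused_spmv_add!(y::AbstractVector, A::SparseMatrixCSC, x::AbstractVector, z::AbstractVector)
    copyto!(y, z)                       # start from z instead of materializing A*x first
    rows, vals = rowvals(A), nonzeros(A)
    for j in 1:size(A, 2)               # visit only the stored entries of A
        xj = x[j]
        for k in nzrange(A, j)
            y[rows[k]] += vals[k] * xj
        end
    end
    return y
end

A = sprand(1000, 1000, 0.01); x = rand(1000); z = rand(1000)
fused_spmv_add!(similar(z), A, x, z) ≈ A * x + z   # true
```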

We think Julia and taco are a great fit. Programmers can develop with Julia, which is really nice, and taco can run in the background, transparently generating custom kernels to compute linear and tensor expressions.

18 Likes

Are you thinking about a lazy wrapper for arrays that just builds up the taco format and lazily computes the result only when requested (by generating and dispatching on the resulting built-up type)? Alternatively, in the future you could exploit Cassette.jl functionality.

1 Like

I’m a little unsure about what you’re suggesting, but I think it will help if I clarify the design. I have created a sparse tensor datatype in Julia which mirrors the taco implementation. I have not settled on the user interface, but it will essentially be something which exposes index notation (similar to TensorOperations.jl, Tokamak.jl, ArrayMeta.jl, or others) and is able to pass the index expression to TACO. When it comes time to compute, the Julia tensor is converted to a C++ TACO tensor, the TACO-generated kernel is called on that tensor, and then the result is passed back to Julia. In Phase 2, we won’t need to pass the tensor to C++ because TACO will generate a Julia kernel.
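For a concrete taste of the kind of index-notation front end I mean (the example below uses TensorOperations.jl on dense arrays, so it is only an analogy for the interface, not the Tortilla interface itself):

```julia
using TensorOperations

A = rand(3, 4); B = rand(4, 5); x = rand(5)

@tensor C[i, k] := A[i, j] * B[j, k]            # matrix product in index notation
@tensor y[i]    := A[i, j] * B[j, k] * x[k]     # a compound expression in one shot
```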

8 Likes

I just watched @willow’s excellent presentation at JuliaCon: https://www.youtube.com/watch?v=Rp7sTl9oPNI

Peter, is this what “tortilla” does? In the talk you say that tortilla is still in the prototype phase. Can you elaborate? Do you expect to make it public soon?

This project has been an exciting journey for me since my last post. Since then, I have encountered a few roadblocks and finished a substantial amount of graduate school requirements. The scope of the project has grown beyond just a simple wrapper for TACO, and I’d like to discuss its capabilities after releasing a beta version. To answer your question: Tortilla will include an interface for calling the (C++) TACO directly. I have kept my next semester free to work on this software, and I intend to release a beta version by the semester’s end (December).

16 Likes

What do you think about “Mathematics of Arrays” as a solution to the super generic array computation problem?

Here Peter talks about his project.

5 Likes

Thanks for the great read! For the most part, the abstractions that one uses to represent array computations are only as useful as they are convenient for users (as long as the interface describes enough important details about the computation). Index notation is one way to represent an array computation, and the mathematics of arrays that you describe is another. It looks like these array interfaces express similar operations, and that it is (mostly) possible to compile one to the other. I think we still need to figure out how to efficiently implement the operations described by these interfaces without forcing the user to make too many decisions about how the operations are implemented.

Sorry to revive this old topic. I wonder if there is any update on how to use Taco from Julia. I have an operation that can be best described as a sparse tensor operation and I think I could use something like Taco to generate an efficient kernel for the task.

1 Like

A native Julia library somewhat similar to taco is my Grassmann.jl library, which also supports sparse calculation and skips the computation of zero values, and also generates specialized code and caching for sub-algebras. It also has built-in support for Hodge-de Rham cohomology (in progress) and projective conformal geometric algebra. There is still lots of room for improvement of the sparse functionality. Currently, it is still under development and will have a lot of breaking changes coming up.

2 Likes

I am sorry, but the connection is not clear to me. How can I multiply three sparse matrices in a single kernel with Grassmann?

Grassmann.jl “kernelizes” conformal geometric algebra operations; it doesn’t use matrices but multivectors. In this paradigm, matrices are replaced by multivectors, and tensor operations are “kernelized” in terms of multivectors instead of matrices.

If you were to use Grassmann.jl, you would need to change your understanding of what a matrix is, and learn what a multivector is.

1 Like