Could CUDA.jl be better at explaining what went wrong?

Wouldn’t it be possible to include a bit more information in the CUDA error messages?

It is often hard to locate the source of an error that manifests itself in CUDA code, so I would think every clue is welcome. Despite this, the error messages in CUDA.jl leave out information that is easily available.

I am not overly familiar with CUDA programming (although I’ve given it a stab) and even less so with its inner workings, but when I glanced at some code it struck me that the error message I was getting could be more informative and made the following patch to my CUDA.jl/src/array.jl:

--- array.jl.original	2023-01-24 18:51:33.302606000 +0100
+++ array.jl.modified	2023-01-24 19:03:31.611944000 +0100
@@ -31,7 +31,7 @@
   dims::Dims{N}
 
   function CuArray{T,N,B}(::UndefInitializer, dims::Dims{N}) where {T,N,B}
-    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
+    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline (tried to allocate a $dims block of $T)")
     maxsize = prod(dims) * sizeof(T)
     bufsize = if Base.isbitsunion(T)
       # type tag array past the data
@@ -47,7 +47,7 @@
 
   function CuArray{T,N}(storage::ArrayStorage{B}, dims::Dims{N};
                         maxsize::Int=prod(dims) * sizeof(T), offset::Int=0) where {T,N,B}
-    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
+    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline (tried to allocate a $dims block of $T)")
     return new{T,N,B}(storage, maxsize, offset, dims)
   end
 end

I am sure there are other places that similar changes could be made.

Probably most Julia packages could have better error messages. A lot of Julia devs work on many packages and/or base and have very little spare time. Docs and good errors, while important, can often be put second to writing features and fixing bugs.

Well-written PRs improving errors like you are here would probably be appreciated pretty much anywhere.

It may be a good idea to benchmark all your error message fixes, because string interpolation code can slow down hot paths in some cases.

Is time efficiency very important in an error?

Normally a process dies so it would only be called once.

Even if a process catches the error it shouldn’t be something that happens often in an otherwise working program (and there is little value in fast non-working programs).

One could of course design one's programs to use exceptions for all sorts of execution control, and in that case it would perhaps be good to optimise.

Leaving this aside, what, in practice, should I as a newbie do with these things?

Should I go to the GitHub pages for the package and put a patch in a suggestion box? (This sounds a bit sarky when I read it, but I just want to know the correct procedure without any irony intended.)

I would create a pull request.

  1. Fork CUDA.jl on GitHub
  2. Create a git branch
  3. Apply your patch
  4. Push your branch to your GitHub fork
  5. Create a pull request: Compare · JuliaGPU/CUDA.jl · GitHub
  6. Explain why you think this would be helpful.

This is how open source development works.

The problem is the same function running without any error still contains string interpolation code, which can be expensive in hot paths for a number of reasons. The function might otherwise be just a few lines of assembly.

It's not hugely important, but if you are about to embark on an error-improving crusade you will want to know that it's not overhead-free in hot paths. Tricks like moving the error call outside the main function into a separate @noinline _some_error() = error(...) function are quite common. See: Make Julia’s Error Codes Even Better Than Elm’s - #13 by StefanKarpinski and the discussion below.
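One way to look at this overhead, rather than guess at it, is to compare the code Julia generates for the two styles. The sketch below is only an illustration: `half_inline`, `half_outlined` and `throw_odd` are made-up names, and the amount of IR you see will depend on your Julia version, so no particular numbers are claimed here.

```julia
using InteractiveUtils  # standard library; provides code_llvm

# Hypothetical check with the message built inline in the function body.
function half_inline(x::Int)
    iseven(x) || error("expected an even number, got $x")
    return x ÷ 2
end

# Same check, but the message is built in a separate, non-inlined function.
@noinline throw_odd(x) = error("expected an even number, got $x")

function half_outlined(x::Int)
    iseven(x) || throw_odd(x)
    return x ÷ 2
end

# Capture the optimized LLVM IR of each method as a string for comparison.
ir_inline   = sprint(io -> code_llvm(io, half_inline, (Int,)))
ir_outlined = sprint(io -> code_llvm(io, half_outlined, (Int,)))

println("inline version:   ", count(==('\n'), ir_inline), " lines of IR")
println("outlined version: ", count(==('\n'), ir_outlined), " lines of IR")
```

Both functions return the same result for valid input; the only difference is where the message-building code lives. For real timing comparisons a benchmarking package would be a better tool than counting IR lines.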

And yes, as @mkitti says, fork the repo on GitHub, make an appropriately named branch, push your changes, and make a pull request to the original repo with a description of your reasoning.

OK, I see the problem now. There seems to be a technique to use around it, though. I assume that generating parts of the error message in another function removes the performance hit when this function is never called, right?

I suspected that my patch was a bit simplistic, so I mainly wanted to bring to attention the desirability of better error messages. CUDA.jl is not the only package that suffers from this, but now I know why this might be.

It should also be mentioned that while the discussion referenced said that it was perhaps a bit harder to generate informative error messages in Julia without severe performance hits, neither what Stefan Karpinski stated there nor what I have since seen in code indicates that it’s impossible.

Yeah, that’s it. And yes, it’s totally possible to work around. Calling a separate function that doesn’t inline (marked with the @noinline annotation) and that contains the string interpolation is a pretty easy fix.
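A minimal sketch of that pattern, using made-up names (`checked_sqrt` and `throw_negative` are not from any package), might look like this:

```julia
# Non-inlined helper: the string interpolation lives here, so the message is
# only built when the check actually fails.
@noinline function throw_negative(x)
    throw(DomainError(x, "expected a non-negative value, got $x"))
end

# Hot function: the success path is just a comparison and a call to sqrt.
function checked_sqrt(x::Float64)
    x >= 0 || throw_negative(x)
    return sqrt(x)
end

checked_sqrt(4.0)
```

The offending value is passed to the helper as an ordinary argument, so nothing is lost from the message; the formatting work just happens somewhere the compiler will not merge into the caller.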

This error specifically has already been improved recently, Provide more useful explanation why an eltype is unsupported. by maleadt · Pull Request #1596 · JuliaGPU/CUDA.jl · GitHub, but that’s just not part of a released version of CUDA.jl yet.

I’ve looked at it. I still don’t fully understand quite what is bad for a function and what isn’t. In the present solution it seems that check_eltype is called for every allocation and it has string interpolation. Is this not a problem?

Is there a way of not incurring high penalties for code that works while still propagating the shape (the dims variable in this case) to the error message? My own patch led me onto the right track (to solve one problem, unfortunately not the only one) because I knew where in the code tensors with certain dimensions were manipulated.
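For what it's worth, here is how I imagine the @noinline technique discussed earlier in the thread could carry dims into the message. This is only a sketch under my own assumptions: `check_alloc` and `throw_not_inline` are hypothetical names, and this is not how CUDA.jl's check_eltype is written.

```julia
# Hypothetical helper, not CUDA.jl's actual code: the detailed message is
# built only when this function is called, i.e. only on the failure path.
@noinline function throw_not_inline(::Type{T}, dims::Dims) where {T}
    throw(ArgumentError(
        "element type $T is not stored inline (tried to allocate a $dims block)"))
end

# Hypothetical stand-in for the constructor check in the patch above.
function check_alloc(::Type{T}, dims::Dims{N}) where {T,N}
    Base.allocatedinline(T) || throw_not_inline(T, dims)
    return prod(dims) * sizeof(T)  # size in bytes, as computed in the constructor
end

check_alloc(Float32, (4, 4))         # 16 elements of 4 bytes each

try
    check_alloc(Vector{Int}, (2, 3)) # Vector{Int} is not stored inline
catch err
    println(err.msg)
end
```

The type and the dims are handed over as plain arguments, so the working path only pays for the check and a branch, while the failing path still reports the shape.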