Generalizing bounds checking

question

#1

I’ve found myself on several occasions wanting a safe-but-slow version as well as an unsafe-but-fast version of a function. @boundscheck and @inbounds solve this problem marvelously for array indexing, but that’s not exactly what I need this for.

Here are two example scenarios, along with the current, hacky solutions from RigidBodyDynamics.jl:

  • Frame checks. A common source of errors in kinematics/dynamics code is coordinate frame mismatches. I try to eliminate this class of errors by checking that frames match before performing operations. I do this using an @framecheck macro (example usage). To elide the frame checks when maximal performance is desired, I abuse the @boundscheck machinery in @framecheck, as was rightfully pointed out during Juliacon: running Julia with --check-bounds=no will also elide the frame checks.
  • Cache checks. To prevent doing double work, I store intermediate results in CacheElements, which store data along with a dirty flag. Ideally, the value of the dirty flag should always be checked before accessing the data stored in a CacheElement, but that has too much overhead in general, so I need a user-friendly way to elide those checks when it is known that the cache is up to date. I hacked together an @nocachecheck macro (example usage) that replaces calls to a short list of functions with calls to unsafe methods of those functions.

The first of these solutions is of course undesirable because it conflates two different types of checks. The second is undesirable because it is not possible to differentiate between calls to different methods of the same function (except by the number of arguments) in a macro.

Unfortunately, exactly replicating the @boundscheck / @inbounds machinery requires hooks into the compiler.

So, I’m wondering:

  1. is there a better way to replicate the bounds checking behavior without hooks into the compiler?
  2. if the answer to the first question is no, would it be feasible and desirable to generalize the bounds checking hooks in the compiler to make it easy to set up new sets of macros similar to @inbounds + @boundscheck (and possibly @propagate_inbounds), but for a different purpose and with separate on/off switches?

@inbounds like mechanism?
#2

It seems like the pushmeta!/popmeta! functions can almost provide the kind of annotation that would be needed, but unfortunately pushmeta! can only add a :meta expression to a function body; I don’t think it would then be possible to have an @inbounds-style macro process an expression containing calls to the annotated function based on those :meta tags, since it would require inference to figure out which method of a function is being called in the expression so that the body can be accessed. I guess this is related to the ‘one level of inlining’ rule for bounds checks. Related: https://github.com/JuliaLang/julia/pull/8779#issuecomment-60379707.


#3

Hello tkoolen!

This is probably far from general. As a basic user, for the cache problem I would try to pass a boolean “value type” as in the docs to every function that is supposed to use the cache and then an if statement inside every function can decide to skip the cache check if the valuetype say so.
You can then easily propagate a variable with a value type around if needed.
This is explicit and efficient, multiple dispatch together the if statement will compile away the cache check. Have you already tried this?

Nice advancements with RigidBodyDynamics.jl!


#4

Thanks!

The Val trick is close to, but probably slightly better (in terms of readability) than the current low-level cache accessor method setup; I’ll probably switch to using that.

However, the main problem with the cache example is usability: users of the package will still have to pass in this ugly extra Val{true} argument everywhere, which is easy to forget and certainly not as nice as annotating a whole block of code with @inbounds. So I’m really looking for a replacement for @nocachecheck in this case.


#5

In the meanwhile this is how to do it without the extra Val{true} by chosing to have it as default:

struct CheckCache{CC} end # A custom Value Type

function fun(::Type{CheckCache{CC}}=CheckCache{true}) where CC
  if CC!=false
    println("update cache")
  end
  println("cache is updated, proceed ")
end

fun()
fun(CheckCache{false})

PS: It would be nice to be able to write where CF::Bool for having the constraint on the parameter.

However we are still searching for something better and equivalent to @inbounds for @nocachecheck.


#6

I’ve also wondered what a generalized @boundscheck mechanism would look like. However, have you tested just doing the easy thing:

const framecheck = Ref(true)
macro framecheck(f1, f2)
    return :( framecheck[] && $(esc(f1)) != $(esc(f2)) &&
        framecheck_fail($(QuoteNode(f1)), $(QuoteNode(f2)), $(esc(f1)), $(esc(f2))) )
end
(*)(t::Transform3D, point::Point3D) = begin @framecheck(t.from, point.frame); Point3D(t.to, rotation(t) * point.v + translation(t)) end

This is a bit more flexible and reliable, and the processor should be able to predict this branch pretty much perfectly, so the runtime cost should be extremely low.

The @inbounds functionality is a bit different typically, since the bounds-check only adds ~10-20% overhead for the tiniest loops (hardly ever even worth turning it off – it’s pretty much always a correctly predicted branch), but it also prevents vectorization (which occasionally gives a massive, order of magnitude, speed up – in which case it’s absolutely essential that it could be turned off by the caller)


#7

Thank you. That works better than I expected. I was only thinking about solutions that completely elide the frame checking code, but maybe this is OK. Differences in some of my benchmark results look to be within the noise, with some slightly faster and some slightly slower than the old approach. Some other benchmarks seem to consistently be a significant bit slower though (~10%). I’ll keep evaluating as I try to get closer to the fastest C++ implementations and will decide after that.

I suppose this wouldn’t work well for the cache check example however, as the value in the corresponding global Ref would need to change pretty often (microsecond scale or faster), which I’d assume would mess with branch prediction. It also wouldn’t be thread safe. So any suggestions for improving/replacing @nocachecheck would still be greatly appreciated.


#8

Something that gets invalidated every 5000 cycles doesn’t sound very static, and seems close to needing other forms of probabilistic and unsynchronized caches to be highly performant. Regardless, causing one branch mid-prediction every 5000 cycles would still only be single digit % performance impact.