Question about functors and closures (I think 😅), with an example, but really a general Q

Hi,

I've just been fiddling with some bits of code and am still a bit blown away by some of Julia's language constructs. I've been doing OO programming, mainly, for a considerable time span, and my (purely academic) exposure to Lisp and Haskell dates back a few decades. When I look at my own code, I'm not entirely sure whether it is just super-simple or incredibly deep, so I need to stick to these pieces of code to fix some thoughts in my brain, which are too fluid at the moment…

This was my first go at migrating an RNG from some Java code of mine…

# random-number-generator 1
Base.@kwdef mutable struct RNG
    nextseed::UInt64
end
# calling an instance advances its internal state and returns the new seed value
function (o::RNG)()
    o.nextseed = rng(o.nextseed)
end
# core generator: one 64-bit xorshift step
function rng(x::UInt64)
    x ⊻= (x << 21)
    x ⊻= (x >>> 35)
    x ⊻= (x << 4)
    return x % UInt64
end
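
For example (an illustrative sketch, outputs elided), two instances advance independently:

r1 = RNG(nextseed = 0x0123456789abcdef)   # keyword constructor provided by @kwdef
r2 = RNG(999)                             # default positional constructor, converts to UInt64
r1(); r1()                                # advances only r1's state
r2()                                      # r2 is unaffected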

…it allows me to instantiate more than one RNG, not sharing a common seed/state, and it works like a charm (for generating UInt64 random numbers). So the next thing I did was port some more related code, extending the previous concept: reusing the core rng function, but having the RNG return numbers in a limited domain (not the full UInt64 domain)…

# random-number-generator 2
Base.@kwdef mutable struct RNGMasked
    nextseed::UInt64
    mask::UInt64
end
function (o::RNGMasked)()
    o.nextseed = rng(o.nextseed)
    return o.nextseed & o.mask
end
julia> r = RNGMasked( 0x1234567890abcdef, (1<<8)-1 ) # init with mask = 0b11111111 (keep only the lowest 8 bits of each random number)
RNGMasked(0x1234567890abcdef, 0x00000000000000ff)

julia> map( _ -> Int(r()), 1:6 )
6-element Vector{Int64}:
 127
  96
 249
  47
 113
 213

So far so good: this also works great. I'm very pleased with how the callable object can have private state and with being able to instantiate independent RNGs, etc.

But now I have two practical questions about how to go ahead:

  1. I've defined r (an instance of RNGMasked) by precomputing the mask before passing the actual value to be stored in o.mask. Is there a way that I can pass just the number of bits (nbits = 8 in the example) and have the RNG do the computation (1 << nbits) - 1 for the actual mask value itself?
    I mean, without (!) the need to pass a mask with every call for the next random number? (See the sketch after this list for roughly what I mean.)

  2. (Possibly just a generalization of Q #1:) Is it somehow possible to have two different functions operate on the same state of a struct, in this context? For example, if the instance of RNGMasked should really be an RNGWithOptionalMask, could there be two different functions, say next() and next_masked(), accessing the same o.nextseed? (I mean without sacrificing the elegance of the current constructs; obviously, I could just keep an "outside" reference to the state and pass it to any number of functions to do something with it, but that would break this pattern.)
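
Something along these lines is roughly what I have in mind for #1, just as a sketch (the helper name rng_masked_nbits is made up, and I'm not sure yet whether this is idiomatic):

# hypothetical helper "constructor": compute the mask from a bit count once, up front
rng_masked_nbits(seed::UInt64, nbits::Integer) =
    RNGMasked(nextseed = seed, mask = (UInt64(1) << nbits) - 1)

r = rng_masked_nbits(0x1234567890abcdef, 8)   # same as passing (1 << 8) - 1 explicitly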

Apologies for possibly using too much OO vocabulary and naming things that aren't OO at all, but as of now I don't even know how to call things without using OO words for them, or becoming super-vague (I tried hard not to use "overloading", "constructors" and "inheritance" in my questions, but I really can't help having those words bump around my brain, constantly). :sweat_smile:

And finally, just as a very general question: what are the things I've been doing accurately called? My vague understanding is that the anonymous functions mutating instances of the RNG structs are either closures or functors, or both. Is one of those correct? And if yes, why not the other?
(I've read numerous threads and guides by now, but as soon as I think it's the one, I get the idea it could be the other when I read the next thread.)

PS: I know there are likely great RNGs in Julia; I just need these RNGs to be able to compare the Julia port (of my chess engine) to its Java counterpart, and the quality of the RNGs really doesn't matter here: there is no statistical testing etc. based on RNGs in this case.

PS #2: The details of my questions might seem overblown for the simplistic purpose of these two RNG implementations, but for me this is something like a minimal example of things I'll be doing on a much larger scale (with more code, variables and state involved), and I need to get these basics straight before I can even start thinking about porting more complicated stuff…


You can always do rand(1:n) to generate random numbers uniformly in 1:n, and if you want 8 bits then you can alternatively do rand(UInt8) (or rand(UInt16) for 16 bits, etcetera).

(Obviously, you can also write your own function that takes a number of bits as a parameter, but I guess you’re asking for something built-in? Properly written user code can be just as fast as “built-in” standard-library functions in Julia, so you shouldn’t hesitate to write your own functions if you want. Of course, there is always an advantage to calling well-tested, optimized code if it’s available.)
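
For instance, a one-liner along these lines would do it (rand_nbits is just an illustrative name, not a stdlib function):

rand_nbits(nbits::Integer) = rand(UInt64) & ((UInt64(1) << nbits) - 1)   # keep only the lowest nbits bits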

If you want to define your own random number generator, of course you would need to define your own methods like this, and supporting the full API is a bit of work. However, note that there are several packages for random-number generation in Julia if you need an alternative to the built-in RNGs (Xoshiro and Mersenne Twister), e.g. StableRNGs.jl, RandomNumbers.jl, Random123.jl, and LinearShiftRegisters.jl.

Just define two functions f(r) and g(r)? If you want a certain call signature you could always wrap them in an anonymous function as needed, e.g. _ -> f(r).
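
Concretely, in terms of the RNGMasked struct above, that could look like this (the function names are just illustrative):

next(o::RNGMasked)        = (o.nextseed = rng(o.nextseed))   # full 64-bit value
next_masked(o::RNGMasked) = next(o) & o.mask                 # masked value, same shared state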

Callable structs? Or sometimes functors…

https://docs.julialang.org/en/v1/manual/methods/#Function-like-objects

A closure is a function that captures variables from its lexical environment … I believe it's actually implemented under the hood as a functor (a struct whose fields are the captured variables, plus a call method)…
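
A minimal side-by-side sketch of the two notions (Counter and make_counter are made-up names):

# functor: a struct made callable by adding a method to its own type
mutable struct Counter
    n::Int
end
(c::Counter)() = (c.n += 1)        # the state is an explicit field of the struct

# closure: an anonymous function capturing a local variable of its enclosing scope
function make_counter()
    n = 0
    return () -> (n += 1)          # the state is the captured variable n
end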

dang, I just realized that after…

r1 = RNG(999)

…I'm not operating on one instance of an RNG with every call of r1(); rather, it looks like I'm instantiating a new one each time! :scream:

julia> @benchmark for _ in 1:1_000_000 r1() end
...
Memory estimate: 15.26 MiB, allocs estimate: 1000000.

Why is that, and how can I prevent it? :grimacing:

Yes, you’ll generally want to pass it in as an argument if you want to re-use it.

Don’t benchmark with globals is the first and most important Julia performance tip.
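
For example, one way to phrase the benchmark so the measured code never touches a non-const global (run! is just an illustrative name; $ is BenchmarkTools' interpolation syntax):

using BenchmarkTools

function run!(r, n)     # take the RNG as an argument, so its concrete type is known inside
    for _ in 1:n
        r()
    end
    return nothing
end

r1 = RNG(999)
@benchmark run!($r1, 1_000_000)   # $ interpolates the global into the benchmarked expression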


yes, thx, I'm generally aware of that.
I was thinking that it would actually be enough for the struct instance to be const, to avoid those side effects I've already seen (for non-const globals) in other Julia tests.

It's still a bit of a mystery to me why it's allocating memory (regardless of const or not), when, clearly, it is the same variable that is getting updated (not a new one), and I had it type-annotated with a non-abstract type (UInt64), too, to prevent any compiler guesswork about types. But well, I'm going to get it eventually. :sweat_smile:

You declared mutable structs, which require a heap allocation to construct.

I totally get that heap memory is allocated when it is constructed; however, that should only ever happen once if you then only ever mutate it. That was my previous understanding, at least. :thinking:

It's definitely fixed, though, by just putting the instantiation in a function, as you said…

julia> function test_me() r2 = RNG(999); for _ in 1:1_000_000 r2() end; end

julia> @benchmark test_me()
...
 Memory estimate: 0 bytes, allocs estimate: 0.

You can always do rand(1:n) to generate random numbers uniformly in 1:n, and if you want 8 bits then you can alternatively do rand(UInt8) (or rand(UInt16) for 16 bits, etcetera).
[…] but I guess you’re asking for something built-in? […]
If you want to define your own random number generator, of course you would need to define your own methods like this, and supporting the full API is a bit of work. However, note that there are several packages for random-number generation in Julia if you need an alternative to the built-in RNGs (Xoshiro and Mersenne Twister), e.g. StableRNGs.jl, RandomNumbers.jl, Random123.jl, and LinearShiftRegisters.jl.

…thx for all the info & pointers, but this…

PS: I know there are likely great RNGs in Julia; I just need these RNGs to be able to compare the Julia port (of my chess engine) to its Java counterpart, and the quality of the RNGs really doesn't matter here: there is no statistical testing etc. based on RNGs in this case.

…is really a beats-them-all criterion, rendering all alternatives useless for my use case.

I'm using RNGs for basically just two use cases:

  1. generating synthetic data for testing correctness against a reference implementation.
  2. performance-testing options for the table sizes of various hashtables in my chess engine.

For synthetic tests, I usually need the full domain of UInt64s, and for hashtables I always need some power of 2 (as the table sizes have to be powers of 2), but not just those predefined by UInt8, UInt16 & UInt32. So no, for these reasons I'm not even looking for anything built-in. Finding or adapting something suitable is likely more tedious, more error-prone and even less performant than what I already have (for my use case!!!).

built-in:

julia> function test_me() for _ in 1:1_000_000 rand(1:0x000000000000ffff) end; end
julia> @benchmark test_me()
BenchmarkTools.Trial: 764 samples with 1 evaluation.
 Range (min … max):  6.296 ms …  10.103 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.369 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.540 ms ± 494.287 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▆▅▁▁                                 ▁
  ███████▇▆▅▅▅▄▅▄▅▄▆▄▁▆▁▄▄▄▆▁▅▁▁▁▅▁▅▅▁▁▅▇█▇▅▅▁▅▁▄▁▁▅▄▅▄▁▄▄▄▁▅ ▇
  6.3 ms       Histogram: log(frequency) by time      8.57 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

own:

julia> function test_me() r = RNGMasked(999,0x0000000000ffff); for _ in 1:1_000_000 res = next_masked(r) end; return next_masked(r); end
test_me (generic function with 1 method)

julia> @benchmark test_me()
BenchmarkTools.Trial: 2261 samples with 1 evaluation.
 Range (min … max):  2.172 ms …   4.659 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.176 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.207 ms ± 155.218 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▂
  ████▇▅▅▆▄▁▁▄▄▃▅▁▁▅▁▁▃▃▃▃▁▄▃▅▁▃▁▁▁▄▁▁▁▃▃▃▃▄▃▁▁▅▃▃▃▄▃▄▃▁▁▁▁▁▄ █
  2.17 ms      Histogram: log(frequency) by time      3.22 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

…that's ~3x, which doesn't matter for my synthetic correctness tests, but might already make a difference for finding optimal hashtable parameters. This will only ever be a trade-off for my chess engine, between the overhead of storing info about certain positions vs. the quality of that info (how many nodes have been evaluated below that node in the search tree) vs. the speed gained if the info is simply discarded and "only" used for evaluating the current position. Even the best table size is not easily determined, as bigger is generally better, but also causes more cache misses on average, and "old" entries lose value as the search advances anyway, etc.

It might be worth it to make your RNG implement the standard RNG interface, though this interface is not entirely solidified, I guess… see "Creating New Generators" in Random Numbers · The Julia Language

Basically, what you wind up doing is writing several rand() methods for your new generator, to generate say Int8, Int16, Int32, Float32, Float64 and such… and then you can use your generator to power all the other, more specialized rand() functions.

If you want to be able to specify a bitmask length, you might be able to create something like

rand(rng, ::Type{Int64}; masklen=4) = rand(rng, Int64) & ((1 << masklen) - 1)
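
For the basic hookup itself, as far as I understand the linked docs, a rough (untested) sketch could look like this; XorshiftRNG is a made-up name, and only UInt64 generation is wired up:

using Random

mutable struct XorshiftRNG <: Random.AbstractRNG
    state::UInt64
end

# rand(rng, UInt64) ends up dispatching to a SamplerType{UInt64} method, so this
# one definition plugs the generator into the Random machinery for that element
# type; other element types and floats would need additional methods.
function Random.rand(r::XorshiftRNG, ::Random.SamplerType{UInt64})
    x = r.state
    x ⊻= x << 21
    x ⊻= x >>> 35
    x ⊻= x << 4
    r.state = x
    return x
end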

BTW: It’s fun to watch your enthusiasm for discovering Julia, and I really like the enthusiasm and quality of questions you’re bringing to the forum.

BTW: It’s fun to watch your enthusiasm for discovering Julia, and I really like the enthusiasm and quality of questions you’re bringing to the forum.

Tyvm, I'm glad to hear that. Sometimes I think: who else might even care about the questions I'm asking, and even if someone does, it likely won't get me any further than (I thought) I had gotten on my own already.
But this premature assessment has turned out wrong in literally every single thread I've started so far, and I'm sometimes rather overwhelmed by all the feedback I've been getting, which (after selecting the useful stuff) is clearly guiding me in a lot of choices, or even "just" in a general understanding of how to approach things in Julia. So kudos right back at you (and a lot of others)! :wink:


I don’t think it is your RNG struct that is being allocated. You should probably take what I write with a grain of salt, since I’m not an expert and decided to treat this as a learning exercise for myself. I thought it would be fun to try the new allocation profiler. So I did the following

using Profile
using PProf

r1 = RNG(999)

function test_allocs()
  for _ in 1:1_000 r1() end
end

Profile.Allocs.@profile sample_rate=1.0 test_allocs()
PProf.Allocs.pprof(from_c=false)

Profile.Allocs.clear()

Going to the web UI and looking at the flame graph seems to indicate you are allocating a bunch of UInt64s. It helps to run this twice so you don’t get a bunch of allocations from compilation. If you remove the from_c=false, you get a bit more detail showing the boxing of UInt64s.

So then I tried

julia> @code_warntype test_allocs()
MethodInstance for test_allocs()
  from test_allocs() in Main at /home/mas/tmp/test1.jl:21
Arguments
  #self#::Core.Const(test_allocs)
Locals
  @_2::Union{Nothing, Tuple{Int64, Int64}}
Body::Nothing
1 ─ %1  = (1:1000)::Core.Const(1:1000)
│         (@_2 = Base.iterate(%1))
│   %3  = (@_2::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %4  = Base.not_int(%3)::Core.Const(true)
└──       goto #4 if not %4
2 ┄ %6  = @_2::Tuple{Int64, Int64}
│         Core.getfield(%6, 1)
│   %8  = Core.getfield(%6, 2)::Int64
│         Main.r1()
│         (@_2 = Base.iterate(%1, %8))
│   %11 = (@_2 === nothing)::Bool
│   %12 = Base.not_int(%11)::Bool
└──       goto #4 if not %12
3 ─       goto #2
4 ┄       return nothing

That didn’t show anything that seemed immediately alarming. So I tried Cthulhu (scary name, pleasant software):

 @descend_code_typed test_allocs()
test_allocs() in Main at /home/mas/tmp/test1.jl:21
   ∘ ─ %0 = invoke test_allocs()::Core.Const(nothing)                            
   1 ─       nothing::Nothing                                                 │  
22 2 ┄ %2  = φ (#1 => 1, #6 => %11)::Int64                                    │  
   │   %3  = Main.r1::Any                                                     │  
   │         (%3)()::Any                                                      │  
   │   %5  = (%2 === 1000)::Bool                                              │╻╷ iterate
   └──       goto #4 if not %5                                                ││ 
   3 ─       Base.nothing::Core.Const(nothing)                                ││ 
   └──       goto #5                                                          ││ 
   4 ─ %9  = Base.add_int(%2, 1)::Int64                                       ││╻  +
   └──       goto #5                                                          │╻  iterate
   5 ┄ %11 = φ (#4 => %9)::Int64                                              │  
   │   %12 = φ (#3 => true, #4 => false)::Bool                                │  
   │   %13 = Base.not_int(%12)::Bool                                          │  
   └──       goto #7 if not %13                                               │  
   6 ─       goto #2                                                          │  
   7 ─       return nothing    

That type of Any for Main.r1 doesn’t seem good. So then I made the following change:

function (o::RNG)()
  o.nextseed = rng(o.nextseed)
  nothing
end

which gives

julia> @benchmark for _ in 1:1_000_000 r1() end
BenchmarkTools.Trial: 394 samples with 1 evaluation.
 Range (min … max):  12.597 ms … 13.025 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.692 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.702 ms ± 44.060 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▂▁ ▁▅▇█▇▅▃▃▂▃▄▃▂                                 
  ▆▄▄▁▄▁▄▇▆▆███████████████████▇▄▇▆▇▆▁▇▄▄▄▇▄▇▆▁▄▄▁▄▁▁▁▁▁▁▁▁▁▄ ▇
  12.6 ms      Histogram: log(frequency) by time      12.9 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

So what I think is going on is that your r1 is global so Julia can’t infer a type other than Any for r1(). It returns a UInt64, but since that is not known during compilation it is boxed, resulting in an allocation. So the root problem is the use of a global variable, as it usually is, but it did manifest in a way that was interesting to track down.


I thought it would be fun to try the new allocation profiler.

Hehe, thank you for the effort!
Though I think you lost me here… :sweat_smile:

Today, I have to say, I don't know anymore what was so unclear to me yesterday. The global instance certainly was one thing I overlooked when testing, but I think I was also too confused by reading too many topics about functors & closures that weren't really all that relevant to what I was actually trying to do.

Today, I just thought: what the heck, I'll just hack together whatever comes to mind, much in the way I have done things in other languages; and voilà, it is not all that different. Dunno why I was even concerned that I would have to do things all that differently…

# ==========================================================================
mutable struct RNG
    state::UInt64
    mask::UInt64
end
# --------------------------------------------------------------------------
RNG()                                   = RNG( time_ns() , typemax(UInt64) )
RNG(seed::UInt64)                       = RNG( seed      , typemax(UInt64) )
RNGMaskNBits(seed::UInt64,nbits::UInt8) = RNG( seed      , (1 << nbits) -1 )
# --------------------------------------------------------------------------
function next(r::RNG)                    return r.state = rng(r.state)  end
function nextMasked(r::RNG)              return next(r) & r.mask        end
function nextWithMask(r::RNG,m::UInt64)  return next(r) &   m           end
# --------------------------------------------------------------------------
function rng(x::UInt64) # (2003) Marsaglia 64-bit-xorshift, period = 2^64-1
    x ⊻= (x <<  21)
    x ⊻= (x >>> 35)
    x ⊻= (x <<   4)
    return x % UInt64
end
# ==========================================================================
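
Usage then looks roughly like this (illustrative, outputs elided):

r = RNGMaskNBits(0x1234567890abcdef, UInt8(10))   # seed + mask for the lowest 10 bits
next(r)                               # full 64-bit value, advances the state
nextMasked(r)                         # same state, masked with r.mask
nextWithMask(r, 0xf0000f0000f00000)   # same state, one-off mask passed per call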

…it's been a good exercise, and apart from having my RNG ported, it looks like I can pretty much do everything I'm used to from an OO paradigm, including inheritance, overloading constructors + functions, and encapsulating complex details in a bunch of structures + functions without having to bother with them once they're working. Also, it's great that the low-level stuff is still accessible enough (both in coding and in profiling/benchmarking), as that is often where high-level languages lose track of performance issues, in my experience.

testing for 1 iterations...
#1: 1011011100010101100011000001010110000111111110110000110010001101
#2: 0000000000000000000000000000000000000000000000000000000010001101
#3: 1011000000000000000011000000000000000000111100000000000000000000  
  41.171 ns (0 allocations: 0 bytes)
  41.188 ns (0 allocations: 0 bytes)
  42.095 ns (0 allocations: 0 bytes)
testing for 100 iterations...
  212.778 ns (0 allocations: 0 bytes)
  217.808 ns (0 allocations: 0 bytes)
  220.736 ns (0 allocations: 0 bytes)
testing for 10,000 iterations...
  17.000 μs (0 allocations: 0 bytes)
  17.500 μs (0 allocations: 0 bytes)
  17.400 μs (0 allocations: 0 bytes)
testing for 1,000,000 iterations...
  1.697 ms (0 allocations: 0 bytes)
  1.746 ms (0 allocations: 0 bytes)
  1.746 ms (0 allocations: 0 bytes)

…also can't complain about the performance at all (in each block, the 3 lines test the 3 methods defined above: once without a mask & twice with different masks). Apart from "don't worry too much", one of the main lessons I've learned is: better always write a proper test function, instead of doing quick-and-sloppy ad-hoc tests in the REPL. :wink:


Ad-hoc testing from the REPL can work well, but especially for sub-microsecond timings you have to be really careful, since things like calling conventions and inlining can have major effects.

If you won't change the mask, then you can do something like the following:

mutable struct RNG
    state::UInt64
    const mask::UInt64 # mask _cannot_ change (const fields in mutable structs need Julia 1.8+)
end

RNG()                                    = RNG( time_ns() , typemax(UInt64) )
RNG(seed::UInt64)                        = RNG( seed      , typemax(UInt64) )
RNGMaskNBits(seed::UInt64, nbits::UInt8) = RNG( seed      , (1 << nbits) -1 )

next(r::RNG)  = (r.state = _rng(r.state))
nextMasked(r::RNG) = next(r) & r.mask
next(r::RNG, m::UInt64) = next(r) & m # gets dispatched with the mask

function _rng(x::UInt64) # changed to _rng to mark it as "internal", this is not enforced by the compiler, but it is a convention
    x ⊻= (x <<  21)
    x ⊻= (x >>> 35)
    x ⊻= (x <<   4)
    return x % UInt64
end

Now, I would design it more like the following (keeping most of your design)

using Random: AbstractRNG   # AbstractRNG lives in the Random stdlib

mutable struct RNG <: AbstractRNG
    state::UInt64
end

RNG() = RNG(time_ns())
# RNG(seed::UInt64) = RNG(seed) # this isn't needed, as it is the same as the default

struct MaskedRNG <: AbstractRNG
    rng::RNG
    mask::UInt64
end

MaskedRNG(nbits::UInt8)               = MaskedRNG(RNG(), (1 << nbits) - 1)
MaskedRNG(seed::UInt64, nbits::UInt8) = MaskedRNG(RNG(seed), (1 << nbits) - 1)

next(r::RNG)                               = (r.state = _rng(r.state))
next(r::MaskedRNG)                         = next(r) & r.mask
next(r::Union{RNG, MaskedRNG}, m::Integer) = next(r) & m

function _rng(x::UInt64)
    x ⊻= (x <<  21)
    x ⊻= (x >>> 35)
    x ⊻= (x <<   4)
    return x % UInt64
end

Thx for the input! :blush:

Now, I would design it more like the following (keeping most of your design)

I changed these…

next(r::MaskedRNG)                         = next(r) & r.mask
next(r::Union{RNG, MaskedRNG}, m::Integer) = next(r) & m

…to those, below…

next(r::MaskedRNG)                         = next(r.rng) & r.mask # r.rng!
next(r::Union{RNG, MaskedRNG}, m::Integer) = next(r.rng) & m      # r.rng!

Otherwise it's an infinite recursive call to itself (causing a StackOverflowError) for the first one, and a potentially wrong mask for the second function (in case the passed mask m is not the same as the const mask in the MaskedRNG).

…otherwise the timings of the next(r::Union{RNG, MaskedRNG}, m::Integer) function would just have been a bit too nice (the last one of the three)… :rofl:

testing for 1,000,000 iterations...
#1: 0110101100100001011111111100000011111000110101011011101010101111
#2: 0000000000000000000000000000000000000000000000000000001010101111
#3: 0000000000000000000000000000000000000000000000000000000000000000
  1.655 ms (0 allocations: 0 bytes)
  1.746 ms (0 allocations: 0 bytes)
  41.129 ns (0 allocations: 0 bytes)

But it's nice to know that the compiler is able to deduce that two non-overlapping masks (which I am using in my tests) result in dead code. I was using these two masks:

    # mask = lowest 10 bits
    random = MaskedRNG(seed%UInt64, UInt8(10))

    # mask = 3 nibbles set to ...1111..., each, rest=0
    next(random, 0xf0000f0000f00000)  # <-- non-overlapping with 1st mask

Thanks for this demonstration of how type/subtype relationships are modelled in Julia, how dispatch achieves basically the same thing (and, beyond this case, much more!) as inheritance and overloading do in OOP, and also how one has to be more cautious than in plain OO! :wink:

yeah, one thing you often need to be careful about is that for sufficiently simple functions, the compiler is sometimes annoyingly good at realizing that your benchmark doesn’t actually do anything.

yeah, one thing you often need to be careful about is that for sufficiently simple functions, the compiler is sometimes annoyingly good at realizing that your benchmark doesn’t actually do anything.

…hehe, yeah. I'm luckily used to those effects from Java & the JMH ("Java Microbenchmark Harness"). They have a canonical way to deal with it: the method(s) to be benchmarked are called with an instance of the class Blackhole, and at the end of your test method you just call blackhole.consume(...) and dump anything in there that you have computed in the method. It's quite an iconic pattern, and once adopted, rarely forgotten (it prevents the JIT from removing any code needed for the computation of whatever you dump into the black hole).
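
A rough Julia analogue of that pattern is to fold every result into an accumulator and return it, so the work stays observable (bench_next is just an illustrative name, and r is assumed to be bound to one of the RNG instances from the earlier examples):

using BenchmarkTools

function bench_next(r, n)
    acc = zero(UInt64)
    for _ in 1:n
        acc ⊻= next(r)   # every result feeds the return value, so the loop can't be dropped
    end
    return acc
end

@btime bench_next($r, 1_000_000)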

I think the following is enough for next:

next(r::MaskedRNG)                         = next(r.rng) & r.mask
next(r::Union{RNG, MaskedRNG}, m::Integer) = next(r) & m

Because the next(r) inside the second one will be dispatched correctly (without the m argument) to one of the first two.

Actually, I would go even further and write everything as follows, to avoid needing to write overly specific types:

using Random: AbstractRNG   # again, AbstractRNG comes from the Random stdlib

mutable struct RNG <: AbstractRNG
    state::UInt64
end
# the default constructor converts its argument to UInt64, so e.g. RNG(123) works

RNG() = RNG(time_ns())

struct MaskedRNG <: AbstractRNG
    rng::RNG
    mask::UInt64
end

MaskedRNG(nbits)       = MaskedRNG(RNG(), (1 << UInt8(nbits)) - 1)
# MaskedRNG(seed::Integer, nbits) = MaskedRNG(RNG(seed), (1 << UInt8(nbits)) - 1) # error when nbits = 64
# this next version allows setting high nbits using negatives, like -8 to mask to 0xff00000000000000
# in benchmarking it takes the same time as the above
MaskedRNG(seed::Integer, nbits) = MaskedRNG(RNG(seed), ~UInt64(0) >> ifelse(0 < abs(nbits) <= 64, sign(nbits)*64 - nbits, 64))

next(r::RNG)                               = (r.state = _rng(r.state))
next(r::MaskedRNG)                         = next(r.rng) & r.mask
next(r::Union{RNG, MaskedRNG}, m::Integer) = next(r) & m # this will dispatch correctly

function _rng(x::UInt64)
    x ⊻= (x <<  21)
    x ⊻= (x >>> 35)
    x ⊻= (x <<   4)
    return x % UInt64
end
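
For illustration, usage of this final version might look like the following (values chosen arbitrarily, outputs elided):

r  = RNG(0x1234567890abcdef)
rm = MaskedRNG(0x1234567890abcdef, 10)   # keep only the lowest 10 bits

next(r)                        # full 64-bit value
next(rm)                       # value masked with rm.mask
next(rm, 0x00000000000000ff)   # additional one-off mask on top of rm.mask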