Pre-allocate for findall

I am absolutely sure something like this has been asked before but I cannot for the life of me find it.

I’m currently working on a memory- and speed-efficient way to access the first 1 billion primes as part of a different project. The way I’m using is a 1.4GB Bitarray, which I then use findall to convert back to numbers.

This is acceptably fast (~20 seconds) for my purposes, EXCEPT I need to manually add the number 2 back to the front of the array, which more than quadruples the run time.

Since I know exactly how big the array I need is, is there any way to do something where findall stores its answer directly into a pre-allocated array, so that I don’t need to do 1 billion copy operations?

My current code:

const last_prime = 22801763489

function get_primes()
    # Acquire primes as Bitarray
    io = open("primes_as_bits.bitarray")
    list = BitArray(undef, (Int((last_prime - 1) / 2)))
    read!(io, list)
    close(io)

    # Convert Bitarray into Int[]
    final = [2]
    push!(final, (findall(x -> x == 1, list) .* 2 .+ 1)...)
    return final
end

P.S. I’ve got a threaded version as well, which is a little faster, but I feel like it would still be much more efficient if I could just solve this problem.

First of all, you should use pushfirst! instead of push!. Most of the extra time comes from splatting. Secondly, you can just use a loop to avoid the extra allocations.

Also, the bitarray would be 1/3rd smaller if you masked out multiples of 3 as well as 2.

Do you mean loop through findnext? I think I tried that and it was still slower, but I’ll get an actual benchmark here in a second. I assumed findall had some hidden optimizations under the hood

Also, thank you. I have like 5 versions I’ve tried and pushfirst was among them. That was the one I was actually citing when I said “4x slower”

What I mean is just

function get_primes()
    # Acquire primes as Bitarray
    io = open("primes_as_bits.bitarray")
    list = BitArray(undef, (Int((last_prime - 1) / 2)))
    read!(io, list)
    close(io)

    # Convert Bitarray into Int[]
    final = Vector{Int}(undef, n)
    final[1] = 2
    i = 1;
    for j in 2:length(list)
         if list[j]
            final[i] = list[j]*2 + 1
            i += 1
        end
    end
    return final
end

This is also fairly easy to multithread if you also store a couple offsets for the nth prime for various n.

Well I’ll be danged. I swear I tried that already.

Fwiw, findnext seems to be just a bit faster, but they’re both well under 30 seconds

Thank you very much!

P.S. Yeah, multithreading is the next step, I just wanted to have a principled way of getting the solution first

If you want to squeeze out some extra performance, you could throw @inbounds at it, and @celrod might know of an easy way to get this to vectorize.

Any reason we shouldn’t have a findall! that takes an out array?

I appreciate it, but I only need this to be less than 30 seconds. This isn’t the project. The next set of calculations is going to take hours. I just needed this to be fast enough to be able to monitor the first step of the next thing without losing my sanity

What’s the main calculation?

I’m making heat maps of the final digit of prime p mapped against prime p-1 in different base systems n. It makes pretty pictures.

If I can find a way to sensibly do so, I’m going to try to find a continuous analogue to the idea of

array[p[i] % n + 1, p[i - 1] % n + 1] += 1

That way, I can make a full-blown animation between the different integer bases, as well as have an HD version of each frame.

Like so:




Some cool stuff happens particularly at prime bases and extremely non-prime bases

Interesting! I think you might be able to compute this efficiently from the bitmap directly. Specifically, I believe that this can be viewed as a convolution-like operation over the reshaped bitmap.

How do you mean? I had the word “convolution” flash across my mind while thinking about this, but I couldn’t come up with any sensible way to use it.

Also, if you’re interested in playing around with this, I will happily share my code.

Fwiw, the loop using findnext even outperformed findall without adding the 2 back in by about 50%.

The idea is that if you define mask to be an n by k bitarray, then array[i, j] = dot(mask[i, :], mask[j, :]). I definitely don’t have time to work on this, but it is really cool.

I’ll think on that and report back if I the continuous version I’m looking for.

Thanks again for your help!