Pre-allocate for findall

MacKenzieHnC · September 18, 2022, 11:53pm

I am absolutely sure something like this has been asked before but I cannot for the life of me find it.

I’m currently working on a memory- and speed-efficient way to access the first 1 billion primes as part of a different project. The way I’m using is a 1.4GB Bitarray, which I then use findall to convert back to numbers.

This is acceptably fast (~20 seconds) for my purposes, EXCEPT I need to manually add the number 2 back to the front of the array, which more than quadruples the run time.

Since I know exactly how big the array I need is, is there any way to do something where findall stores its answer directly into a pre-allocated array, so that I don’t need to do 1 billion copy operations?

My current code:

const last_prime = 22801763489

function get_primes()
    # Acquire primes as Bitarray
    io = open("primes_as_bits.bitarray")
    list = BitArray(undef, (Int((last_prime - 1) / 2)))
    read!(io, list)
    close(io)

    # Convert Bitarray into Int[]
    final = [2]
    push!(final, (findall(x -> x == 1, list) .* 2 .+ 1)...)
    return final
end

P.S. I’ve got a threaded version as well, which is a little faster, but I feel like it would still be much more efficient if I could just solve this problem.

Oscar_Smith · September 19, 2022, 12:23am

First of all, you should use pushfirst! instead of push!. Most of the extra time comes from splatting. Secondly, you can just use a loop to avoid the extra allocations.

Oscar_Smith · September 19, 2022, 12:24am

Also, the bitarray would be 1/3rd smaller if you masked out multiples of 3 as well as 2.

MacKenzieHnC · September 19, 2022, 12:29am

Do you mean loop through findnext? I think I tried that and it was still slower, but I’ll get an actual benchmark here in a second. I assumed findall had some hidden optimizations under the hood

Also, thank you. I have like 5 versions I’ve tried and pushfirst was among them. That was the one I was actually citing when I said “4x slower”

Oscar_Smith · September 19, 2022, 12:40am

What I mean is just

function get_primes()
    # Acquire primes as Bitarray
    io = open("primes_as_bits.bitarray")
    list = BitArray(undef, (Int((last_prime - 1) / 2)))
    read!(io, list)
    close(io)

    # Convert Bitarray into Int[]
    final = Vector{Int}(undef, n)
    final[1] = 2
    i = 1;
    for j in 2:length(list)
         if list[j]
            final[i] = list[j]*2 + 1
            i += 1
        end
    end
    return final
end

Oscar_Smith · September 19, 2022, 12:48am

This is also fairly easy to multithread if you also store a couple offsets for the nth prime for various n.

MacKenzieHnC · September 19, 2022, 12:50am

Well I’ll be danged. I swear I tried that already.

Fwiw, findnext seems to be just a bit faster, but they’re both well under 30 seconds

Thank you very much!

P.S. Yeah, multithreading is the next step, I just wanted to have a principled way of getting the solution first

Oscar_Smith · September 19, 2022, 12:52am

If you want to squeeze out some extra performance, you could throw @inbounds at it, and @celrod might know of an easy way to get this to vectorize.

jar1 · September 19, 2022, 12:53am

Any reason we shouldn’t have a findall! that takes an out array?

MacKenzieHnC · September 19, 2022, 12:54am

I appreciate it, but I only need this to be less than 30 seconds. This isn’t the project. The next set of calculations is going to take hours. I just needed this to be fast enough to be able to monitor the first step of the next thing without losing my sanity

Oscar_Smith · September 19, 2022, 12:55am

What’s the main calculation?

MacKenzieHnC · September 19, 2022, 1:00am

I’m making heat maps of the final digit of prime p mapped against prime p-1 in different base systems n. It makes pretty pictures.

If I can find a way to sensibly do so, I’m going to try to find a continuous analogue to the idea of

array[p[i] % n + 1, p[i - 1] % n + 1] += 1

That way, I can make a full-blown animation between the different integer bases, as well as have an HD version of each frame.

Like so:

Some cool stuff happens particularly at prime bases and extremely non-prime bases

Oscar_Smith · September 19, 2022, 1:09am

Interesting! I think you might be able to compute this efficiently from the bitmap directly. Specifically, I believe that this can be viewed as a convolution-like operation over the reshaped bitmap.

MacKenzieHnC · September 19, 2022, 1:14am

How do you mean? I had the word “convolution” flash across my mind while thinking about this, but I couldn’t come up with any sensible way to use it.

Also, if you’re interested in playing around with this, I will happily share my code.

MacKenzieHnC · September 19, 2022, 1:17am

Fwiw, the loop using findnext even outperformed findall without adding the 2 back in by about 50%.

Oscar_Smith · September 19, 2022, 1:19am

The idea is that if you define mask to be an n by k bitarray, then array[i, j] = dot(mask[i, :], mask[j, :]). I definitely don’t have time to work on this, but it is really cool.

MacKenzieHnC · September 19, 2022, 1:24am

I’ll think on that and report back if I the continuous version I’m looking for.

Thanks again for your help!

Topic		Replies	Views
Help speedup a brute-force solution Performance combinatorics	13	1055	June 18, 2021
Efficient way to get indices of true elements from BitArray New to Julia indexing	5	1713	February 3, 2022
Performance hacking `(&)(a::BitArray, b::BitArray)` Performance	14	209	October 22, 2024
Findall slow General Usage	8	1889	October 24, 2019
Efficient (non-allocating, in-place) getindex! for BitArray? Performance question , indexing	19	1621	July 2, 2020

Pre-allocate for findall

Related topics