Help to get my slow Julia code to run as fast as Rust/Java/Lisp

Are you sure? If I do

n = Base.GMP.MPZ.set_ui!(big"1", 1)

then the ns should be independent for each call. Even though it may reuse some global, that’s only a problem if calls execute in parallel.

Feels a bit hacky though. I wonder if the need for set_ui! is a bug.
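For context, set_ui! wraps GMP's mpz_set_ui and re-initializes the (shared) object in place, so each call starts from a known value. A sketch (add_ui! is an internal wrapper, so its exact signature is an assumption on my part):

n = Base.GMP.MPZ.set_ui!(big"1", 1)  # same underlying object every call, but value reset to 1
Base.GMP.MPZ.add_ui!(n, n, 41)       # n == 42 now; the next call's set_ui! resets it again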

Why are we using BigInt in the first place? Neither Rust nor Java seem to use their BigInt versions and instead just use UInt8[] and at least Rust passes around views of them since they’re not written to anyway. We can in theory do the exact same thing (well, minus the not-writing guarantee):

julia> a = Dict([0x01] => "a", [0x2] => "b")
Dict{Vector{UInt8}, String} with 2 entries:
  [0x01] => "a"
  [0x02] => "b"

julia> b = [0x1, 0x2]
2-element Vector{UInt8}:
 0x01
 0x02

julia> a[view(b, 1:1)]
"a"

Seems like the ground is much less level than we initially thought :smiley:

2 Likes

Thanks for your comment. But the behaviour I demonstrated seems to indicate that there's a bug, doesn't it? big"1" should not just reuse the object, but also make sure it has the correct value?

See above:

Also on the master branch they do use BigInt:

I guess we could spin up Int256 via BitIntegers.jl though?
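Something like this, assuming BitIntegers.jl is installed (it predefines Int256/UInt256):

using BitIntegers

n = UInt256(0)
n = n * 10 + 7  # plain fixed-width integer arithmetic, no BigInt allocation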

BTW, mul_ui! is faster than mul_si!, and it's OK here because you multiply by a positive number, 10.

I think creating a muladd! method would make the code cleaner. There is an addmul in the GMP library, but curiously no muladd. Maybe there’s no performance advantage.
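For what it's worth, a rough sketch of such a muladd!, built from the internal Base.GMP.MPZ wrappers (these are undocumented internals, so the exact names and signatures are an assumption):

using Base.GMP: MPZ

# in-place n = n*m + d, avoiding intermediate BigInt allocations
function muladd!(n::BigInt, m::Integer, d::Integer)
    MPZ.mul_ui!(n, n, UInt(m))  # n *= m
    MPZ.add_ui!(n, n, UInt(d))  # n += d
    n
end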

I see :man_facepalming: Missed that, since I've only worked on julia-optimisations. Which branch are the most recent benchmarks from, then? master, or some older version where Vec<u8> was a thing?

In any case, the benchmark runner from master also doesn’t work for me, so I can’t even reproduce any timings :man_shrugging: @renatoathaydes do you maybe know what’s going on here? I’m just running ./benchmark.sh from the master branch of your repo. I still get the same error I posted above, about not being able to find pid_rusage in libproc.

Couldn't help but try to optimize this further :sweat_smile:. Here is my code: https://github.com/jakobnissen/prechelt_benchmark

Note that it is optimized for runtime only, with compile time subtracted. Here are a few of the optimizations:

  • Use UInt256 from BitIntegers.jl instead of BigInt.
  • Use a lookup table to translate words to numbers, operating on the bytes of the input string instead of its chars (see the sketch after this list).
  • sizehint! the dictionary.
  • Do not allocate a new string when printing; instead, make multiple calls to print.
  • Do not allocate new arrays of strings when recursing in printTranslations where possible; instead, push! and pop! from the existing array (perhaps even more allocations can be removed).
  • The dict's valtype is now Union{Tuple{String}, Vector{String}}.
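To illustrate the byte-level lookup table (a sketch under my own naming, not the actual code from the repo; the letter-to-digit mapping is the one from the benchmark problem):

# 256-entry table mapping ASCII letter bytes to phone digits; 0xff marks "not a letter"
const BYTE_TO_DIGIT = let t = fill(0xff, 256)
    for (i, letters) in enumerate(("e", "jnq", "rwx", "dsy", "ft", "am", "civ", "bku", "lop", "ghz"))
        for ch in letters
            t[UInt8(ch) + 1] = UInt8(i - 1)            # lowercase letter
            t[UInt8(uppercase(ch)) + 1] = UInt8(i - 1) # uppercase letter
        end
    end
    t
end

using BitIntegers  # for UInt256

function word_key(word::AbstractString)
    n = UInt256(1)  # leading-1 sentinel, as in the other solutions in this thread
    for b in codeunits(word)
        d = BYTE_TO_DIGIT[b + 1]
        d == 0xff && continue  # skip quotes and other non-letters in the wordlist
        n = n * 10 + d
    end
    n
end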

The result, as median runtimes when printing to an IOBuffer:
Before (v1): 856 ms
After (v2): 28.8 ms
Speedup: 30x

The majority of time in the optimized version is spent building the dictionary, on things like rehashing it and allocating the individual arrays.
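On the rehashing part specifically, reserving capacity up front is what the sizehint! optimization above does (a sketch; 75_000 is just a guess at the wordlist size):

using BitIntegers  # for UInt256

dict = Dict{UInt256, Union{Tuple{String}, Vector{String}}}()
sizehint!(dict, 75_000)  # grow the hash table once instead of rehashing repeatedly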

4 Likes

@DNF Nope - the macro runs at parse time and allocates the number once. When re-running the code, it reuses the same object and does not re-initialize it. It is, in fact, correct behaviour.

I understand that it reuses the object without re-initializing, but it seems very odd that it doesn’t reset the value.

I see no hint of this behaviour in the docstring:

help?> @big_str
  @big_str str
  @big_str(str)


  Parse a string into a BigInt or BigFloat, and throw an ArgumentError if the string is not a valid number. For
  integers _ is allowed in the string as a separator.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> big"123_456"
  123456

  julia> big"7891.5"
  7891.5

Does this mean that anytime one writes x = big"143" in a function, one might get something that is dramatically different from 143?

It seems really dangerous and surprising, do you disagree with that?
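A minimal sketch of the hazard in question, assuming the reuse behaviour described above (the internal MPZ call stands in for any code that mutates the value in place):

function f()
    n = big"1"                     # the same BigInt object on every call
    Base.GMP.MPZ.add_ui!(n, n, 1)  # mutate it in place: n += 1
    n
end

f()  # 2
f()  # 3 -- the "literal" was mutated by the previous call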

Btw, this is what I get for my code when benchmarking against the other implementations:

Benchmarking...
===> java -cp build/java Main
Proc,Memory(bytes),Time(ms)
java,0,290
Proc,Memory(bytes),Time(ms)
java,0,354
Proc,Memory(bytes),Time(ms)
java,0,702
Proc,Memory(bytes),Time(ms)
java,0,766
===> java -cp build/java Main2
Proc,Memory(bytes),Time(ms)
java,0,375
Proc,Memory(bytes),Time(ms)
java,0,590
Proc,Memory(bytes),Time(ms)
java,0,1785
Proc,Memory(bytes),Time(ms)
java,0,2471
===> ./phone_encoder
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,142
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,274
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,804
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,1683
===> j1 src/julia/phone_encoder.jl
Proc,Memory(bytes),Time(ms)
j1,0,802
Proc,Memory(bytes),Time(ms)
j1,0,825
Proc,Memory(bytes),Time(ms)
j1,0,1243
Proc,Memory(bytes),Time(ms)
j1,0,1618

@jakobnissen’s version fails validation for me.

2 Likes

It seems really dangerous and surprising, do you disagree with that?

I sort of disagree. It’s not documented in @big_str, since it’s part of how macros in general work. It is documented under Julia’s section on macros: Metaprogramming · The Julia Language

It actually is a feature, not a bug (although it can cause bugs if people are unaware of it). The advantage is that all the work of creating the object is done at compile time, so it can be much more efficient. You can think of it as a constant that is associated with the function. If you mutate the constant, then yes, that can lead to problems.

1 Like

Yeah, I have at least a rudimentary understanding of macros, but this just seems a bit crazy. The docs don't say anything about reusing memory, so I don't think this is obvious just because it's a macro.

For small numbers, setting the correct value after creating it is basically free, but I guess that it will be expensive and complicated for really big numbers.

I always understood the big"" macro to be there only because it was necessary for creating some numbers that would otherwise be truncated. I had no idea there was a performance optimization in there, let alone one this dangerous.

@jakobnissen’s version fails validation for me.

Whooops, that’s because I accidentally printed an extra space for each line. Just pushed a fix to my repo.

@DNF After thinking about it, you're probably right. It would be better if the default behaviour was that it allocated a new integer every time the function with the macro was run. It could work like BioSequences' @dna_str, except it could default to dynamic allocation. People who want the performance can then opt in to the macro optimization. Should be easy to implement.
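Something like this hypothetical sketch (my own naming; not how Base actually implements @big_str):

macro bigcopy_str(s)
    n = parse(BigInt, s)  # parsing work happens once, at macro-expansion time
    :(deepcopy($n))       # but every evaluation hands out a fresh copy
end

x = bigcopy"143"  # an independent BigInt each time this line runs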

Will you make an issue on the Julia repo, or should I?

2 Likes

It would be great if you could (I should really do something else now), but if not, I’ll do it tonight.

1 Like

What am I missing here:

words2 = push!(copy(words), string(nthDigit(digits, start)))
printTranslations(io, num, digits, dict, start + 1, words2)

Won’t push!/pop! work here as well?
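That is, replacing the copy with (a sketch of what I mean):

push!(words, string(nthDigit(digits, start)))
printTranslations(io, num, digits, dict, start + 1, words)
pop!(words)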

Sure, there is no difference there. Also, I made the issue on the Julia repo about @big_str.

2 Likes

I don’t think you pushed anything, but if I fix the printing I get

Benchmarking...
===> java -cp build/java Main
Proc,Memory(bytes),Time(ms)
java,0,308
Proc,Memory(bytes),Time(ms)
java,0,390
Proc,Memory(bytes),Time(ms)
java,0,591
Proc,Memory(bytes),Time(ms)
java,0,828
===> java -cp build/java Main2
Proc,Memory(bytes),Time(ms)
java,0,336
Proc,Memory(bytes),Time(ms)
java,0,545
Proc,Memory(bytes),Time(ms)
java,0,1798
Proc,Memory(bytes),Time(ms)
java,0,1907
===> ./phone_encoder
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,127
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,263
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,802
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,1711
===> j1 src/julia/phone_encoder.jl
Proc,Memory(bytes),Time(ms)
j1,0,2043
Proc,Memory(bytes),Time(ms)
j1,0,2831
Proc,Memory(bytes),Time(ms)
j1,0,6398
Proc,Memory(bytes),Time(ms)
j1,0,8082
===> j1 src/julia/phone_encoder_pfitzseb.jl
Proc,Memory(bytes),Time(ms)
j1,0,703
Proc,Memory(bytes),Time(ms)
j1,0,773
Proc,Memory(bytes),Time(ms)
j1,0,1155
Proc,Memory(bytes),Time(ms)
j1,0,1664
===> j1 src/julia/phone_encoder_jakni.jl
Proc,Memory(bytes),Time(ms)
j1,0,1074
Proc,Memory(bytes),Time(ms)
j1,0,1153
Proc,Memory(bytes),Time(ms)
j1,0,1496
Proc,Memory(bytes),Time(ms)
j1,0,1896

Oh, and here is my version (just realized I hadn’t posted that anywhere):

Replaced BigInt usage with two Int128s, optimized printing:
#=
# Port of Peter Norvig's Common Lisp program from http://norvig.com/java-lisp.html.
#
# - Julia version: 1.6.2
# - Author: Renato Athaydes
# - Date: 2021-07-24
=#
const emptyStrings = String[]

function printTranslations(dict, num, digits, start=1, words=String[])
    if start > length(digits)
        print(num, ": ")
        for (i, word) in enumerate(words)
            print(word)
            i == length(words) ? println() : print(' ')
        end
        return
    end
    foundWord = false
    # the number is split across two Int128s: n2 accumulates digits behind a
    # leading-1 sentinel until it reaches 1e25, after which digits go into n1
    n1, n2 = Int128(0), Int128(1)
    for i in start:length(digits)
        if n2 >= 1e25
            n1 = n1 * 10 + nthDigit(digits, i)
        else
            n2 = n2 * 10 + nthDigit(digits, i)
        end
        for word in get(dict, (n1, n2), emptyStrings)
            foundWord = true
            push!(words, word)
            printTranslations(dict, num, digits, i + 1, words)
            pop!(words)
        end
    end
    if !foundWord &&
        !(!isempty(words) && length(words[end]) == 1 && isdigit(words[end][begin]))
        push!(words, string(nthDigit(digits, start)))
        printTranslations(dict, num, digits, start + 1, words)
        pop!(words)
    end
end

function loadDictionary(file)::Dict{Tuple{Int128, Int128}, Vector{String}}
    local dict = Dict{Tuple{Int128, Int128}, Vector{String}}()
    for word in eachline(file)
        push!(get!(() -> String[], dict, wordToNumber(word)), word)
    end
    dict
end

function nthDigit(digits::String, i::Int64)::UInt
    UInt(digits[i]) - UInt('0')
end

function charToDigit(ch::Char)::UInt
    ch = lowercase(ch)
    ch == 'e' && return 0
    ch in ('j', 'n', 'q') && return 1
    ch in ('r', 'w', 'x') && return 2
    ch in ('d', 's', 'y') && return 3
    ch in ('f', 't') && return 4
    ch in ('a', 'm') && return 5
    ch in ('c', 'i', 'v') && return 6
    ch in ('b', 'k', 'u') && return 7
    ch in ('l', 'o', 'p') && return 8
    ch in ('g', 'h', 'z') && return 9
    throw(DomainError(ch, "Not a letter"))
end

function wordToNumber(word::String)::Tuple{Int128, Int128}
    n1, n2 = Int128(0), Int128(1)  # same two-Int128 encoding as in printTranslations
    for ch in word
        if isletter(ch) && isascii(ch)
            if n2 >= 1e25
                n1 = n1 * 10 + charToDigit(ch)
            else
                n2 = n2 * 10 + charToDigit(ch)
            end
        end
    end
    n1, n2
end


function main(dictionary, input)
    dict = open(dictionary) do file
        loadDictionary(file)
    end

    open(input) do file
        for num in eachline(file)
            printTranslations(dict, num, filter(isdigit, num))
        end
    end
end

if length(ARGS) == 2
    main(ARGS...)
end

3 Likes

That's the one with two Int128s? Seems like our bottleneck truly is GMP then, since that's what we're binding against :confused: It's also hard to optimize from the Julia side, since it's a ccall that does all the work here.

Yes. Not really surprising. Imho BigInt should only be used if you don't actually have an idea of how big your numbers can get.

7 Likes