Why is [x]or (a function call) so long/slow (not inlined)?

My between functions could be a PR for Base (but that shouldn’t be needed… maybe as non-exported building blocks for codegen?):

julia> between_arbitrary(from, op, to)=op(from, to) #do they violate coding standard with op (or in_val) in the middle?

julia> between_inclusive(from, in_val, to)=between_arbitrary(from, <=, in_val) & between_arbitrary(in_val, <=, to)

They simplify:

new_uppercase(c::Char) = begin test1='a' <= c; test2=c <= 'z'; return test1&test2 ? Char(xor(UInt32(c), 0x20)) : Char(ccall(:utf8proc_toupper, UInt32, (UInt32,), c)) end

to:

new_uppercase(c::Char) = !between_inclusive('a', c, 'z') ? Char(ccall(:utf8proc_toupper, UInt32, (UInt32,), c)) : Char(xor(UInt32(c), 0x20))

That assembly code has one jump (one cmpl and one ja), not two jumps and two compares as with the straightforward ('a' <= c <= 'z').
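A minimal sketch of how the two forms could be compared side by side, assuming a recent Julia where `@code_native` comes from `InteractiveUtils`. The names `chained` and `bitwise` are hypothetical, introduced only for this illustration; the generated code will differ between Julia versions and CPUs, so no output is shown.

```julia
using InteractiveUtils  # provides @code_native outside the REPL

# Helpers as defined earlier in this post.
between_arbitrary(from, op, to) = op(from, to)
between_inclusive(from, in_val, to) =
    between_arbitrary(from, <=, in_val) & between_arbitrary(in_val, <=, to)

# Hypothetical names for the two forms being compared.
chained(c::Char) = 'a' <= c <= 'z'                 # chained comparison, short-circuits like &&
bitwise(c::Char) = between_inclusive('a', c, 'z')  # both tests always evaluated, joined with &

# Both forms should agree on every ASCII character.
@assert all(chained(Char(i)) == bitwise(Char(i)) for i in 0:127)

# To inspect the generated code (uncomment in a session):
# @code_native chained('q')
# @code_native bitwise('q')
```

The only semantic difference is that `&` always evaluates both comparisons, which is harmless here because neither has side effects.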

[I don’t like that c ? “then” : “else” treats the else branch as the more likely one; I fixed that with !c and swapping the branches. Both versions have as many jumps.]

Same applies to:

[Still frustrated to not see the 5x speedup that I timed.]

julia> new_uppercase_for_now_only_for_ascii(c::Char) = Char(xor(UInt32(c), between_inclusive('a', c, 'z') << 5))

julia> new_uppercase_for_now_only_for_ascii(c::Char) = Char(xor(UInt32(c), ('a' <= c <= 'z') << 5))
julia> c = map(Char, rand(32:127, 5000000)); # note: not c = rand(Char, 5000000); that would work too (the aim is to get it to run fast as well)

julia> @time map(new_uppercase_for_now_only_for_ascii, c);
First run:  0.045256 seconds (8 allocations: 19.074 MB)
..
julia> @time map(new_uppercase_for_now_only_for_ascii, c);
  0.017182 seconds (8 allocations: 19.074 MB)

vs.

julia> @time map(new_uppercase, c);
  0.079274 seconds (8 allocations: 19.074 MB)
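For reference, the ASCII-only version works because ASCII lowercase and uppercase letters differ only in bit 5 (0x20): the comparison yields a Bool, shifting it left by 5 gives either 0 or 32, and the xor flips that bit only for 'a' through 'z'. A small sketch checking this against Base's `uppercase` over the ASCII range; `ascii_upper` is a hypothetical shorter name for the same expression.

```julia
# Hypothetical name; same expression as new_uppercase_for_now_only_for_ascii above.
ascii_upper(c::Char) = Char(xor(UInt32(c), ('a' <= c <= 'z') << 5))

@assert ascii_upper('a') == 'A'   # 0x61 xor 0x20 == 0x41
@assert ascii_upper('z') == 'Z'
@assert ascii_upper('A') == 'A'   # outside 'a':'z', the shift yields 0, so nothing changes
@assert ascii_upper('{') == '{'   # first character after 'z'

# Agrees with Base.uppercase for every ASCII code point.
@assert all(ascii_upper(Char(i)) == uppercase(Char(i)) for i in 0:127)
```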

“Off-topic”:

Absurd is a nice word. I might disagree about its use here. I was going to let this drop; as I said, it is not important, and it is at the bottom of my list of concerns with the code here.

yes, for x86, no for “LLVM assembly language”

I should have remembered, or just done that.

I’m confused why map allocates (Char is immutable; is it also a 32-bit bitstype? I wasn’t sure it could be both, but it seems so at https://github.com/JuliaLang/julia/blob/d3fb0f225c1abb077ab59fc110b648a8400639c6/base/docs/basedocs.jl ):

julia> @time map(new_uppercase, c);
  0.096174 seconds (8 allocations: 19.074 MB)

I thought this would do better:

julia> test() = for i in eachindex(c) dummy = new_uppercase(c[i]) end
julia> @time test()
  1.593146 seconds (15.00 M allocations: 305.201 MB, 6.01% gc time)  # Why slower than:


julia> test() = for i in 1:5000000 dummy = new_uppercase(c[i]) end
julia> @time test()
  0.566907 seconds (5.00 M allocations: 76.286 MB, 1.26% gc time)

julia> test() = for i in 1:5000000 dummy = new_uppercase_for_now_only_for_ascii(c[i]) end
julia> @time test()
  0.533786 seconds (5.00 M allocations: 76.286 MB, 1.28% gc time)

Yes, the last two are faster; is that strange with lots more MB?

Maybe the reason:

julia> @time dummy = new_uppercase_for_now_only_for_ascii(c[1])
  0.000012 seconds (4 allocations: 160 bytes)

julia> typeof(dummy)
Char

help?> Char
  immutable Char <: Any  #is this the problem, should it be immutable Char <: Unsigned (or UInt32)? As with:

help?> UInt32
  immutable UInt32 <: Unsigned

Didn't help:
julia> @time dummy = UInt32(new_uppercase_for_now_only_for_ascii(c[1]))
  0.000016 seconds (4 allocations: 160 bytes)
0x00000047

map is defined to produce a new array, consisting of the function applied to each element of the input array, so it has to allocate space for the result array.

This is quite different from the other code you posted, which assigns the results to a dummy variable, i.e. discards the results. The built-in function that does that is foreach.
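The distinction can be sketched as follows. `up` is a hypothetical stand-in for the ASCII-only function from earlier in the thread, and `map!` (which writes into a preallocated array) is not something mentioned above; it is included here only as a third point of comparison.

```julia
# Hypothetical stand-in for the thread's ASCII-only uppercase function.
up(c::Char) = Char(xor(UInt32(c), ('a' <= c <= 'z') << 5))

chars = collect("hello, world")      # Vector{Char}

result = map(up, chars)              # allocates and returns a new Vector{Char}
@assert result == collect("HELLO, WORLD")

@assert foreach(up, chars) === nothing      # calls up on each element, discards the results
@assert chars == collect("hello, world")    # the input is untouched

out = similar(chars)                 # allocate the destination once, up front
map!(up, out, chars)                 # fills out in place instead of creating a new array
@assert out == result
```

So `foreach` matches the loops that assign to a dummy variable, while `map!` is closer to a loop that stores each result.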

Ok, yes. I meant that I’m confused why it allocates less (e.g. compared to eachindex). [In theory, the array that map allocates and then throws away could also be proven unnecessary, and that code eliminated. That is kind of what I expected, since there are only “8 allocations”. I guess array doubling is happening repeatedly.]

Isn’t eachindex a way not only to get at all of them, but also a fast one? It may only be an exception here because of Char.

map and for are both sub-optimal in different ways; is there a good way to get no allocations?

I thought this might help (dummy is not thrown away and then not GCed), but it only doubles the allocations:

julia> test() = for i in 1:5000000 c[i] = new_uppercase(c[i]) end
julia> @time test()
  0.331344 seconds (10.00 M allocations: 152.636 MB, 6.12% gc time)

I guess I just confirmed that dummy (Char) is allocated on the heap (unlike UInt32). There’s no need to…
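One thing all the test() loops above have in common, and which may be worth ruling out before blaming Char: they read c as a non-constant global. Julia cannot infer the type of a non-constant global inside a function, and that usually leads to boxing and per-iteration allocations. A minimal sketch, assuming that is a factor here (it has not been confirmed in this thread); `up` and `upcase_all!` are hypothetical names.

```julia
# Hypothetical stand-in for the ASCII-only uppercase function above.
up(c::Char) = Char(xor(UInt32(c), ('a' <= c <= 'z') << 5))

# Taking the array as an argument lets the compiler specialize on Vector{Char},
# instead of looking up an untyped global on every iteration.
function upcase_all!(v::Vector{Char})
    for i in eachindex(v)
        v[i] = up(v[i])
    end
    return v
end

v = map(Char, rand(32:127, 1_000))
expected = map(uppercase, v)   # Base's uppercase as the reference result
upcase_all!(v)
@assert v == expected

# After a warm-up call, allocations can be measured with:
# @allocated upcase_all!(v)
```

Declaring the global as `const c = ...` would be another way to test the same assumption.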