Solution for issue #25216, larger octal literals produce smaller types, sometimes

ScottPJones · December 22, 2017, 1:12pm

In reference to this issue raised by @iamed2 (https://github.com/JuliaLang/julia/issues/25216),
and regarding the comment by @jeff.bezanson (https://github.com/JuliaLang/julia/issues/25216#issuecomment-353356186), I believe there is a simple solution:

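function os(digits)
    len = length(digits) * 3  # number of bits: 3 per octal digit
    r = len & 7               # bits left over above the last full byte
    # When only 1 or 2 bits spill into the top byte, that byte is counted only
    # if the leading digit actually sets those bits; otherwise add one byte.
    (len >>> 3) + (r == 1 ? (digits[1] > '3') : r == 2 ? (digits[1] > '1') : 1)
end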

This will correctly calculate the number of bytes required, while still allowing extra leading zeros to make the returned type larger (unless they straddle a byte boundary), and it avoids strange results such as 0o000 returning 0x0000 while 0o377 returns 0xff.

julia> for str in ("0", "00", "40", "000", "040", "377", "400", "0000", "0400", "00000", "000000") ; println(str, repeat(" ", 8-length(str)), os(str), "  ", typeof(Meta.parse("0o" * str))) ; end
0       1  UInt8
00      1  UInt8
40      1  UInt8
000     1  UInt16
040     1  UInt16
377     1  UInt8
400     2  UInt16
0000    2  UInt16
0400    2  UInt16
00000   2  UInt16
000000  2  UInt32
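Note that the `os` column is a byte count, while the last column is the type Julia currently produces. As a rough sketch, assuming the resulting type is simply the smallest of UInt8/UInt16/UInt32/UInt64/UInt128 that holds that many bytes (`bytes_to_type` is a hypothetical helper, not part of Julia, and this assumes Julia 1.x), the proposed byte counts translate into types like this:

```julia
# `os` as proposed above: bytes needed for an octal literal's digits.
function os(digits::AbstractString)
    len = length(digits) * 3          # 3 bits per octal digit
    r = len & 7                       # bits left over above the last full byte
    (len >>> 3) + (r == 1 ? (digits[1] > '3') : r == 2 ? (digits[1] > '1') : 1)
end

# Hypothetical helper: smallest unsigned type with at least `nbytes` bytes.
function bytes_to_type(nbytes::Integer)
    for T in (UInt8, UInt16, UInt32, UInt64, UInt128)
        nbytes <= sizeof(T) && return T
    end
    return BigInt
end

# Same strings as the REPL table above: digits, byte count, resulting type.
for str in ("0", "00", "40", "000", "040", "377", "400", "0000", "0400", "00000", "000000")
    println(rpad(str, 8), os(str), "  ", bytes_to_type(os(str)))
end
```

With this mapping, 000 and 040 come out as UInt8, the same as 377, and 000000 becomes UInt16 instead of UInt32.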

ScottPJones · December 22, 2017, 2:04pm

In reference to the comment: https://github.com/JuliaLang/julia/issues/25216#issuecomment-353567052

The following perfectly well-constructed string of octal constants fails in Julia because of this issue:

julia> String([0o150, 0o145, 0o154, 0o154, 0o157, 0o054, 0o040, 0o127, 0o167, 0o162, 0o154, 0o144, 0o041])
ERROR: MethodError: Cannot `convert` an object of type Array{UInt16,1} to an object of type String
This may have arisen from a call to the constructor String(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] String(::Array{UInt16,1}) at ./sysimg.jl:77

You need to explicitly force them all to be UInt8 to get the intended result:

julia> String(UInt8[0o150, 0o145, 0o154, 0o154, 0o157, 0o054, 0o040, 0o167, 0o157, 0o162, 0o154, 0o144, 0o041])
"hello, world!"

ScottPJones · December 22, 2017, 2:14pm

I realize that there’s not much call for octal literals these days (Swift even removed C-style octal escapes from string literals: only the sequence \0 is allowed, and any following digit is taken to be a separate character), but if they are going to be in the language, they should be done in as sane a way as possible. (They made a lot of sense back on the 18- and 36-bit machines I used at MIT in the early 80s, i.e. the DEC-10, DEC-20, and Lisp Machines, with 9-bit bytes, but not so much these days!)

klacru · December 22, 2017, 9:08pm

Interesting formula. It seems to give reasonable results for small numbers with few leading zeros.
How would I predict the number of required bytes in simple terms (how many leading zeros do I need to force a UInt64)?
What exactly is the problem being solved?

I verified that whenever two octal representations of numbers have the same number of digits (with or without leading zeros), the smaller one never requires more bytes than the bigger one.
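That property can be checked by brute force. A minimal sketch, assuming Julia 1.x (where `string(v; base = 8, pad = n)` produces the zero-padded octal digits); `os_is_monotone` is a hypothetical name:

```julia
# `os` as proposed in the first post.
function os(digits::AbstractString)
    len = length(digits) * 3
    r = len & 7
    (len >>> 3) + (r == 1 ? (digits[1] > '3') : r == 2 ? (digits[1] > '1') : 1)
end

# For every digit count n up to `maxdigits`, walk through all n-digit octal
# strings (leading zeros included) in increasing numeric order and check that
# the byte count returned by `os` never decreases.
function os_is_monotone(maxdigits::Integer)
    for n in 1:maxdigits
        prev = 0
        for v in 0:(8^n - 1)
            b = os(string(v; base = 8, pad = n))
            b < prev && return false
            prev = b
        end
    end
    return true
end

os_is_monotone(6)   # about 300,000 strings in total
```

Since `os` depends only on the digit count and the leading digit, and the leading digit never decreases as the value grows, the check is expected to succeed for any `maxdigits`.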

I found two examples where the formula is one byte too high:

julia> os(oct(0x200000))
4
julia> os(oct(0x200000000000000000000000000000))
16
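Both examples have a bit count that is an exact multiple of 8 (8 and 40 octal digits, i.e. 24 and 120 bits), so `r == 0` and the final `: 1` branch adds a byte that is never needed. A hedged sketch of a fix, with the hypothetical name `os_fixed`, assuming Julia 1.x, where `oct(x)` has been replaced by `string(x; base = 8)`. The second value is written as `UInt128(1) << 117`, i.e. 2^117, which is what the 30-digit hex literal above denotes:

```julia
# Hypothetical corrected variant of `os`: when the bit count is an exact
# multiple of 8 (r == 0) there is no partial top byte, so nothing is added.
function os_fixed(digits::AbstractString)
    len = length(digits) * 3           # 3 bits per octal digit
    r = len & 7                        # bits left over above the last full byte
    extra = r == 0 ? 0 :
            r == 1 ? Int(digits[1] > '3') :   # top bit of the leading digit
            r == 2 ? Int(digits[1] > '1') :   # top two bits of the leading digit
            1                                 # leading digit sits in a partial byte
    return (len >>> 3) + extra
end

os_fixed(string(0x200000; base = 8))             # 2^21 needs 3 bytes
os_fixed(string(UInt128(1) << 117; base = 8))    # 2^117 needs 15 bytes
```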

klacru · December 23, 2017, 8:23am

I found an even simpler solution, which guarantees that, among (octal) number literals with the same number of digits, a larger value never gets a smaller type:

In case of leading zeros, replace the first 0 by 1 and take the required size for the modified literal.

The current implementation behaves as if the leading 0 had been replaced by the largest digit of the base, which gives different results for octal literals.
It was easy to put that rule into a PR #25259.
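The two rules can be compared directly. A sketch, assuming Julia 1.x; `old_rule` and `new_rule` are hypothetical helpers that only model the size computation (each returns a byte count), not the actual parser code in the PR:

```julia
# Hypothetical helpers for a literal `digits` (written without its
# 0o/0x/0b prefix) in `base` 2, 8 or 16.

# Bytes needed to hold the value of `s` read in `base` (at least 1).
value_bytes(s::AbstractString, base::Integer) =
    cld(ndigits(parse(BigInt, s; base = base); base = 2), 8)

# If the literal starts with '0', replace that first digit by `c`, then measure.
function bytes_with_leading(digits::AbstractString, base::Integer, c::Char)
    s = first(digits) == '0' ? string(c, digits[2:end]) : digits
    return value_bytes(s, base)
end

# Old behaviour: the leading 0 acts like the largest digit of the base.
old_rule(digits, base) = bytes_with_leading(digits, base, string(base - 1; base = base)[1])

# Proposed rule: the leading 0 is replaced by 1.
new_rule(digits, base) = bytes_with_leading(digits, base, '1')

# Octal strings from the first post: digits, old bytes, new bytes.
for s in ("0", "00", "40", "000", "040", "377", "400", "0000", "0400", "00000", "000000")
    println(rpad(s, 8), old_rule(s, 8), "  ", new_rule(s, 8))
end
```

For those strings, `old_rule` gives byte counts matching the types printed in the first post (2 bytes for 000, 3 for 000000), while `new_rule` gives the same counts as `os` (1 and 2).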

ScottPJones · December 23, 2017, 1:25pm

I think that it is more important to get a consistent solution, rather than the simplest.
Is your solution consistent with the way leading zeros work with hex constants or binary constants?

edit: I’m not saying that it isn’t, I just haven’t been able to try it yet - if so, very good!

klacru · December 23, 2017, 6:32pm

Yes, it is! By “simple” I mean elegant, concise, and adapted to the problem.
Have a look at the source code: the same function is used for oct, bin and hex literals.
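Why this is consistent can be sketched with a little arithmetic (my own derivation, not from the PR). Take a literal with a leading 0 and n digits in base 2^k (k = 1, 3, 4 for binary, octal, hex). Treating the 0 as the largest digit needs kn bits; treating it as 1 gives a value with exactly k(n-1)+1 bits. The byte counts b_old and b_new then compare as follows:

```latex
\begin{aligned}
b_{\text{old}} &= \left\lceil \tfrac{kn}{8} \right\rceil, \qquad
b_{\text{new}} = \left\lceil \tfrac{k(n-1)+1}{8} \right\rceil \\
k = 1:&\quad b_{\text{new}} = \lceil n/8 \rceil = b_{\text{old}} \\
k = 4:&\quad kn \in \{8m,\ 8m+4\} \implies b_{\text{new}} = b_{\text{old}} \in \{m,\ m+1\} \\
k = 3:&\quad b_{\text{new}} = b_{\text{old}} - 1 \iff 3n \equiv 1 \text{ or } 2 \pmod{8}
\end{aligned}
```

So for hex and binary the new rule changes nothing, and for octal it differs only when 1 or 2 bits spill into an extra byte (e.g. n = 3: 9 bits), which are exactly the `r == 1` and `r == 2` cases in `os`.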

Btw, your example works, of course:

julia> String([0o150, 0o145, 0o154, 0o154, 0o157, 0o054, 0o040, 0o127, 0o167, 0o162, 0o154, 0o144, 0o041])
"hello, Wwrld!"

ScottPJones · December 23, 2017, 7:17pm

Oops! Typo there, that was supposed to be "hello, world!", of course!
My octal must be rusty!
