Problems with deprecations of islower, lowercase, isupper, uppercase

ScottPJones · December 23, 2017, 6:32pm

Please stop totally misrepresenting things. Many people here may not be aware of the facts of the situation (for which I have ample evidence).

I didn’t change anything AT ALL in the way strings were handled prior to my starting to contribute to Julia back in April 2015, except for fixing some of the many bugs and greatly improving the performance of conversions.

In v0.3.x, you had ASCIIString, UTF8String, UTF16String, and UTF32String.
See the following definition: https://github.com/JuliaLang/julia/blob/release-0.3/base/utf8.jl#L163, i.e.

convert(::Type{UTF8String}, a::Array{Uint8,1}) = is_valid_utf8(a) ? UTF8String(a) : error("invalid UTF-8 sequence")

The philosophy then was that if you converted something to a UTF8String, it was checked for validity.
I did not change that one bit.
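
For readers on current Julia, the same "check on conversion" idea can be sketched roughly as follows. This is not the v0.3 code: `checked_string` is a hypothetical helper name, and it assumes Julia 1.x, where `String` has replaced `UTF8String` and `isvalid(String, bytes)` does the check.

```julia
# Hypothetical helper (not part of Base): mirror the v0.3 behaviour of
# validating bytes before they become a string.
function checked_string(bytes::Vector{UInt8})
    # Reject anything that is not well-formed UTF-8.
    isvalid(String, bytes) || error("invalid UTF-8 sequence")
    # String(v) takes ownership of v, so copy to leave the caller's data intact.
    return String(copy(bytes))
end

s = checked_string(UInt8[0x68, 0x69])  # the bytes of "hi"
```

Passing a lone `0xff` byte to this helper raises the same "invalid UTF-8 sequence" error as the old `convert` method.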
I did fix bugs, such as #10919 (my very first Julia PR), and I also found a very serious problem in #10958, all within my first few weeks after I first saw Julia.

@stevengj said at the time, about #10958:

Whether we should accept (and silently convert) modified UTF-8 to standard UTF-8 is a separate issue; I tend to agree, but let’s keep that out of this discussion. After reading the RFCs, I agree that we shouldn’t produce the overlong NUL encoding ourselves

which Jeff also agreed with.
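
To make the "modified UTF-8" point concrete: modified UTF-8 encodes NUL as the overlong two-byte sequence `0xC0 0x80`, where standard UTF-8 uses the single byte `0x00`. A minimal sketch of the silent conversion discussed above might look like this. The function name `normalize_modified_nul` is hypothetical, and this is not the code from #10958.

```julia
# Hypothetical helper: rewrite the overlong NUL (0xC0 0x80) used by
# modified UTF-8 into the single 0x00 byte of standard UTF-8.
function normalize_modified_nul(bytes::Vector{UInt8})
    out = UInt8[]
    i = 1
    while i <= length(bytes)
        if bytes[i] == 0xc0 && i < length(bytes) && bytes[i + 1] == 0x80
            push!(out, 0x00)   # collapse the two-byte overlong form
            i += 2
        else
            push!(out, bytes[i])
            i += 1
        end
    end
    return out
end

normalized = normalize_modified_nul(UInt8[0x61, 0xc0, 0x80, 0x62])
```

Because `0xC0` is never a legal lead byte in standard UTF-8, the pair cannot be confused with part of a valid sequence.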

Also: Steven brought up the following back then, which may still be a problem:

Some of the functions in utf8.c seem to assume valid UTF-8, which may not be produced e.g. by bytestring(ptr, len).

Other string related things I fixed that were included in the v0.4 release:

github.com/JuliaLang/julia

Fix several bugs in reverse(::UTF8String), add full coverage tests

JuliaLang:master ← ScottPJones:spj/u8reverse

opened 02:42PM - 16 Aug 15 UTC

ScottPJones

+49 -6

`reverse` on a `UTF8String` used the C function `u8_reverse`, which I discovered in testing has several bugs.

1. It doesn't detect running off the end of the string when there is a char > 0x80
2. It picks up garbage bytes depending on the lead character
3. It is not portable to any machine that requires alignment.

I have rewritten it in Julia, and added tests that fully cover the function. I wanted to remove `u8_reverse` from `src/support/utf8.c`, however that function is used by `flisp` for the `string.reverse` function, even though that function is apparently never used anywhere in any of the .scm code I have found in Base. I wonder if the unused string functions in flisp, that are depending on broken C code, can simply be removed and save some space.
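
As an illustration only (this is not the code from the PR, and it assumes Julia 1.x `String` rather than `UTF8String`), a character-wise reverse that never reads past the ends of the string can be sketched like this. `reverse_utf8` is a hypothetical name; in practice Base's `reverse` already does this.

```julia
# Hypothetical sketch: reverse a string one character at a time, walking
# backwards over valid character indices instead of raw bytes.
function reverse_utf8(s::String)
    isvalid(s) || error("invalid UTF-8 sequence")
    buf = IOBuffer()
    i = lastindex(s)          # index of the last character, 0 for ""
    while i >= 1
        print(buf, s[i])      # write the whole character, however many bytes
        i = prevind(s, i)     # step to the start of the previous character
    end
    return String(take!(buf))
end

r = reverse_utf8("añb")
```

Stepping with `prevind` keeps multi-byte characters intact, which is the property the byte-oriented C routine failed to guarantee.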

github.com/JuliaLang/julia

Remove string.reverse from flisp and u8_reverse from utf8.c

JuliaLang:master ← ScottPJones:spj/remu8reverse

opened 03:09PM - 21 Aug 15 UTC

ScottPJones

+0 -79

The flisp `string.reverse` does not appear to be used anywhere (at least, not in JuliaLang/julia), and depends on the function `u8_reverse` that has the potential of access violations if there is invalid data at the end of a string. Removing it will eliminate the problem, and save a small amount of space.

github.com/JuliaLang/julia

Deprecate getindex/checkbounds methods for non Integer Real indices for Chars and Strings

JuliaLang:master ← ScottPJones:spj/deprecateindexreal

opened 07:11PM - 01 Sep 15 UTC

ScottPJones

+6 -3

These functions depended on a version of `to_index`, which has been deprecated. I tried to add tests for these methods, because they showed up as not being covered, however I was told not to, because they give a deprecation warning. This now gives a better error to the user, giving a work-around, and also giving the method that they called that doesn't work any longer, and eliminates the coverage holes in `strings/basic.jl` and `char.jl`

I also added a lot of unit tests (char and string functions had been very poorly covered previously):

github.com/JuliaLang/julia

Add extra coverage testing for char.jl

JuliaLang:master ← ScottPJones:spj/testchar

opened 12:30PM - 31 Aug 15 UTC

ScottPJones

+15 -9

Add tests for getindex, bswap, ndims, size and typemin.

Note: I noticed a number of inconsistencies that should probably be dealt with in a post-0.4 PR. `getindex('c',1,1,1)` is allowed, and returns `'c'`, but `getindex("c",1,1,1)` gets an error. `bswap` on a `Char` should probably not be allowed, the operation only makes sense on the underlying codeunit, i.e. `UInt32`, not on `Char`.
