Broadcasting on Vector{SubString{String}} does not work in Julia 1.7.0 rc1?

pszufe · October 16, 2021, 8:38pm

I am using Julia 1.7.0-rc1, and it seems that broadcasting works for Vector{String} but not for Vector{SubString{String}}. Is this intended, or have I run into a bug?

For Vector{String} everything works as expected:

julia> length.(["1","22","333"])
3-element Vector{Int64}:
 1
 2
 3

For Vector{SubString{String}}, a comprehension works:

julia> [length(a) for a in split.("1 22 333")]
3-element Vector{Int64}:
 1
 2
 3

However, broadcasting does not:

julia> length.(split.("1 22 333"))
3

julia> parse.(Int,split.("1 22 333"))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::Vector{SubString{String}})

Debugging details:

julia> versioninfo()
Julia Version 1.7.0-rc1
Commit 9eade6195e (2021-09-12 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

Sukera · October 16, 2021, 8:45pm

Very weird edge case, might be a bug.

Note though that you don’t need the . for split - Strings are treated as scalars for broadcasting purposes:

julia> parse.(Int, split("1 22 333"))
3-element Vector{Int64}:
   1
  22
 333

Sukera · October 16, 2021, 8:47pm

It also only seems to happen with loop fusion:

julia> t = split.(SubString("1 22 333"))
3-element Vector{SubString{String}}:
 "1"
 "22"
 "333"

julia> parse.(Int, t)
3-element Vector{Int64}:
   1
  22
 333

Yeah, seems like a bug to me - please open an issue at Issues · JuliaLang/julia · GitHub! Thank you for finding this.

pszufe · October 16, 2021, 9:00pm

Yes, however:

julia> split("1 22 333") ==  split.("1 22 333")
true

Yet the outcome of split.("1 22 333") mysteriously fails on broadcasting, while it should not. Scalars in broadcasting should be treated as zero-dimensional arrays, no?

pszufe · October 16, 2021, 9:01pm

Sukera, it fails when broadcasting is applied to a scalar. See my reply to the other post.

bkamins · October 16, 2021, 9:13pm

Quoting the Julia Manual:

Moreover, like all vectorized “dot calls,” these “dot operators” are fusing. For example, if you compute 2 .* A.^2 .+ sin.(A) (or equivalently @. 2A^2 + sin(A), using the @. macro) for an array A, it performs a single loop over A, computing 2a^2 + sin(a) for each element of A.

Let us try to translate this to our case. The input is "1 22 333", and that is what the fused loop iterates over. Strings have special handling in broadcasting: they are not iterated, so the loop has a single element. It therefore seems that just parse(Int, split("1 22 333")) should be evaluated, since you passed one element as the input. And this fails.

Am I missing something here?
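To make the translation above concrete, here is a rough sketch of what the nested dot call is lowered to. It assumes the lowering to Base.broadcasted and Base.materialize described in the manual; the variable names s, bc, err and result are only illustrative.

```julia
s = "1 22 333"

# The nested dot call parse.(Int, split.(s)) becomes one lazy broadcast tree,
# roughly equivalent to this:
bc = Base.broadcasted(parse, Int, Base.broadcasted(split, s))

# Both Int and the string act as scalars, so the fused loop has no axes
# and its body runs exactly once.
@show axes(bc)

# That single iteration evaluates parse(Int, split(s)), which has no method.
err = try
    Base.materialize(bc)
catch e
    e
end
@show err isa MethodError

# Leaving split un-dotted takes it out of the fused loop, so parse is
# broadcast over the three substrings instead.
result = parse.(Int, split(s))
@show result
```

The only difference between the failing and the working call is whether split is part of the fused loop.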

pszufe · October 16, 2021, 9:22pm

Here are the outputs:

julia> parse.(Int, split("1 22 333"))
3-element Vector{Int64}:
   1
  22
 333

julia> parse.(Int, split.("1 22 333"))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::Vector{SubString{String}})

It looks like the second parse gets the entire Vector for some reason, which is not the behavior I expected.

BTW @. fails in a similar way:

julia> @. parse(Int, split("1 22 333"))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::Vector{SubString{String}})

bkamins · October 16, 2021, 9:30pm

What I am saying is that this is the expected behavior. Since you pass a 0-dimensional argument, parse must be called only once, on the output of split (a 0-dimensional input contains exactly one element).

For example, what would you expect to happen if you did parse.(Int, split.(Ref("1 22 333")))?

Note that for broadcasting, "1 22 333" and Ref("1 22 333") are exactly the same, since:

julia> Base.broadcastable("1 22 33")
Base.RefValue{String}("1 22 33")
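A small sketch of that equivalence, using only Base functions. The helper name nested_parse_throws is made up for this illustration.

```julia
s = "1 22 333"

# A string is wrapped in a Ref for broadcasting: a container with no axes
# that holds exactly one element.
wrapped = Base.broadcastable(s)
@show wrapped isa Base.RefValue{String}
@show axes(wrapped)
@show length(wrapped)

# Consequently split.(s) and split.(Ref(s)) give the same result.
@show split.(s) == split.(Ref(s))

# Hypothetical helper: does the nested dot call throw a MethodError for x?
function nested_parse_throws(x)
    try
        parse.(Int, split.(x))
        return false
    catch e
        return e isa MethodError
    end
end

@show nested_parse_throws(s)
@show nested_parse_throws(Ref(s))
```

Both nested calls fail in the same way, which is consistent with the bare string and the Ref being interchangeable here.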

bkamins · October 16, 2021, 9:42pm

Do you find this expected?

julia> string.(identity.(Ref([1,2,3])))
"[1, 2, 3]"

pszufe · October 16, 2021, 9:52pm

So I understand that whenever more than one dot operator is involved, the dimension for loop fusion is taken from whatever is executed first?

jlapeyre · October 16, 2021, 10:05pm

It’s fused. There is only one iteration. The string is treated as a scalar for broadcasting, and it is the only thing being iterated over. So there is no choice to do something different.
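One way to see the single iteration is to count how often the outer function is called. This is only a sketch; calls and counted_length are names invented for the example.

```julia
calls = Ref(0)

# length, instrumented to record how many times it is invoked
counted_length(x) = (calls[] += 1; length(x))

# Fused: the only input is a scalar string, so the loop body runs once and
# counted_length receives the whole vector returned by split.
fused = counted_length.(split.("1 22 333"))
fused_calls = calls[]

# Not fused: split runs first, then counted_length is broadcast over its result.
calls[] = 0
unfused = counted_length.(split("1 22 333"))
unfused_calls = calls[]

@show fused fused_calls
@show unfused unfused_calls
```

The fused version returns the number of substrings (the 3 from the first post), while the unfused version returns the length of each substring.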

bkamins · October 16, 2021, 10:07pm

Yes, if you pass only one argument and just nest function calls.

In general, no. Consider:

julia> .+([1 2], .^(2, [3, 4]))
2×2 Matrix{Int64}:
  9  10
 17  18

So all three arguments take part in determining the dimensions.

However, the dimensions of the output can be determined independently of the functions involved. Only the dimensionality of their arguments matters.

The special case is when you pass only 0-dimensional arguments: for convenience, the result of the operation is not wrapped in a 0-dimensional container, but is unwrapped when broadcasting is performed. This is probably the reason for the confusion.
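As a sketch of this point, the axes of the lazy broadcast object can be inspected before anything is computed. Base.broadcasted is what dot syntax lowers to; the names A, B, bc, r1 and r2 are illustrative only.

```julia
A = [1 2]     # 1×2 matrix
B = [3, 4]    # 2-element vector

# The output shape is derived from the arguments alone, before any function runs.
bc = Base.broadcasted(+, A, Base.broadcasted(^, 2, B))
@show axes(bc)

# Swapping in different functions leaves the shape unchanged.
r1 = A .+ 2 .^ B
r2 = A .* (2 .- B)
@show size(r1)
@show size(r2)
```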

See e.g.

julia> sin.(fill(1))
0.8414709848078965

and you can see that the 0-dimensional array was dropped.

This behavior is relevant if you want to implement custom broadcasting for your own types, as 0-dimensional containers require special treatment.
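A short sketch contrasting the 0-dimensional case with the 1-dimensional one. The variable names are illustrative, and the in-place form at the end is just one way to keep the container.

```julia
x0 = fill(1.0)    # 0-dimensional Array{Float64, 0}
x1 = [1.0]        # 1-element Vector{Float64}

y0 = sin.(x0)     # every argument is 0-dimensional: the result is unwrapped
y1 = sin.(x1)     # a 1-dimensional argument: the result stays a Vector

@show y0 isa Float64
@show y1 isa Vector{Float64}

# Broadcasting in place into an existing 0-dimensional array keeps the container.
out = fill(0.0)
out .= sin.(x0)
@show ndims(out)
@show out[] == y0
```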

jlapeyre · October 16, 2021, 10:27pm

It might be helpful to compare this

parse.([Int, Float64], split.("1 2"))

which fails, with this

parse.([Int, Float64], Ref(["1", "2"]))

which fails for essentially the same reason.
The length of the first argument (for iteration) is 2, and the length of the second argument is 1, so the second argument is reused for each element of the first.
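The comparison can be run as a sketch like this one. throws_method_error is a helper name invented here, not part of Base.

```julia
types = [Int, Float64]

# Hypothetical helper: run the zero-argument function f and report whether
# it throws a MethodError.
function throws_method_error(f)
    try
        f()
        return false
    catch e
        return e isa MethodError
    end
end

# In both calls the second argument is a single element holding a whole
# vector, so parse receives that vector for each entry of types.
fused_fails = throws_method_error(() -> parse.(types, split.("1 2")))
ref_fails = throws_method_error(() -> parse.(types, Ref(["1", "2"])))

# With an un-dotted split both arguments have length 2 and are paired elementwise.
paired = parse.(types, split("1 2"))

@show fused_fails ref_fails
@show paired
```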
