Broadcasting on Vector{SubString{String}} does not work in Julia 1.7.0 rc1?

I use Julia 1.7.0-rc1, and it seems that broadcasting works for Vector{String} but not for Vector{SubString{String}}. Is this intended, or have I run into a bug?

For Vector{String}, everything works as expected:

julia> length.(["1","22","333"])
3-element Vector{Int64}:
 1
 2
 3

For Vector{SubString{String}}, a comprehension works:

julia> [length(a) for a in split.("1 22 333")]
3-element Vector{Int64}:
 1
 2
 3

However, broadcasting does not:

julia> length.(split.("1 22 333"))
3

julia> parse.(Int,split.("1 22 333"))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::Vector{SubString{String}})

Debugging details:

julia> versioninfo()
Julia Version 1.7.0-rc1
Commit 9eade6195e (2021-09-12 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

Very weird edge case, might be a bug.

Note though that you don’t need the . for split - Strings are treated as scalars for broadcasting purposes:

julia> parse.(Int, split("1 22 333"))
3-element Vector{Int64}:
   1
  22
 333
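
A minimal sketch of the "strings are scalars" point, in case it helps. The variable name s is just for illustration; everything else is plain Base:

```julia
s = "1 22 333"

# Broadcasting does not iterate a String; it acts as a single element,
# so length is called once on the whole string (8 characters).
@show length.(s)

# To broadcast over the characters instead, collect them first.
@show isdigit.(collect(s))

# split needs no dot: it already returns a vector that parse is broadcast over.
@show parse.(Int, split(s))
```

So the dot belongs only on the call that should run once per element of a real container.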

It also only seems to happen with loop fusion:

julia> t = split.(SubString("1 22 333"))
3-element Vector{SubString{String}}:
 "1"
 "22"
 "333"

julia> parse.(Int, t)
3-element Vector{Int64}:
   1
  22
 333

Yeah, seems like a bug to me - please open an issue at Issues · JuliaLang/julia · GitHub! Thank you for finding this.


Yes, however:

julia> split("1 22 333") ==  split.("1 22 333")
true

Yet the result of split.("1 22 333") mysteriously fails on broadcasting, while it should not. Scalars in broadcasting should be treated as zero-dimensional arrays, no?

Sukera, it fails when broadcasting is done over a scalar. See my answer to the other post.

Following the Julia Manual

Moreover, like all vectorized “dot calls,” these “dot operators” are fusing. For example, if you compute 2 .* A.^2 .+ sin.(A) (or equivalently @. 2A^2 + sin(A), using the @. macro) for an array A, it performs a single loop over A, computing 2a^2 + sin(a) for each element of A.

Let us translate this to our case. The input is "1 22 333". Strings get special handling in broadcasting, so this is a single-element loop (the string itself is not iterated). Since exactly one element was passed in, the fused call amounts to just parse(Int, split("1 22 333")). And this fails.

Am I missing something here?

Here are the outputs:

julia> parse.(Int, split("1 22 333"))
3-element Vector{Int64}:
   1
  22
 333

julia> parse.(Int, split.("1 22 333"))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::Vector{SubString{String}})

It looks like the second parse gets the entire Vector for some reason? Which is not the expected behavior.

BTW @. fails in a similar way:

julia> @. parse(Int, split("1 22 333"))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::Vector{SubString{String}})

What I am saying is that this is the expected behavior. Since the input is a 0-dimensional argument, parse must be called only once, on the output of split (a 0-dimensional input contains exactly one element).

For example what would you expect to happen if you did parse.(Int, split.(Ref("1 22 333")))?

Note that for broadcasting "1 22 333" and Ref("1 22 333") are exactly the same since:

julia> Base.broadcastable("1 22 33")
Base.RefValue{String}("1 22 33")
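
A small side-by-side check of that equivalence, as a sketch (s is just an illustrative name):

```julia
s = "1 22 333"

# A String is wrapped in a Ref for broadcasting, so it counts as one element.
@show Base.broadcastable(s) isa Base.RefValue{String}

# In both forms split runs once, then length runs once on the resulting
# 3-element vector, and the 0-dimensional result is unwrapped to a plain Int.
@show length.(split.(s))
@show length.(split.(Ref(s)))
```

Both calls give the number of pieces, not a vector of piece lengths.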

Do you find this expected?

julia> string.(identity.(Ref([1,2,3])))
"[1, 2, 3]"

So I understand that whenever more than one dot operator is involved, the dimension for loop fusion is taken from whatever is executed first?

It’s fused. There is only one iteration. The string is treated as a scalar for broadcasting, and it is the only thing being iterated over. So there is no choice to do something different.
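
Roughly, and only as a sketch of the idea rather than the exact lowering, the fused call behaves like broadcasting one composed function over the string. The helper name fused is made up for illustration:

```julia
s = "1 22 333"

# Stand-in for the fused kernel of parse.(Int, split.(s)).
fused(x) = parse(Int, split(x))

# s is the only broadcast argument and counts as one element, so fused is
# called once and parse receives the whole vector returned by split.
err = try
    broadcast(fused, s)
catch e
    e
end
@show err isa MethodError

# Materializing the split first gives broadcasting a 3-element container.
parts = split(s)
@show parse.(Int, parts)
```

This is also why assigning the result of split to a variable first makes the second dot call work.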

Yes, if you pass only one argument and just nest function calls.

In general no, consider:

julia> .+([1 2], .^(2, [3, 4]))
2×2 Matrix{Int64}:
  9  10
 17  18

So all three arguments take part in determining the dimensions.

However, the dimensions of the output can be determined independently of the functions involved. Only the dimensionality of the arguments matters.
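
To illustrate that the shape does not depend on the functions, here is a sketch that keeps the same three arguments and swaps the operations for tuple (the names a, b, c are just for illustration):

```julia
a = [1 2]      # 1×2 matrix
b = 2          # scalar, 0-dimensional
c = [3, 4]     # 2-element vector

# Same arguments as in the example above: the result is 2×2.
@show size(a .+ b .^ c)

# Replacing the arithmetic with tuple changes the values but not the shape.
@show size(tuple.(a, b, c))
```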

A special case is when you pass only 0-dimensional arguments: for convenience, the result of the operation is then not wrapped in a 0-dimensional container, but is unwrapped when broadcasting is performed. This is probably the reason for the confusion.

See e.g.

julia> sin.(fill(1))
0.8414709848078965

and you can see that the 0-dimensional array was dropped.

This behavior is relevant if you want to implement custom broadcasting for your own types, as 0-dimensional containers require special treatment.
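
As a minimal sketch of what that can look like for a user-defined type: Token and shout below are hypothetical names made up for this example, and the Base.broadcastable definition follows the Ref pattern that strings use.

```julia
# Hypothetical type, used only for illustration.
struct Token
    text::String
end

# Opt in to scalar-like behavior by wrapping the value in a Ref.
Base.broadcastable(t::Token) = Ref(t)

# Hypothetical function to broadcast.
shout(t::Token, suffix) = uppercase(t.text) * suffix

# The single Token is reused for every suffix, like a scalar.
@show shout.(Token("ab"), ["!", "?"])

# With only 0-dimensional arguments the result is unwrapped, not a container.
@show shout.(Token("ab"), "!")
@show sin.(fill(1)) isa Float64
```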


It might be helpful to compare this

parse.([Int, Float64], split.("1 2"))

which fails, with this

parse.([Int, Float64], Ref(["1", "2"]))

which fails for essentially the same reason.
The length of the first argument (for iteration) is 2, and the length of the second argument is 1, so it is reused.
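
The comparison can be run as follows; this is only a sketch, and types and strs are illustrative names:

```julia
types = [Int, Float64]
strs = ["1", "2"]

# Ref(strs) counts as one element, so the whole vector is reused for each
# type and parse is called with a Vector argument, for which no method exists.
err = try
    parse.(types, Ref(strs))
catch e
    e
end
@show err isa MethodError

# Without Ref, the two length-2 arguments are paired element by element.
res = parse.(types, strs)
@show res
```

The same reuse happens with split.("1 2"), because the string is the single broadcast element there.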
