Converting an Array of strings to an Array of Char

I have an input similar to this:

a = ["123"; "456"; "abc"]
display(a)

3-element Array{String,1}:
 "123"
 "456"
 "abc"

I want to convert it to something like this:

3×3 Array{Char,2}:
 '1'  '2'  '3'
 '4'  '5'  '6'
 'a'  'b'  'c'

I have tried split(), Vector() and a few other funny ideas, but I always end up with much more complicated Arrays. Is there an easy and computationally inexpensive way to do that?
I appreciate any hint.
Thanks

Here’s a reasonably efficient approach:

julia> reduce(vcat, permutedims.(collect.(a)))
3×3 Array{Char,2}:
 '1'  '2'  '3'
 '4'  '5'  '6'
 'a'  'b'  'c'

Explanation:

  • collect takes in a string and gives us a vector of Chars. Calling collect.(a) applies collect to each element:
julia> collect.(a)
3-element Array{Array{Char,1},1}:
 ['1', '2', '3']
 ['4', '5', '6']
 ['a', 'b', 'c']
  • Those vectors are 1-D, and thus nominally column vectors. We want rows, so we permute the dimensions of each vector to make it into a row vector.
julia> permutedims.(collect.(a))
3-element Array{Array{Char,2},1}:
 ['1' '2' '3']
 ['4' '5' '6']
 ['a' 'b' 'c']
  • Finally, we stack the rows into a single matrix by reducing over them with vcat:
julia> reduce(vcat, permutedims.(collect.(a)))
3×3 Array{Char,2}:
 '1'  '2'  '3'
 '4'  '5'  '6'
 'a'  'b'  'c'
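
A variation on the same idea, offered as a sketch rather than something posted in this thread: concatenate the character vectors as columns with hcat first, then permute the dimensions of the whole matrix once instead of permuting each row separately. The variable names cols and M are only illustrative.

```julia
a = ["123", "456", "abc"]

# Each string becomes one column of a Matrix{Char}.
cols = reduce(hcat, collect.(a))

# Swap rows and columns so that each string ends up as one row.
M = permutedims(cols)
```

This assumes all strings contain the same number of characters; otherwise hcat throws a DimensionMismatch.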

This isn’t the most efficient possible approach because it allocates a new vector to hold the characters in each string, then throws that vector away after building the matrix. We can save some memory allocation by doing the permutedims and collect steps lazily:

julia> reduce(vcat, (permutedims(collect(s)) for s in a))
3×3 Array{Char,2}:
 '1'  '2'  '3'
 '4'  '5'  '6'
 'a'  'b'  'c'

If you’re still concerned about performance, use BenchmarkTools.jl to measure it for your application.

Note that apart from the “nesting multiple functions approach” you can also always just write a bunch of simple loops:

function func(a)
    m = length(a)    # number of strings (rows)
    n = length(a[1]) # number of chars per string (columns)
    A = Matrix{Char}(undef, m, n) # preallocating the result
    for i in 1:m # looping over all strings
        for (j, c) in enumerate(a[i]) # looping over all chars in a string
            A[i, j] = c
        end
    end
    return A
end

Since it’s Julia, it will be fast. On my machine I get

julia> using BenchmarkTools

julia> @btime func($a);
  45.804 ns (1 allocation: 128 bytes)

julia> @btime reduce(vcat, permutedims.(collect.($a)));
  322.870 ns (11 allocations: 816 bytes)
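
The loop approach can also be made a bit more defensive. The following is a minimal sketch, not the version benchmarked above: chars_matrix is a hypothetical name, and the equal-length check is an added assumption about what the input should look like.

```julia
# Hypothetical helper: builds a Matrix{Char} with one row per string.
function chars_matrix(strs::AbstractVector{<:AbstractString})
    isempty(strs) && return Matrix{Char}(undef, 0, 0)
    ncols = length(first(strs))          # number of characters, not bytes
    all(s -> length(s) == ncols, strs) ||
        throw(ArgumentError("all strings must have the same number of characters"))
    A = Matrix{Char}(undef, length(strs), ncols)
    for (i, s) in enumerate(strs)        # one row per string
        for (j, c) in enumerate(s)       # iterating a string yields its Chars
            A[i, j] = c
        end
    end
    return A
end

chars_matrix(["12", "34", "áb"])   # 3×2 Matrix{Char}
```

Because the inner loop iterates over characters instead of indexing into the string, non-ASCII input such as "áb" is handled as well.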

Nice explanations.

It’s a pity that you can’t write [c for c in v, v in a] or perhaps [c for c in a[i], i in eachindex(a)], because these go to Iterators.product, which needs to know all the ranges before starting.

You can write this

[a[i][j] for i=1:length(a), j=1:length(first(a))]

but it won’t behave well with non-ASCII strings, since a[i][j] indexes the string by byte position rather than by character. For example:

a = ["123"; "456"; "ábc"]
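
One way around this, as a sketch: collect each string into a vector of Chars first, so that indexing is by character position. The names chars and M are only illustrative.

```julia
a = ["123", "456", "ábc"]

# "ábc"[2] throws a StringIndexError: 'á' takes two bytes in UTF-8,
# so byte index 2 is not the start of a character.

chars = collect.(a)   # Vector{Vector{Char}}, indexable by character position

# Rows follow the strings, columns follow the characters within each string.
M = [chars[i][j] for i in eachindex(chars), j in eachindex(first(chars))]
```

This still allocates one temporary vector per string, so it trades some memory for Unicode safety.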

Thank you all,
I now have plenty of new ideas to continue working on my small problem.
Special thanks to @rdeits for the detailed explanations.
