How to "de-vectorize" a vector of strings


#1

Most likely I am missing something very easy, but I searched the manual carefully and couldn’t find a function to “de-vectorize” a vector of strings.

Let’s say I start with

a = ["X", "Y", "Z"]
b = [a]

Now b == [["X", "Y", "Z"]]. I would like to find a function f(.) such that

b[f(a)] = a

or f(["X", "Y", "Z"]) = "X", "Y", "Z"

Thank you for your help


#2

I’m not entirely sure what you mean by

b[f(a)] = a

That would mean indexing into the vector b with the result of some function f(a), which is possible but probably not what you want.

Can you provide a little more context about what you’re actually trying to accomplish?


#3

These are two different operations. Which one do you want?

The second one is just splatting: f(a) = (a...,) should do it, though the function would not be type-stable, and it is not something you should do a lot if you care about performance.

b[1] is equal to a, so f(a) = 1 works for the first one. I’m not sure how you want it to depend on a.
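
The two readings can be sketched as follows. This is only a minimal illustration, assuming plain vectors of strings and Julia 1.x syntax; the names f_index and f_splat are made up here to keep the two versions apart.

```julia
a = ["X", "Y", "Z"]
b = [a]

# Reading 1: index into b. The vector a sits at position 1 of b,
# so a constant function is enough.
f_index(a) = 1
b[f_index(a)] == a      # true

# Reading 2: splat the elements of a into a tuple.
# The length of the tuple depends on the runtime length of a,
# which is why the return type cannot be inferred from Vector{String} alone.
f_splat(a) = (a...,)
f_splat(a)              # ("X", "Y", "Z")
```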


#4

Actually, I guess you mean [f(a)] instead of b[f(a)]? That would match the other version slightly better. If so, then no, it’s impossible: [f(a)] will always return a single-element array, no exceptions (unless you break the internal implementation, since this behavior is implemented in Julia after all), so you cannot make it equal to a multiple-element array. [a...] would be the closest to what you are asking for, but it is strongly recommended against if you want to do this many times. [a;] is also equivalent in this case and should be much faster. Of course, neither of them is useful, since these are just fancy/inefficient ways to spell copy(a). You need to be more specific about your actual problem.
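
A small sketch of the difference between these spellings, assuming the same a as in the first post (the variable names are only for illustration):

```julia
a = ["X", "Y", "Z"]

wrapped      = [a]       # one element, and that element is the whole vector a
splatted     = [a...]    # elements of a passed one by one to the array literal
concatenated = [a;]      # vertical concatenation of a single vector
copied       = copy(a)

length(wrapped)                            # 1
splatted == concatenated == copied == a    # true
```

The last three all produce a vector with the same contents as a, which is why none of them adds anything over copy(a).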


#5

Do you want things like

b = [["X", "Y", "Z"]]
first(b) #["X","Y","Z"]

or

(first(b)...,) # "X","Y","Z"

Not sure what you’re asking


#6

Thank you so much. The support from Julia community is amazing!
I’ll try to be more precise by giving an example:

ta is a TimeArray whose ta.colnames are strings
a = ["column10", "column50"]
ta[a]

Some results I got are:

ta1 = ta[a] # MethodError
ta2 = ta[(a...)] # MethodError
ta3 = ta[a...] # OK
ta4 = ta[a;] # MethodError
ta5 = ta[copy(a)] # MethodError

So a... at least works but, typical of me, it is the one that is strongly recommended against. Is this the only option?
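
One plausible explanation for this pattern is that indexing is defined for several string arguments but not for a single vector of them. The sketch below reproduces that behavior with a hypothetical ToyTable type; it is a stand-in written for this example (Julia 1.x), not the TimeSeries.jl API.

```julia
# Hypothetical stand-in for a table-like type; NOT the TimeSeries.jl API.
struct ToyTable
    colnames::Vector{String}
    values::Matrix{Float64}
end

# Indexing is defined only for one or more String arguments (varargs).
function Base.getindex(t::ToyTable, names::String...)
    idx = [findfirst(isequal(n), t.colnames) for n in names]
    return ToyTable(collect(names), t.values[:, idx])
end

t = ToyTable(["column10", "column50", "column90"], [1.0 2.0 3.0; 4.0 5.0 6.0])
a = ["column10", "column50"]

# Splatting turns t[a...] into t["column10", "column50"], which matches.
sub = t[a...]
sub.colnames             # ["column10", "column50"]

# t[a] passes a single Vector{String}, and no method accepts that.
threw_method_error = try
    t[a]
    false
catch err
    err isa MethodError
end
```

Under this assumption, ta[a], ta[a;] and ta[copy(a)] all pass one vector and fail the same way, while only the splatted call matches a method.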


#7

new_ta = ta[first(a)]

?


#8

It almost works, but we only get the first column, "column10", of the two columns in a.


#9

Oh, now you don’t have a Vector of Vectors?

I wouldn’t care about the splatting penalty here. It would give performance problems if you were doing really fast operations in a loop, but here you’re just pulling out a column. Splat and call it a day. If you need something more, come back to this later and just loop.


#10

Thank you both very much. Indeed this is the case. I’ll be splatting around just a few times. But although I can cope with the lack of elegance, the lack of performance is conceptually somewhat annoying.


#11

Could your vector a perhaps be replaced with a tuple instead? I’m not sure of all the details, but I think splatting with tuples is likely to be more efficient because their size is known at compile-time. Here’s a really simple example:

julia> f(x) = +(x...)
f (generic function with 1 method)

julia> using BenchmarkTools

julia> y1 = [1, 2]
2-element Array{Int64,1}:
 1
 2

julia> y2 = (1, 2)
(1,2)

julia> @benchmark f($y1)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     61.206 ns (0.00% GC)
  median time:      66.964 ns (0.00% GC)
  mean time:        68.496 ns (0.00% GC)
  maximum time:     428.580 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     980
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark f($y2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.850 ns (0.00% GC)
  median time:      1.856 ns (0.00% GC)
  mean time:        1.875 ns (0.00% GC)
  maximum time:     14.586 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
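
The reason the tuple version can be faster shows up in the types. A minimal sketch, assuming column names like the ones in post #6 (the function g is only for illustration):

```julia
names_vec = ["column10", "column50"]
names_tup = ("column10", "column50")

typeof(names_vec)    # Vector{String}: the length is not part of the type
typeof(names_tup)    # Tuple{String, String}: the length is part of the type

# Splatting works the same way on both, only the cost differs.
g(x) = string(x...)
g(names_vec) == g(names_tup)    # true

# A vector can be converted once, outside any hot loop.
Tuple(names_vec) == names_tup   # true
```

Note that the conversion Tuple(names_vec) still depends on the runtime length of the vector, so it helps most when it is done once and the resulting tuple is reused.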

#12

@rdeits thank you very much. It worked like a charm. Also got a >30x speed improvement! Julia is indeed a very rewarding language.