Splitting Values in a Column and Only Taking the Second Value

Hi All,

I am having a tough time figuring out how to split a column and only use the second value from the split. In R I would have used gsub:

maindf$pcc = gsub(".*-","", maindf$pcc)

Which is just splitting something like 1P-XXX into “1P” and “XXX” and then taking only the “XXX” and replacing the column with that value.

With Julia I found Strings.jl:

maindf = @chain rawdf begin
 @transform(:pcc = split.(:pcc, "-"))
end

Which produces a Vector{Vector{SubString{String}}} whose elements look like [“1P”, “XXX”].
What do I need to add to split and then only take the “XXX” part of this?

The difficulty is that you’re operating on a whole column with split.(:pcc, "-"). To get the second element of each value you can do getindex.(split.(:pcc, "-"), 2). Conceptually you want to do split.(:pcc, "-").[2] but broadcasting the indexing operation like this with .[] is not supported.
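To make the broadcasting point concrete, here is a minimal sketch on a plain vector, using only Base functions. The `pcc` vector is made-up sample data standing in for the `:pcc` column, not the original data frame.

```julia
# Hypothetical sample data standing in for the :pcc column
pcc = ["1P-XXX", "2Q-YYY", "3R-ZZZ"]

# split.(...) returns one vector of pieces per element;
# broadcasting getindex with index 2 picks the second piece of each
second = getindex.(split.(pcc, "-"), 2)

# The same thing written as a comprehension, one element at a time
second_alt = [split(s, "-")[2] for s in pcc]

println(second)
```

Both forms return the pieces as `SubString{String}` values. If plain `String`s are needed, wrapping the result in `String.(...)` should convert them.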

It’s easier to operate row by row:

@transform(@byrow :pcc = split(:pcc, "-")[2])

or even simpler:

@rtransform(:pcc = split(:pcc, "-")[2])
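One caveat when porting from the R version: `gsub(".*-", "", x)` removes everything up to the last hyphen, while `split(x, "-")[2]` takes the second piece. The two agree for values like 1P-XXX but can differ when a value contains more than one hyphen. A rough sketch with made-up values (the second entry is hypothetical and deliberately has two hyphens):

```julia
# Hypothetical values; the second one contains two hyphens
pcc = ["1P-XXX", "2Q-YYY-ZZ"]

# Second piece after splitting on "-"
by_index = getindex.(split.(pcc, "-"), 2)

# Closest Base equivalent of R's gsub(".*-", "", x):
# the greedy regex strips everything up to the last hyphen
by_regex = replace.(pcc, r".*-" => "")

# Last piece after splitting, which matches the regex result here
by_last = last.(split.(pcc, "-"))

println(by_index)
println(by_regex)
```

If the column only ever holds a single hyphen per value, any of these should give the same result, and the same expressions can be used inside `@rtransform` without the broadcasting dots.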