pandas.Series.str.extract equivalent?

magrathean · December 27, 2017, 4:35am

How do I do the following pandas manipulation on a Julia DataFrame column:

>>> from pandas import Series
>>> s = Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN

vchuravy · December 27, 2017, 2:19pm

I am not aware of a built-in method, but this should get you started.
Julia supports regexes, so the first part would be:

julia> data = ["a1", "b2", "c3"];

julia> regex = r"(?P<letter>[ab])(?P<digit>\d)";

julia> result = match.(regex, data)  # notice the "dot" after match
3-element Array{Any,1}:
 RegexMatch("a1", letter="a", digit="1")
 RegexMatch("b2", letter="b", digit="2")
 nothing
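Each element of the broadcast result is either a `RegexMatch` or `nothing` (no match). A small sketch, assuming Julia 1.x, of how the pieces can be read back out: a named group can be indexed by `Symbol` or `String`, the `captures` field holds all groups in order, and a single column can be built with `missing` standing in for pandas' `NaN`. The names `m` and `letters` are only illustrative.

```julia
data = ["a1", "b2", "c3"]
regex = r"(?P<letter>[ab])(?P<digit>\d)"
result = match.(regex, data)   # RegexMatch or nothing per element

m = result[1]
m[:letter]     # named group, indexed by Symbol
m["digit"]     # the same lookup also accepts a String
m.captures     # every group in order, as SubStrings

# One column: the captured letter, or `missing` where the string did not match
letters = [x === nothing ? missing : x[:letter] for x in result]
```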

Now you have to convert the result array into a DataFrame, which is a bit annoying, and the code below is not the most efficient way of doing it.

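using DataFrames

# Names of the named capture groups, in group order.
# keys(::RegexMatch) needs Julia 1.7+; unnamed groups show up as Int indices.
getnames(m::RegexMatch) = [String(k) for k in keys(m) if k isa AbstractString]
getnames(m::Nothing) = String[]   # no match, no names

columns = unique(reduce(vcat, getnames.(result)))
# One String-or-missing column per capture name, starting with zero rows
df = DataFrame([Symbol(c) => Union{Missing,String}[] for c in columns])
for rm in result
    if rm === nothing
        push!(df, fill(missing, length(columns)))   # no match: all missing
        continue
    end
    row = Any[]
    for column in columns
        if column ∈ getnames(rm)
            # captured text, or missing if the group did not participate
            push!(row, something(rm[column], missing))
        else
            push!(row, missing)
        end
    end
    push!(df, row)
end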
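A more compact alternative is to build each column directly from the matches instead of pushing rows one at a time. This is a minimal sketch assuming Julia 1.x and DataFrames.jl 1.x. `str_extract` and `capture_or_missing` are hypothetical names, not existing functions, and `Base.PCRE.capture_names` is an internal, undocumented Base function that may change between Julia versions. Only named groups become columns.

```julia
using DataFrames

# Hypothetical helper: captured text of group i, or `missing` when the string
# did not match or the group did not participate in the match.
capture_or_missing(m, i) = m === nothing ? missing : something(m.captures[i], missing)

# Hypothetical rough analogue of pandas' Series.str.extract for named groups:
# one column per named group, one row per input string.
function str_extract(v::AbstractVector{<:AbstractString}, re::Regex)
    ms = match.(re, v)
    # Internal API (assumption): Dict mapping group index => group name
    idx2name = Base.PCRE.capture_names(re.regex)
    groups = sort!(collect(keys(idx2name)))   # keep the pattern's group order
    cols = [Symbol(idx2name[i]) => [capture_or_missing(m, i) for m in ms] for i in groups]
    return DataFrame(cols)
end

df = str_extract(["a1", "b2", "c3"], r"(?P<letter>[ab])(?P<digit>\d)")
```

Because the column names are read from the pattern rather than from individual matches, the result keeps its columns even when no element matches, which is closer to what pandas does.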

magrathean · December 27, 2017, 5:18pm

@vchuravy That looks rather tedious; thanks for the details nevertheless.
