Extracting hashtags from text: Flattening in Query.jl

I don’t use Queryverse, but I guess you could put the flatten call in the pipeline by wrapping it in an anonymous function, like |> x -> flatten(x, :tags).
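To make that concrete, here is a minimal sketch of piping a table into flatten through an anonymous function. It uses a plain DataFrame rather than a Query.jl pipeline, so it assumes whatever comes out of your query has already been converted to a DataFrame at that point. The column names and values are made up for illustration.

```julia
using DataFrames

# Hypothetical data: each row holds a vector of tags.
df = DataFrame(id = [1, 2], tags = [["#a", "#b"], ["#c"]])

# |> only passes a single argument, so flatten's extra argument
# is supplied by wrapping the call in an anonymous function.
flat = df |> x -> flatten(x, :tags)
```

Each element of a tags vector ends up in its own row, with the other columns repeated.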

With regard to the @mutate calls, you can get your matches directly with [x.match for x in eachmatch(r"#\p{L}+", _.text)], instead of collecting first and then extracting. If you come from R, you might not know this comprehension syntax yet.
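Here is a small Base-only sketch comparing the two approaches on a single string. The two-step version is only my guess at what "collecting first and then extracting" looked like in your code, so treat it as an assumption.

```julia
text = "This is the #best #thing #ever"
pattern = r"#\p{L}+"   # '#' followed by one or more Unicode letters

# Direct: iterate over the matches and pull out the matched substring.
tags_direct = [m.match for m in eachmatch(pattern, text)]

# Two-step (assumed original approach): materialize the matches, then extract.
matches = collect(eachmatch(pattern, text))
tags_two_step = [m.match for m in matches]
```

Both give the same tags. The comprehension just skips the intermediate vector of RegexMatch objects.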

Finally, this is how I would write it: just plain DataFrames.jl plus a helper package called Chain.jl. Chain.jl lets you use any function in the pipe without wrapping it in an anonymous function first, and without needing the |> symbol. By default the piped value is inserted as the first argument; if it needs to go in the second, third, etc. position, you mark that position with _.

using DataFrames
using Chain


df = DataFrame(text = ["This is the #best #thing #ever", "I #love #Julia"])

# Return every hashtag (# followed by one or more Unicode letters) found in s
get_tags(s) = [x.match for x in eachmatch(r"#\p{L}+", s)]

@chain df begin
    transform(:text => ByRow(get_tags) => :tags)
    flatten(:tags)
end
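
As a side note on argument position in Chain.jl: the example above only ever pipes into the first argument. Here is a toy sketch, unrelated to the hashtag data, of how _ places the piped value somewhere else. The numbers are arbitrary.

```julia
using Chain

total = @chain [3, 1, 2] begin
    sort                # no `_`: becomes sort(previous result)
    map(x -> 10x, _)    # `_` puts the previous result in the second argument
    sum                 # becomes sum(previous result)
end
```

Here total is the sum of the sorted, scaled values, and no anonymous wrapper functions or |> were needed.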