Didn’t think I’d push another significant update so soon, but here it is (:
upd: DataPipes@0.2.1 is registered in General.
I perform essentially all data manipulation tasks with DataPipes, and haven’t encountered many pain points with it. Still, there are a couple of common scenarios that could be made cleaner, with less boilerplate. They mostly revolve around working with nested data, and I have now addressed some of them.
A common pattern is a lambda function whose body consists only of an inner “pipe” (@p), especially with the map function. For example, this simple parsing of a key-value string into a named tuple:
@p begin
    "a=1 b=2 c=3"
    split
    map() do kv
        @p begin
            split(kv, '=')
            Symbol(__[1]) => parse(Int, __[2])
        end
    end
    NamedTuple
end
Now, DataPipes has a more succinct syntax for this: the lambda function body is automatically wrapped with an inner pipe when its only argument is __ (double underscore). The intuition is that __ refers to the result of the previous pipeline step in DataPipes, so by assigning to __ we effectively start a new pipe.
Here is the same example using the new feature:
@p begin
    "a=1 b=2 c=3"
    split
    map() do __
        split(__, '=')
        Symbol(__[1]) => parse(Int, __[2])
    end
    NamedTuple
end
Essentially, we got rid of one nesting level and the @p begin end boilerplate.
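For comparison, here is roughly what both versions compute when written in plain Julia without DataPipes. This is a hand-written sketch of the equivalent logic, not the actual macro expansion; the names str, parts and kv_pairs are only for illustration.

```julia
# Plain-Julia sketch of the same computation, using only Base.
# Not the real DataPipes expansion; variable names are illustrative.
str = "a=1 b=2 c=3"

# Outer pipe, steps 1-3: split on whitespace, then map over the pieces.
kv_pairs = map(split(str)) do kv
    # Inner pipe: each step's result feeds the next one
    # (this is the role __ plays inside the do-block).
    parts = split(kv, '=')
    Symbol(parts[1]) => parse(Int, parts[2])
end

# Outer pipe, last step: build a named tuple from the Symbol => value pairs.
nt = NamedTuple(kv_pairs)
# nt is (a = 1, b = 2, c = 3)
```

Compared to this, the pipe version avoids naming the intermediate results.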
The idea that such nesting can be simplified is taken from a post on the Julia Slack. Unfortunately, I cannot find that post anymore.