[ANN] DataPipes.jl 0.3.0

I didn’t think I’d push another significant update so soon, but here it is (:
upd: DataPipes@0.2.1 is registered in General.

I do essentially all my data manipulation with DataPipes and haven’t run into many pain points with it. Still, a couple of common scenarios could be cleaner, with less boilerplate. They mostly involve working with nested data, and this release addresses some of them.

A common pattern is a lambda function whose body consists only of an inner pipe (`@p`), especially with `map`. For example, this simple parsing of a key-value string into a named tuple:

```julia
using DataPipes

@p begin
    "a=1 b=2 c=3"

    split
    map() do kv
        @p begin
            split(kv, '=')
            Symbol(__[1]) => parse(Int, __[2])
        end
    end
    NamedTuple
end
```
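For reference, here is a sketch of what the pipeline above computes, written in plain Base Julia without DataPipes. It only mirrors the three steps (split, map each `key=value` to a `Symbol => Int` pair, collect into a `NamedTuple`); the variable names `kvs` and `kv_pairs` are illustrative and not part of DataPipes.

```julia
# Plain Base Julia (no DataPipes): the same three steps as the pipeline above.
s = "a=1 b=2 c=3"

# Step 1: split on whitespace -> ["a=1", "b=2", "c=3"]
kvs = split(s)

# Step 2: turn each "key=value" string into a Symbol => Int pair
kv_pairs = map(kvs) do kv
    parts = split(kv, '=')                      # e.g. ["a", "1"]
    Symbol(parts[1]) => parse(Int, parts[2])    # e.g. :a => 1
end

# Step 3: collect the pairs into a NamedTuple
result = NamedTuple(kv_pairs)
println(result)  # (a = 1, b = 2, c = 3)
```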

Now DataPipes has a more succinct syntax for this pattern: when the only argument of a lambda function is `__` (double underscore), its body is automatically wrapped in an inner pipe. The intuition is that `__` refers to the result of the previous pipeline step, so binding the lambda argument to `__` effectively starts a new pipe.
Here is the same example using the new feature:

```julia
using DataPipes

@p begin
    "a=1 b=2 c=3"

    split
    map() do __
        split(__, '=')
        Symbol(__[1]) => parse(Int, __[2])
    end
    NamedTuple
end
```

This removes one nesting level along with the `@p begin ... end` boilerplate.
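To make the "each step rebinds `__`" intuition concrete, here is a hand-written approximation of the `map() do __ ... end` body in plain Julia. This is not the code DataPipes actually generates; it is only a sketch of the idea. The function name `parse_kv` is hypothetical, and an ordinary variable `x` stands in for `__`, since plain Julia does not allow reading all-underscore identifiers such as `__`.

```julia
# Hand-written approximation of the `map() do __ ... end` body.
# Each line rebinds `x` to the result of the previous line, mimicking how
# `__` chains steps inside an inner pipe. NOT DataPipes' generated code;
# `parse_kv` is a hypothetical name and `x` stands in for `__`.
function parse_kv(x)
    x = split(x, '=')                        # "a=1" -> ["a", "1"]
    x = Symbol(x[1]) => parse(Int, x[2])     # ["a", "1"] -> :a => 1
    return x
end

# Apply it the same way the outer pipeline does: split, map, NamedTuple.
result = NamedTuple(map(parse_kv, split("a=1 b=2 c=3")))
println(result)  # (a = 1, b = 2, c = 3)
```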

The idea that such nesting can be simplified comes from a post on the Julia Slack; unfortunately, I can’t find that post anymore.
