“Write You A Query Language”: introducing the power of MLStyle.jl

A few days ago I got interested in Query.jl. I found the idea of implementing something like it on my own incredibly exciting, and eventually I did.
I’m here to present an article, Write You A Query Language. It is a tutorial on using MLStyle.jl to process complex macros gracefully, and it also presents a distinct prototype and some ideas for query languages in Julia.
I don’t mean to announce a new query-language package. Although my prototype may be fairly clean, efficient and robust, with more functionality and better extensibility, the main aim is to tell you about some very useful tools for AST manipulation.

The main advantages of this prototype (tentatively called MQuery) are:

  • Support for field names that are not regular identifiers, e.g.,
    @select _."middle school".

  • No NamedTuple usage when executing computations, which may lead to better efficiency when the data becomes extremely large.

  • Instead of a fixed set of syntactic helpers such as startswith, endswith and occursin, as in Query.jl, MQuery allows arbitrary field-name filter functions, e.g.,

    _.(predicate1(a, b, c), !predicate2(d, e)) means that a selected field must satisfy predicate1(field, a, b, c) && !predicate2(field, d, e).

  • Query commands such as @select and @where can be extended via registered_ops.

  • Support for closure references in arbitrary query expressions.

  • You can add custom query support for your own datatypes via three interface functions: get_fields, get_records and build_result. The code below implements the query interface for DataFrames:

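    # Field (column) names of the data source.
    get_fields(df :: DataFrame) = names(df)
    # Iterate over the records row by row, each as a tuple of column values.
    get_records(df :: DataFrame) = zip(DataFrames.columns(df)...)
    # Rebuild a DataFrame from the query result: one typed column per output field.
    function build_result(::Type{DataFrame}, fields, typs, source :: Base.Generator)
        res = Tuple(typ[] for typ in typs)
        for each in source
            push!.(res, each)  # append each value of the record to its column
        end
        DataFrame(collect(res), fields)
    end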
    
  • The code base is pretty small, and MQuery depends only on MLStyle.jl (which itself has no dependencies) and DataStructures.jl, so MQuery is easy to integrate with.
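
To make the field-name filter described in the list above concrete, here is a minimal sketch in plain Julia of how I read its semantics. The helper filter_fields and the (predicate, args, negated) triples are hypothetical names made up for this illustration; they are not MQuery's actual internals.

```julia
# Hypothetical helper, not part of MQuery: keep the fields for which every
# condition holds. Each condition is a (predicate, extra_args, negated) triple
# and is evaluated as predicate(field, extra_args...), optionally negated.
function filter_fields(fields, conditions)
    filter(fields) do field
        all(conditions) do (pred, args, negated)
            ok = pred(string(field), args...)
            negated ? !ok : ok
        end
    end
end

fields = [:name, :nation, :year]

# Corresponds to something like _.(startswith("n"), !endswith("e")), i.e.
# startswith(field, "n") && !endswith(field, "e")
conditions = [(startswith, ("n",), false), (endswith, ("e",), true)]
selected = filter_fields(fields, conditions)
```

Here only :nation is selected: :name starts with "n" but also ends with "e", and :year fails the first predicate.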

MQuery can be this powerful because MLStyle.jl helps with all sorts of AST manipulations and makes them extremely easy. As a result, I can focus on other aspects such as functionality and extensibility.

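include("MQuery/MQuery.jl")
using DataFrames  # for the DataFrame constructor, in case MQuery.jl does not already bring it into scope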
using Base.Enums
@enum TypeChecking Dynamic Static
df = DataFrame(
        Symbol("Type checking") => [
            Dynamic, Static, Static, Dynamic, Static, Dynamic, Dynamic, Static
        ],
        :name => [
            "Julia", "C#", "F#", "Ruby", "Java", "JavaScript", "Python", "Haskell"
        ],
        :year => [
            2012, 2000, 2005, 1995, 1995, 1995, 1990, 1990
        ]
)

df |>
@where !startswith(_.name, "Java"),
@groupby _."Type checking" => TC,
@having TC === Dynamic,
@select join(_.name, " and ") => result

│ Row │ result                    │
│     │ String                    │
├─────┼───────────────────────────┤
│ 1   │ Julia and Ruby and Python │
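
To give a sense of the kind of AST manipulation that sits behind a clause like the @where above, here is a rough sketch using only Base Julia. It is neither MLStyle.jl's API nor MQuery's actual implementation: the function rewrite and the choice of a Dict as the record are assumptions made for illustration. It rewrites every _.field (or _."field name") into an index lookup on a record variable.

```julia
# Fallback: symbols, literals, etc. are returned unchanged.
rewrite(ex, rec::Symbol) = ex

# Hypothetical rewriter: replace `_.field` / `_."field name"` with `rec[:field]`
# and recurse into every other expression.
function rewrite(ex::Expr, rec::Symbol)
    if ex.head === :. && length(ex.args) == 2 &&
       ex.args[1] === :_ && !(ex.args[2] isa Expr)
        field = ex.args[2]
        name = field isa QuoteNode ? field.value : field
        return :( $rec[$(QuoteNode(Symbol(name)))] )
    end
    Expr(ex.head, (rewrite(arg, rec) for arg in ex.args)...)
end

ex = :(!startswith(_.name, "Java") && _."Type checking" == "Dynamic")
body = rewrite(ex, :rec)

# Turn the rewritten expression into an ordinary predicate over one record.
keep = eval(:(rec -> $body))

row1 = Dict(:name => "Julia", Symbol("Type checking") => "Dynamic")
row2 = Dict(:name => "JavaScript", Symbol("Type checking") => "Dynamic")
# invokelatest avoids world-age issues when calling a freshly eval'ed function.
results = [Base.invokelatest(keep, row) for row in (row1, row2)]
```

Writing such walkers by hand gets tedious as the number of cases grows, which is the kind of work pattern matching on ASTs is meant to simplify.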

Anyone who has tasks involving AST manipulation can ask me for a solution that greatly simplifies their code (basically, by making the logic clearer).
Why not give it a try? 🙂