"Domain-Specific Languages" in Julia


#1

When I read or hear about domain-specific languages in Julia I tend to cringe a little.
In most cases it is not about a language, but rather an API.

The difference? A DSL uses a syntax appropriate for its domain. An API is restricted by the syntax rules of the hosting language, in this case Julia. For example,

DSL

SELECT * FROM table1 WHERE age>50

API

# QueryVerse
    @from i in table1
    @where i.age>50
    @select i

# JuliaDB
    filter( p->p.age > 50 , table1)
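
For comparison, the function-style query needs nothing beyond Base Julia when the table is a plain collection. A minimal sketch, assuming `table1` is just a vector of named tuples (a stand-in with made-up rows, not a JuliaDB table):

```julia
# Hypothetical stand-in for a table: a vector of named tuples.
table1 = [
    (name = "Ann", age = 61),
    (name = "Bob", age = 34),
    (name = "Cid", age = 52),
]

# Same shape as the function-style query: an ordinary higher-order call.
over50 = filter(p -> p.age > 50, table1)

println([p.name for p in over50])
```

The query is plain Julia from end to end, which is exactly the point being made: the host syntax dictates what the query looks like.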

I don’t mean to pick on QueryVerse or JuliaDB specifically. There are many other examples in other domains. What they all have in common is that they are restricted to Julia macros and/or Julia functions, both of which must adhere to Julia syntax rules.

In my opinion each Domain-Specific Language needs its own parser. The only example I know of in Julia is regular expressions, which are parsed by a regular expression parser.
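
To make the regular expression case concrete: `r"..."` is a non-standard string literal, i.e. sugar for the macro `@r_str` in Base. A small sketch (the pattern and the test strings are made up for illustration):

```julia
# r"..." is sugar for the macro call @r_str("..."), defined in Base.
pattern = r"^\d{4}-\d{2}$"

# The pattern text is handed to the regex engine, not to the Julia parser,
# so \d and $ need no extra escaping here.
println(pattern isa Regex)
println(occursin(pattern, "2019-03"))
println(occursin(pattern, "March 2019"))
```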

Why not do something similar with other DSLs?

SQL"""
SELECT *
  FROM table1
  WHERE age > 50
"""
GAMS"""
Parameter c(i,j)  transport cost in thousands of dollars per case ;
           c(i,j) = f * d(i,j) / 1000 ;
 Variables
      x(i,j)  shipment quantities in cases
      z       total transportation costs in thousands of dollars ;
 Positive Variable x ;
 Equations
      cost        define objective function
      supply(i)   observe supply limit at plant i
      demand(j)   satisfy demand at market j ;
 cost ..        z  =e=  sum((i,j), c(i,j)*x(i,j)) ;
 supply(i) ..   sum(j, x(i,j))  =l=  a(i) ;
 demand(j) ..   sum(i, x(i,j))  =g=  b(j) ;
 Model transport /all/ ;
 Solve transport using lp minimizing z ;
 Display x.l, x.m ;
"""
XPath"""
// All <author> elements where the first <last-name> child element has the value Bob.
author[last-name [position()=1]= "Bob"]
"""
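
As a rough sketch of how the SQL case could work with a string macro: the names `sql_to_expr` and `SQL_str` below are hypothetical, and the "parser" is a single regular expression that accepts only the one query shape shown above. A real implementation would need a proper SQL grammar.

```julia
# Hypothetical parser for one tiny SQL shape:
#   SELECT * FROM <table> WHERE <column> <op> <integer>
# It returns a Julia expression; anything else is rejected.
function sql_to_expr(src::AbstractString)
    m = match(r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*(>=|<=|>|<|=)\s*(\d+)\s*$"i, src)
    m === nothing && error("unsupported SQL: $src")
    table  = Symbol(m.captures[1])
    column = QuoteNode(Symbol(m.captures[2]))
    op     = m.captures[3] == "=" ? :(==) : Symbol(m.captures[3])
    value  = parse(Int, m.captures[4])
    # Equivalent to: filter(row -> row.<column> <op> <value>, <table>)
    return :(filter(row -> $op(getproperty(row, $column), $value), $table))
end

# The string macro runs the parser at macro-expansion time.
macro SQL_str(src)
    esc(sql_to_expr(src))
end

# Made-up data; any collection of rows with an `age` property would do.
table1 = [(name = "Ann", age = 61), (name = "Bob", age = 34)]

result = SQL"""
SELECT *
  FROM table1
  WHERE age > 50
"""
println(result)
```

The text between the triple quotes never passes through the Julia parser, so the DSL is free to use its own syntax. The cost is that the package author has to write and maintain that parser.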

A general Julia parser generator that can convert a true DSL to the corresponding Julia expressions might be helpful here. (For inspiration, see ANTLR4.)

Let the discussion begin :slight_smile:


#2

Parser generators do exist in the package community, See packages like

That’s an example of a Julia parser generator that translates another language to and from Julia expressions.


#3

For reference, there’s been some relevant discussion in a previous thread: Seamlessly extending the Julia syntax.


#4

Following the theme of your DB examples, this Julia package seems to get very close to the syntax you want:


#5

Do you mean we should embrace the two language issue instead of solving it…


#6

The two-language issue refers to having to use two different general-purpose programming languages to properly balance ease of use and performance. In this case we’re talking about adding Domain-Specific Languages to a general-purpose programming language. Regular expressions are a good example of a DSL embedded in a general-purpose language such as Julia.

In my opinion packages like Octo.jl and Query.jl don’t go far enough. They both borrow from SQL, but can’t use the exact SQL syntax. As a result we have to learn multiple language dialects and remember how they differ from the SQL language standard.


#7

The idea of creating a language-specific API with the intention of imitating some imagined “master dialect” of SQL makes me feel kind of dirty. I don’t think the fact that SQL is a horrorshow is something that can or should be solved by any language-specific API. It would make much more sense to me to have a language-agnostic translation driver that transcompiles SQL code, like what is done with some flavors of JavaScript. While this could be implemented in Julia, it seems somehow separate from a regular Julia SQL API. For the latter, I’d much rather have some sort of lazy table abstraction like a Spark dataframe. Something like that seems more worthy of people’s efforts than a SQL lookalike, considering that the programming paradigms within SQL are, well, seemingly a bizarre relic of a time when COBOL was all the rage.

I of course realize that many will disagree with this stance, and those people are certainly doing nice work. Octo is pretty cool, and it’s wild to see how far the Julia syntax can be pushed.

I certainly agree with you that the proliferation of many distinct SQL dialects is a problem in need of solving.


#8

Hello,
From the discussion linked by waldyrious, it seems that the way to go about making your own DSL is to create your own REPL. On the other hand, I read in https://github.com/JuliaLang/julia/pull/24945#issuecomment-349739529 that “REPL code will be considered non-public and unstable in 1.0”.
Is there any canonical document on how to go about making your own DSL, considering the above? Any examples are welcome.


#9

I have a package, ReplMaker.jl, which exposes a simpler API for building REPL modes. It’s currently very basic and limited, but for now it’s still a nice option if one doesn’t want to deal with the difficult and confusing process of interfacing with the REPL standard library directly.

If anyone out there wants to collaborate on making ReplMaker.jl a bit more robust and versatile, I’d be very interested!


Note that one does not necessarily need a separate REPL mode for a DSL. It’s just a nice thing to have if you want a clear separation between interacting with your DSL and standard Julia, or if you want to use syntax that is incompatible with Julia’s parser.


#10

Thanks, Mason, for your input and for ReplMaker.jl. I spotted your package in the linked discussion, and it is really interesting. It is so small that it would actually be more robust (in terms of reducing dependencies) to include the same logic in any package that wants to use ReplMaker. The REPL standard library seems to be going non-public, so your package gets some extra rationale: it handles the interface to a non-public API which is likely to change over time. On the other hand, it is still very small. My proposal would be to submit ReplMaker as a PR to the REPL standard library, so it can be used as the public API for interfacing with the REPL. Isn’t that the ideal solution for everyone?
Being able to build a true DSL is an important feature of Julia, but making the REPL standard library non-public is an obstacle to doing that.

As for writing a DSL without a separate REPL, AFAIU (I actually understand very little, as I am a newcomer to Julia) you have to use macros, which are constrained by Julia’s macro syntax. Is that correct?


#11

There are non-standard string literals, which are not constrained to use Julia syntax.


#12

I think this distinction comes from an era when DSL expressions were not only parsed, but also compiled before being applied in the host environment. SQL is a very relevant example.

Julia is unique in having a very smart compiler, an expressive parametric type system and multiple dispatch. This allows the design of APIs which are composed of small pieces, all of them valid Julia expressions in themselves, yet achieve a lot of expressive power at little or zero overhead.

As a simple example, consider

DataFrames.by(iris, :Species, :PetalLength => mean)

where :PetalLength => mean is nothing special syntactically, it is just a Pair{Symbol,typeof(mean)}. Yet it is used as a building block for the semantics of by.
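
To illustrate how a plain `Pair` can carry the semantics, here is a sketch of a grouping helper that dispatches on it. The function `summarize` and the three rows are hypothetical and only stand in for the idea; this is not how DataFrames implements `by`.

```julia
using Statistics  # standard library, provides `mean`

spec = :PetalLength => mean
println(spec isa Pair{Symbol,typeof(mean)})

# Hypothetical helper (not the DataFrames implementation): group rows by a key
# column and apply the function from the pair to the named value column.
function summarize(rows, key::Symbol, spec::Pair{Symbol,<:Function})
    column, f = spec
    groups = Dict{Any,Vector{Float64}}()
    for row in rows
        push!(get!(groups, getproperty(row, key), Float64[]), getproperty(row, column))
    end
    return Dict(k => f(v) for (k, v) in groups)
end

# Made-up rows for illustration.
rows = [
    (Species = "setosa",    PetalLength = 1.4),
    (Species = "setosa",    PetalLength = 1.6),
    (Species = "virginica", PetalLength = 6.0),
]
result = summarize(rows, :Species, :PetalLength => mean)
println(result)
```

Because the column/function pair is an ordinary value, it can be stored in a variable, built in a loop, or passed through other functions before it reaches the API.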

As much as I admire the elegant DSLs that people think up, I think that composing an API out of small pieces of native Julia constructs that mesh together well in a natural way is also a viable route. Whether to call these DSLs is a matter of terminology.


#13

You also don’t need a REPL to handle syntax incompatible with the Julia parser. For example, you can parse your own programming language by writing a DSL parser. The REPL is only for interactive use, but is useless from the viewpoint of package creation. From the perspective of programming packages using the DSL with non-standard syntax, all that you need is the parse method since you can’t really call the REPL from within a package code anyway. The REPL is only used in Main when calling Julia from terminals.

Once you have the parse method, you can use it to parse string literals.

So a REPL is probably the last thing to create when making a DSL, but it can be a nice cherry on top. Before that you need to have written a parser and created a DSL or API in the first place.
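
A minimal sketch of that split, using a made-up mini-language (mathematical interval notation) rather than anything from Reduce.jl. The names `Interval`, `parse_interval` and `interval_str` are hypothetical:

```julia
# Hypothetical DSL: interval notation such as "[1, 5)", which the Julia
# parser rejects because the brackets do not match.
struct Interval
    lo::Float64
    hi::Float64
    closed_lo::Bool
    closed_hi::Bool
end

# The parser is an ordinary function, so package code can call it on any string.
function parse_interval(src::AbstractString)
    m = match(r"^\s*([\[\(])\s*([^,\s]+)\s*,\s*([^,\s\]\)]+)\s*([\]\)])\s*$", src)
    m === nothing && throw(ArgumentError("not an interval: $src"))
    return Interval(parse(Float64, m.captures[2]), parse(Float64, m.captures[3]),
                    m.captures[1] == "[", m.captures[4] == "]")
end

# The string macro is a thin wrapper that parses the literal at expansion time.
macro interval_str(src)
    parse_interval(src)
end

# Membership test, so that `x in interval` works.
Base.in(x::Real, i::Interval) =
    (i.closed_lo ? x >= i.lo : x > i.lo) && (i.closed_hi ? x <= i.hi : x < i.hi)

a = parse_interval("[1, 5)")   # runtime string, no macro or REPL involved
b = interval"(0, 2.5]"         # literal, parsed when the macro expands
println((1 in a, 5 in a, 0 in b, 2.5 in b))
```

Neither path touches the REPL: the function handles strings that arrive at runtime, and the macro handles literals written in source files.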

For example, in Reduce.jl the only thing the REPL code does is read code and display the RExpr objects, while the parser for the language is entirely separate from the REPL code. When you use ForceImport.jl with the command @force using Reduce.Algebra, then the DSL is imported into the local namespace and now the Julia language is entirely extended to operate on expressions. This REPL is not part of DSL, since the domain specific language from the upstream REDUCE language can now be used without the REPL. The REPL mode is a mere convenience used for interacting, but the R"..." string literal is used mainly instead, in addition to being able to translate Julia expressions into the other language back and forth.


#14

Somewhat related to non-standard string literals: a really nice API for defining DSLs would go through some BNF variant. That is, I’d like

mime_format = BNF"""
boundary := 0*69<bchars> bcharsnospace

   bchars := bcharsnospace / " "

   bcharsnospace :=    DIGIT / ALPHA / "'" / "(" / ")" / "+"  / "_"
                  / "," / "-" / "." / "/" / ":" / "=" / "?"

   charset := "us-ascii" / "iso-8859-1" / "iso-8859-2"/ "iso-8859-3"
        / "iso-8859-4" / "iso-8859-5" /  "iso-8859-6" / "iso-8859-7"
        / "iso-8859-8" / "iso-8859-9" / extension-token
        ; case insensitive

   close-delimiter := "--" boundary "--" CRLF;Again,no space by "--",

   content  := "Content-Type"  ":" type "/" subtype  *(";" parameter)
             ; case-insensitive matching of type and subtype

   delimiter := "--" boundary CRLF  ;taken from Content-Type field.
                                ; There must be no space
                                ; between "--" and boundary.
...
"""
mime_parser = mod_generate(mime_format) # maybe generates a module?
const MIMEMAIL_str = mime_parser.string_macro
some_mail = MIMEMAIL"""
      From: Whomever
      To: Someone
      Subject: whatever
      MIME-Version: 1.0
      Message-ID: <id1@host.com>
      Content-Type: multipart/alternative; boundary=42
      Content-ID: <id001@guppylake.bellcore.com>

      --42
      Content-Type: message/external-body;
           name="BodyFormats.ps";
           site="thumper.bellcore.com";
...
"""
some_mail.from #"Whomever"

That is, just specify the DSL in some BNF dialect; if you’re lucky, you’ll find that somebody has already written the BNF specs. From that, generate data structures, parser, generator, and macros for literals. Of course, one still needs to define what to actually do with the objects, but some package could do all the boilerplate.

Examples truncated from RFC 1521. Do we have anything like that, i.e. a DSL for generating DSLs? Regex is a nice DSL for generating regular DSLs, but doesn’t cut it for context-free ones; BNF is no fun for very complex languages, but nice and human-readable at some point in the middle.
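
Short of a real BNF-to-parser generator, the grammar operators can be mirrored by hand with parser combinators. The sketch below is that swapped-in technique, not a generator: all names (`lit`, `alt`, `seq`, `many`, `token`) are hypothetical, and the grammar is a simplified fragment loosely based on the `content` rule above, with a made-up list of media types.

```julia
# A parser maps (src, pos) to (value, newpos), or to `nothing` on failure.
# Literal string, e.g. "Content-Type".
lit(s::String) = (src, pos) ->
    startswith(SubString(src, pos), s) ? (s, pos + ncodeunits(s)) : nothing

# Ordered choice: the "/" of the grammar.
alt(ps...) = (src, pos) -> begin
    for p in ps
        r = p(src, pos)
        r === nothing || return r
    end
    nothing
end

# Sequence: juxtaposition in the grammar.
seq(ps...) = (src, pos) -> begin
    vals = Any[]
    for p in ps
        r = p(src, pos)
        r === nothing && return nothing
        push!(vals, r[1]); pos = r[2]
    end
    (vals, pos)
end

# Repetition: "*" in the grammar (zero or more).
many(p) = (src, pos) -> begin
    vals = Any[]
    while true
        r = p(src, pos)
        r === nothing && break
        push!(vals, r[1]); pos = r[2]
    end
    (vals, pos)
end

# One or more characters satisfying a predicate.
token(pred) = (src, pos) -> begin
    stop = pos
    while stop <= ncodeunits(src) && pred(src[stop])
        stop = nextind(src, stop)
    end
    stop == pos ? nothing : (src[pos:prevind(src, stop)], stop)
end

# content := "Content-Type" ":" *" " type "/" subtype   (simplified)
mediatype = alt(lit("text"), lit("image"), lit("multipart"))
subtype   = token(c -> isletter(c) || c == '-')
content   = seq(lit("Content-Type"), lit(":"), many(lit(" ")), mediatype, lit("/"), subtype)

result = content("Content-Type: multipart/alternative", 1)
println(result)
```

A generator would build the same kind of closures from the grammar text itself. That step is not shown here.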


#16

A similar complaint comes up now and again in the Haskell community, where they often use the term EDSL (the E is for “embedded”). On hearing one described, one might complain, “But technically that’s just Haskell!”

In Haskell, EDSLs originated with clever use of monads, which allow for surprisingly concise and expressive code. The first time you see one of these, it doesn’t look like “normal” Haskell. It feels like a new language, built specifically for the purpose at hand.

Macros lead to a similar situation in Julia. In both cases, anyone looking behind the curtain will easily see that it’s “just code”. But the distinction is still useful as a way of thinking about code. “I’m just writing Julia” is a different mindset than “I’m writing a Julia DSL”. In the latter, we’re at least a little more willing to break the rules, while also usually targeting components that work especially well together but may not naturally extend outside the problem scope. In your example,

imagine using @select i in some other context. You could do it, but it would feel at least a little weird.

You point to parsing as the defining characteristic of a DSL. But this puts the focus squarely on syntax, while many would argue that semantics are more substantive, and that it’s different semantics that make a different language.

And there are lots of fuzzy cases. I’m writing a library for probabilistic programming, where a model is represented as an AST. It’s mostly standard Julia, but has ~ and operators that are not defined until an inference method is specified. At this point there are some compiler steps to arrive at valid Julia. So… is this a DSL?
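
To show what “valid syntax, undefined meaning” can look like, here is a sketch in plain Julia. The rewrite rule `to_sampling` is hypothetical and is not taken from the library mentioned above; `Normal` is never evaluated, so no distributions package is needed.

```julia
# `~` parses as an ordinary binary call, so a model line is valid syntax
# even if no method of `~` exists for these arguments.
ex = Meta.parse("x ~ Normal(0, 1)")
println(ex.head)
println(ex.args)

# Hypothetical rewrite step: turn `lhs ~ dist` into `lhs = rand(dist)`,
# one possible meaning that an inference method might assign.
function to_sampling(ex)
    if ex isa Expr && ex.head == :call && ex.args[1] == :~
        return :($(ex.args[2]) = rand($(ex.args[3])))
    elseif ex isa Expr
        return Expr(ex.head, map(to_sampling, ex.args)...)
    else
        return ex   # symbols, literals, line number nodes
    end
end

# The model is only an AST at this point; nothing is evaluated.
model = quote
    mu ~ Normal(0, 1)
    y ~ Normal(mu, 2)
end
sampled = to_sampling(model)
println(sampled)
```

The surface syntax is Julia’s, but the meaning of `~` is supplied by the rewrite, which is where the “is this a DSL?” question gets fuzzy.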

I don’t know. Depends who you ask, I guess. But does it really matter? At the end of the day, it’s about getting the ideas across effectively.

Maybe the significant thing about all of this is that it’s possible at all. Compare with the code of a few decades ago: C always looks like C, Fortran like Fortran. Hmm, Lisp has macros, but it’s hard to mistake those parentheses for anything else. But then I’m talking syntax, which I had argued against. It’s all slippery :slight_smile:


#17

FWIW, I once spent several months pushing that idea past its breaking point and what happens is that you hit all kinds of undocumented compiler heuristics. Achieving good performance was a constant battle. Is my parametric type too deep? Will that constant be propagated or not? Etc. It’s not dependable. In contrast, macros can be very hairy to write, but at least I have full control over the outcome.


#18

Let’s look at an example, since we are debating the definition of a DSL. Perhaps someone wiser than me could debate me on whether this is considered a DSL, an API or some other language phenomenon.

This is still a work in progress, but here is an example of what I would describe as a DSL for Julia. While it is all written in Julia, it does provide an extension of the language. For example, the Complex{Bool} value represented by im can be augmented by additional multi-basis vectors e1,e2,...,e12,...,e1..n, with 2^n of these basis elements, using the Grassmann package. These are of Basis{N,G} type.

The signature of the basis elements can be provided with a string literal S"+++" for example

struct Signature{N}
    b::BitArray{1}
end
Signature{N}(s::String) where N = Signature{N}(push!([k=='-' for k∈s],s[1]=='ϵ',s[end]=='o'))
Signature(s::String) = Signature{length(s)}(s)
macro S_str(str)
    Signature(str)
end
sig(s::Bool) = s ? '-' : '+'
function Base.show(io::IO,s::Signature{N}) where N  # extend Base.show so the REPL uses it
    print(io,s.b[end-1] ? 'ϵ' : sig(s.b[1]),sig.(s.b[2:N-1])...,s.b[end] ? 'o' : sig(s.b[N]))
end

This is a very simple definition, but it encodes the DSL needed for specifying the entire basis, which can be instantiated all at once by specifying the Signature{N} generated by the S"..." syntax (is it a DSL?)

julia> G3 = Grassmann.Algebra(S"+++")
Grassmann.Algebra{3,8}(+++, e, e₁, e₂, e₃, e₁₂, e₁₃, e₂₃, e₁₂₃)

Alternatively, the @basis e s "+++" macro would assign the variables in the local workspace instead.

The macro helps generate the domain specific language of an algebra specified by the string constructor.

So, I like to think that with Julia you can write a DSL for constructing a DSL, but that’s up to your definitions.


#19

Here is an example of a domain specific language.

Suppose you wanted to implement Leverrier’s Algorithm using a Grassmann Algebra DSL.
Such a DSL should allow you to write the following function:

Λⁿ⁻¹Aⁿ⁻ᵏ = ΛⁿAⁿ⁻ᵏ.Id - A ∘ Λⁿ⁻¹Aⁿ⁻ᵏ⁻¹   # last line in the wikipedia example

The Julia parser wouldn’t be able to handle this. To make progress you could implement an API:

cp_coefficient(Λ, A, n, k) = ...

or you could implement a non-standard string macro that converts the following DSL to the above API.

GrassmannAlgebra"""
Λⁿ⁻¹Aⁿ⁻ᵏ = ΛⁿAⁿ⁻ᵏ.Id - A ∘ Λⁿ⁻¹Aⁿ⁻ᵏ⁻¹
"""

#20

Defining a DSL based on whether or not its syntax can be parsed by Julia’s default parser seems like a pretty arbitrary distinction. Sure, you can define it that way, but you seem to be acting like that is the only way to define a DSL, which I’d call mistaken.


#21

Suppose one writes three programs for the same domain-specific ecosystem (for example biology). They are written in Julia, Python and Java respectively. The three programs provide the exact same functionality. All user defined names (function names, type names, parameter names, variable names, etc.) and user defined symbols have the same meaning in each of the three programs.

The only difference between the programs is the surface syntax that allows the program to be parsed by a Julia parser, Python parser and Java parser respectively. For the sake of this argument we are going to ignore this programming language specific surface syntax.

How would you characterize this situation?

  1. Three programs written in (different dialects of) a single Domain Specific Language (let’s give it the name “BioSequences”).

Or

  2. Three programs that provide the same functionality, but each written using a different generic programming language.

Or, to ask the same question in a different way: how would you characterize the program written in the Julia dialect?

  1. A program written in “BioSequences”.

Or

  2. A program written in “Julia”.