So, I don’t want to curb your enthusiasm too much, but let me give you a very short outline of fuzzing theory after AFL (american fuzzy lop) revolutionized the field ca 2014. As a disclaimer, I’m out of the industry for some years, so I’m not up to current state of the art, nor do I know a lot about libfuzzer.
The general problem that fuzzers try to solve is: Given a program P, explore what the program can do when fed with different inputs. Take the dual viewpoint: Your input is the program running on the Weird machine - Wikipedia defined by P, and you’re trying to explore what you can do with this machine.
It is clear that just throwing random (independently distributed) bytes at your machine won’t explore a lot of interesting behavior: The vast majority leads to uninteresting parse-errors. The way modern fuzzers work is that you don’t just consider the visible output behavior; instead you instrument the code, and consider internal states of the program. Precisely what internal states you take into account is a little bit of an art (that the fuzzer authors engage in); code coverage is almost always part of it. Given “instrumentation”, it is clear why llvm – a compiler library! – is in a good position to help with this.
Let’s take a very simple example:
check(input::Vector{UInt8}) = input[1]==0x68 && input[2] ==0x65 && input[3]==0x6c && input[4]==0x6c && input[5]==0x6c
Let N=5
be the number of comparisons. If you want to discover an input where check
returns true, you would need 2^(8*N)
many random attempts. This is clearly infeasible.
Now you take into account internal program state, namely code coverage. Imagine the following (super naive!) fuzzer: You keep a library of (program output, set of executed instructions in the program) → Set of samples that produced the pair.
In order to keep memory consumption acceptable, you use e.g. reservoir sampling on the sample sets.
At each step, you choose a uniformly random (output, coverage) pair, and then a random sample that produced it. Then, you mutate a single random byte in this input, and record the result.
How long until you find an input that passes the check, when searching length-L inputs? Suppose you already passed a length N check. Then for the next one, you need N * L * 256
many steps (256 because your mutation needs to get the correct character, L because you need to select the right character for mutation, and N because you need to select a sample that passes N checks). Sum this up over N, and you need subexponential time to discover the magic sample that passes the checks.
It is hard to overstate what a big deal this is: Subexponential! AFL famously managed to invent valid jpeg images when fuzzing a jpeg parser in less than a day, without any initial corpus cf https://web.archive.org/web/20190325000013/https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html
You would never observe monkeys on a typewriter creating valid jpeg or SQL or julia. But if the fuzzing target is not a black box, if it is instrumented, then you don’t need proper understanding of your target program.
So, now you’re fuzzing the julia parser. Super cool. However, the julia parser is written in femtolisp; and the femtolisp interpreter is written in C. The problem is that the crucial program trace data doesn’t know about abstractions like “lisp”. Libfuzzer and AFL are made for fuzzing C programs (or more generally: For compiled programs, not for interpreted programs).
And without good program traces, fancy fuzzers are no better than monkeys hammering on a typewriter.
What I’m saying here is: Look at the test cases that the fuzzer stores. If these look like interesting julia, then whatever traces come out are probably good enough to guide the exploration of state space. Otherwise, you may need to add extra instrumentation to the lisp code (to the parser code written in femtolisp – libfuzzer can probably instrument the femtolisp interpreter, but that is primarily useful for fuzzing the femtolisp parser/interpreter, and that’s not your goal).
Simpler said: Unless you do something with the femtolisp interpreter, the fuzzer will use coverage info of the “femtolisp interpreter”, instead of the interpreted program. You care about the interpreted program, so you probably need a way to inform your fuzzer about which parts of the interpreted program (i.e. the julia parser) are executed.