Parse, don't validate

The “Parse, don’t validate” blog post came up on Hacker News again. As far as I understand, it advocates using type checking instead of checking the object each time it is used. Are there valuable lessons to be learned from this post in the Julia context? For example, would it be a good idea to use a NonEmpty type?

“Parse, don’t validate” is a great idea, and Julia code will benefit from it. In a typed dynamic language, the question is how to implement the idea. For example:

  • Double down on types, use things like SumTypes.jl,
  • Double down on dynamism, use things like Clojure’s Spec,
  • Use traits.
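As a rough illustration of the third option, here is a minimal sketch of a Holy-style trait in plain Julia, assuming we want to record whether a container type is known to be sorted. The names SortTrait, KnownSorted, MaybeUnsorted, sorttrait and findindex are hypothetical, not from any package; only searchsorted and findfirst are Base functions.

```julia
# Hypothetical trait types recording what we know about a container's ordering.
abstract type SortTrait end
struct KnownSorted <: SortTrait end
struct MaybeUnsorted <: SortTrait end

# Default: nothing is known about an arbitrary type.
sorttrait(::Type) = MaybeUnsorted()
# Unit ranges (1:10, 3:7, ...) are sorted by construction.
sorttrait(::Type{<:AbstractUnitRange}) = KnownSorted()

# Public entry point: look up the trait once, then dispatch on it.
findindex(v, x) = findindex(sorttrait(typeof(v)), v, x)

# Sorted input: binary search; returns the index of x, or `nothing`.
function findindex(::KnownSorted, v, x)
    r = searchsorted(v, x)
    isempty(r) ? nothing : first(r)
end

# Unknown ordering: fall back to a linear scan.
findindex(::MaybeUnsorted, v, x) = findfirst(==(x), v)
```

A new container type opts in to the fast path by adding a single sorttrait method, without touching findindex itself.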

I think there’s already a lot that you can do in the spirit of “parse, don’t validate” using Julia’s type system. Of course, when there are bugs, what you end up with is MethodErrors at run time instead of static compilation errors.

One thing that I sometimes forget is that if I’m writing code that processes data that I’ve created myself inside my program, then I don’t need to validate it, since I’m the one who created it and I know what shape it’s in. 🙂

So I think the advice in “parse, don’t validate” is focused mostly on processing input data. But some of the advice applies generally, like this one:

Use a data structure that makes illegal states unrepresentable.

Sometimes you see a function like this:

function foo(; flag1, flag2)
    if flag1 && flag2
        1
    elseif flag1 && !flag2
        2
    elseif !flag1 && flag2
        3
    else
        error("flag1 and flag2 cannot both be false")
    end
end        

But you could make the fourth state unrepresentable by using the type system:

# Hopefully there are more natural names that you can use
# in your real application.
struct Flag1Flag2 end
struct Flag1NotFlag2 end
struct NotFlag1Flag2 end

foo(::Flag1Flag2)    = 1
foo(::Flag1NotFlag2) = 2
foo(::NotFlag1Flag2) = 3
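The remaining question is where the two raw booleans get turned into one of these types. A minimal sketch, assuming the flags arrive as plain Bool values (say, from a config file): a hypothetical parse_flags function performs the only check, at the boundary, and everything downstream dispatches on its result.

```julia
struct Flag1Flag2 end
struct Flag1NotFlag2 end
struct NotFlag1Flag2 end

foo(::Flag1Flag2)    = 1
foo(::Flag1NotFlag2) = 2
foo(::NotFlag1Flag2) = 3

# Hypothetical boundary function: the single place where the illegal
# combination (both flags false) is detected and rejected.
function parse_flags(flag1::Bool, flag2::Bool)
    if flag1 && flag2
        Flag1Flag2()
    elseif flag1
        Flag1NotFlag2()
    elseif flag2
        NotFlag1Flag2()
    else
        throw(ArgumentError("flag1 and flag2 cannot both be false"))
    end
end

# Downstream code never sees the fourth state.
state = parse_flags(true, false)
foo(state)
```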

Another related example: it’s nice to avoid struct fields with Union{Nothing, T} types, if possible. So instead of this,

abstract type AbstractPerson end

struct Person <: AbstractPerson
    name::Union{Nothing, String}
    age::Int
end

you could do this:

abstract type AbstractPerson end

struct Person <: AbstractPerson
    name::String
    age::Int
end

struct Anonymous <: AbstractPerson
    age::Int
end
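The Union{Nothing, String} does not disappear entirely; it moves to the boundary where the raw data is read. A minimal sketch, assuming the name arrives as either a string or nothing; parse_person and greeting are hypothetical helpers.

```julia
abstract type AbstractPerson end

struct Person <: AbstractPerson
    name::String
    age::Int
end

struct Anonymous <: AbstractPerson
    age::Int
end

# Hypothetical parser: the only place that looks at a possibly-missing name.
parse_person(::Nothing, age::Integer) = Anonymous(age)
parse_person(name::AbstractString, age::Integer) = Person(String(name), age)

# Later code dispatches instead of checking `name === nothing` again.
greeting(p::Person) = "Hello, $(p.name)!"
greeting(::Anonymous) = "Hello, stranger!"
```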

Parse, don’t validate… with type checking by JET.jl:

File

# parse_dont_validate.jl

struct NonEmpty{T}
    head::T
    tail::Vector{T}
end

head(x::NonEmpty) = x.head

function foo()
    x = rand(3)
    head(x)
end

JET

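The first two reports come from deep inside the Random standard library and are unrelated to the example; the type error we care about is the last one, at the bottom: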

julia> report_file("parse_dont_validate.jl"; analyze_from_definitions=true)
[toplevel-info] virtualized the context of Main (took 0.001 sec)
[toplevel-info] entered into parse_dont_validate.jl
[toplevel-info]  exited from parse_dont_validate.jl (took 0.004 sec)
[toplevel-info] analyzing from top-level definitions ... 4/4
═════ 3 possible errors found ═════
┌ @ parse_dont_validate.jl:11 rand(3)
│┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:277 Random.rand(Random.Float64, Random.Dims(dims))
││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:289 Random.default_rng()
│││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/RNGs.jl:370 Random.default_rng(Base.getproperty(Random.Threads, :threadid)())
││││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/RNGs.jl:376 Random.MersenneTwister()
│││││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/RNGs.jl:147 #self#(Random.nothing)
││││││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/RNGs.jl:147 Random.seed!(Random.MersenneTwister(Core.apply_type(Random.Vector, Random.UInt32)(), Random.DSFMT_state()), seed)
│││││││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/Random.jl:426 Random.seed!(rng)
││││││││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/RNGs.jl:362 Random.make_seed()
│││││││││┌ @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Random/src/RNGs.jl:326 Random.read(Random.pipeline(Base.cmd_gen(Core.tuple(Core.tuple("ifconfig"))), Base.cmd_gen(Core.tuple(Core.tuple("sha1sum")))), Random.String)
││││││││││┌ @ process.jl:421 Base.read(cmd)
│││││││││││┌ @ process.jl:410 Base.open(cmd, "r", Base.devnull)
││││││││││││┌ @ process.jl:339 Core.kwfunc(Base.open)(Core.apply_type(Core.NamedTuple, (:read, :write))(Core.tuple(true, true)), Base.open, cmds, stdio)
│││││││││││││┌ @ process.jl:361 Base.#open#646(write, read, _3, cmds, stdio)
││││││││││││││┌ @ process.jl:365 Base._spawn(cmds, Base.getindex(Base.Any, in, out, Base.stderr))
│││││││││││││││┌ @ process.jl:119 Base.setup_stdios(#639, stdios)
││││││││││││││││┌ @ process.jl:196 f(open_io)
│││││││││││││││││┌ @ process.jl:120 Base._spawn(Core.getfield(#self#, :cmds), stdios, Base.ProcessChain())
││││││││││││││││││┌ @ process.jl:151 Base._spawn(Base.getproperty(cmds, :b), stdios_right, chain)
│││││││││││││││││││┌ @ process.jl:181 Base._spawn_primitive(Base.getindex(Base.getproperty(cmd, :exec), 1), cmd, stdios)
││││││││││││││││││││┌ @ process.jl:99 Base.repr(cmd)
│││││││││││││││││││││┌ @ strings/io.jl:219 Base.#repr#386(Base.nothing, #self#, x)
││││││││││││││││││││││┌ @ strings/io.jl:219 Core.kwfunc(Base.sprint)(Core.apply_type(Core.NamedTuple, (:context,))(Core.tuple(context)), Base.sprint, Base.show, x)
│││││││││││││││││││││││┌ @ strings/io.jl:101 Base.#sprint#385(Core.tuple(context, sizehint, _3, f), args...)
││││││││││││││││││││││││┌ @ strings/io.jl:105 f(Core.tuple(s), args...)
│││││││││││││││││││││││││┌ @ cmd.jl:116 Base.map(#620, Base.getproperty(cmd, :exec))
││││││││││││││││││││││││││┌ @ abstractarray.jl:2294 Base.collect_similar(A, Base.Generator(f, A))
│││││││││││││││││││││││││││┌ @ array.jl:606 Base._collect(cont, itr, Base.IteratorEltype(itr), Base.IteratorSize(itr))
││││││││││││││││││││││││││││┌ @ array.jl:691 Base.iterate(itr)
│││││││││││││││││││││││││││││┌ @ generator.jl:47 Base.getproperty(g, :f)(Base.getindex(y, 1))
││││││││││││││││││││││││││││││┌ @ cmd.jl:117 Core.kwfunc(Base.sprint)(Core.apply_type(Core.NamedTuple, (:context,))(Core.tuple(Core.getfield(#self#, :io))), Base.sprint, #621)
│││││││││││││││││││││││││││││││┌ @ strings/io.jl:101 Base.#sprint#385(Core.tuple(context, sizehint, _3, f), args...)
││││││││││││││││││││││││││││││││┌ @ strings/io.jl:103 f(Core.tuple(Base.IOContext(s, context)), args...)
│││││││││││││││││││││││││││││││││┌ @ cmd.jl:118 Base.with_output_color(#622, :underline, io)
││││││││││││││││││││││││││││││││││┌ @ util.jl:71 Base.#with_output_color#814(Core.tuple(false, #self#, f, color, io), args...)
│││││││││││││││││││││││││││││││││││┌ @ util.jl:85 Base.split(str, '\n')
││││││││││││││││││││││││││││││││││││┌ @ strings/util.jl:411 Base.#split#375(0, true, #self#, str, splitter)
│││││││││││││││││││││││││││││││││││││┌ @ strings/util.jl:411 Base._split(str, Base.isequal(splitter), limit, keepempty, _7)
││││││││││││││││││││││││││││││││││││││┌ @ strings/util.jl:421 Base.first(r)
│││││││││││││││││││││││││││││││││││││││┌ @ abstractarray.jl:386 Base.iterate(itr)
││││││││││││││││││││││││││││││││││││││││ no matching method found for call signature (Tuple{typeof(iterate), Nothing}): Base.iterate(itr::Nothing)
│││││││││││││││││││││││││││││││││││││││└────────────────────────
││││││││││││││││││││││││││││││││││││││┌ @ strings/util.jl:421 Base.last(r)
│││││││││││││││││││││││││││││││││││││││┌ @ abstractarray.jl:437 Base.lastindex(a)
││││││││││││││││││││││││││││││││││││││││ no matching method found for call signature (Tuple{typeof(lastindex), Nothing}): Base.lastindex(a::Nothing)
│││││││││││││││││││││││││││││││││││││││└────────────────────────
┌ @ parse_dont_validate.jl:12 head(x)
│ no matching method found for call signature (Tuple{typeof(head), Vector{Float64}}): head(x::Vector{Float64})
└─────────────────────────────
(included_files = Set(["parse_dont_validate.jl"]), any_reported = true)
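JET flags head(x) because rand(3) returns a plain Vector{Float64}, not a NonEmpty. To make the example well typed, the vector has to be parsed into a NonEmpty first. A minimal sketch; the nonempty function is hypothetical and is the only place where emptiness is checked.

```julia
struct NonEmpty{T}
    head::T
    tail::Vector{T}
end

head(x::NonEmpty) = x.head

# Hypothetical parser: check emptiness once and record the result in the type.
function nonempty(v::AbstractVector{T}) where {T}
    isempty(v) && throw(ArgumentError("expected a non-empty vector"))
    NonEmpty{T}(first(v), collect(T, Iterators.drop(v, 1)))
end

function foo()
    x = nonempty(rand(3))
    head(x)  # no emptiness check needed: x cannot be empty
end
```

With this change, head receives a NonEmpty{Float64}, so the method that JET could not find in the original example now exists.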

Thank you both for your thoughts and Cameron for your great examples!

I’ve been thinking about it some more and don’t see much benefit (but I am open to being convinced otherwise). In essence, the point of “parse, don’t validate”, as I understand it, is to get feedback more quickly. In a perfect situation, syntax highlighting would show an error like the one in your last example, which is much quicker than running, say, Python code and seeing the output. However, given that Revise already gives Julia a quick edit-and-evaluate loop, I doubt that the effort of using types properly is worth it. But I might, of course, be completely wrong.

That is only part of the benefit.

  • dispatch on very specific properties of objects, good for performance: “Algorithm efficiency comes from problem information”
  • no redundant checks, because problem information is embedded in the type: If I parse an input into a NonEmptyVector{Int64}, then I don’t have to keep checking the empty case whenever I do something with it. I can just dispatch on the type. Good for performance.
  • no redundant checks (less code) good for readability
  • very specific types in method signatures, good for readability
  • if I accidentally change or delete a necessary prerequisite check, the program will error loudly instead of silently assuming that I already checked a property and returning incorrect results
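To make the first two points concrete, here is a minimal sketch of a hypothetical Sorted wrapper: the constructor checks sortedness once, and a method specialized on Sorted can then use binary search (Base's searchsorted) without re-checking. The names Sorted and contains_value are illustrative, not from a package.

```julia
# Hypothetical wrapper: construction is the only place sortedness is checked.
struct Sorted{T, V<:AbstractVector{T}} <: AbstractVector{T}
    data::V
    function Sorted(v::AbstractVector{T}) where {T}
        issorted(v) || throw(ArgumentError("input is not sorted"))
        new{T, typeof(v)}(v)
    end
end

# Minimal AbstractVector interface so a Sorted behaves like its data.
Base.size(s::Sorted) = size(s.data)
Base.getindex(s::Sorted, i::Int) = s.data[i]

# Generic fallback: linear scan, O(n).
contains_value(v::AbstractVector, x) = x in v
# The type guarantees sortedness, so binary search is valid: O(log n).
contains_value(s::Sorted, x) = !isempty(searchsorted(s.data, x))
```

A Sorted can only be obtained through the checking constructor, so contains_value never has to ask whether its input is sorted.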

Another example: strings. The StrBase.jl package contains validated string types. That is handy because you don’t need to handle invalid byte sequences manually, which also makes these types faster in most cases.
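The same pattern can be sketched with only Base Julia. This is not StrBase.jl's API, just a hypothetical ValidUTF8 wrapper assuming the raw input arrives as bytes; the point is that isvalid runs once, at construction.

```julia
# Hypothetical wrapper (not StrBase.jl's API): a String known to be valid UTF-8.
struct ValidUTF8
    s::String
    function ValidUTF8(bytes::AbstractVector{UInt8})
        # Copy first: String(::Vector{UInt8}) may take ownership of the buffer.
        s = String(copy(bytes))
        isvalid(s) || throw(ArgumentError("input is not valid UTF-8"))
        new(s)
    end
end

# Downstream code can assume every character is well formed.
nchars(v::ValidUTF8) = length(v.s)
```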

OK, I’m getting more convinced. So if you did this for DataFrames, would you do something like the following? Define types such as NonMissing{DataFrame} and Sorted{DataFrame}, and overload all kinds of methods to handle these new types, such as

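using DataFrames

# Wrapper marking a DataFrame known to contain no `missing` values
struct NonMissing{T}
    df::T
end

# Unwrap with .df before delegating; calling vcat(a, b) here would recurse forever
Base.vcat(a::NonMissing{DataFrame}, b::DataFrame) = vcat(a.df, b)
Base.vcat(a::NonMissing{DataFrame}, b::NonMissing{DataFrame}) = NonMissing(vcat(a.df, b.df))
Base.filter(f, a::NonMissing{DataFrame}) = NonMissing(filter(f, a.df))
[...]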

That seems like a feasible strategy. That said, it looks like AbstractDataFrame has a pretty large API that’s specific to that type. It might be easier to implement this on the smaller Tables.jl API with SplitApplyCombine.jl, and maybe using one of the other table types.
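One way to read the Tables.jl suggestion: a NamedTuple of vectors already counts as a column table for Tables.jl, so a parser can work on that form directly. A minimal sketch using only Base functions; parse_nonmissing is hypothetical. Instead of a wrapper type, it narrows each column's element type (e.g. Union{Missing, Int} to Int), so the absence of missing values is recorded in the types themselves.

```julia
# Hypothetical parser for a column table stored as a NamedTuple of vectors.
function parse_nonmissing(cols::NamedTuple)
    for (name, col) in pairs(cols)
        any(ismissing, col) && throw(ArgumentError("column $name contains missing values"))
    end
    # Safe now: no element is `missing`, so each conversion succeeds.
    map(col -> convert(Vector{nonmissingtype(eltype(col))}, col), cols)
end

raw = (id = Union{Missing, Int}[1, 2, 3], name = ["a", "b", "c"])
clean = parse_nonmissing(raw)
eltype(clean.id)
```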

Problem-specific types are often useful in this regard: e.g., not a Table, but a MeasurementList, or Users. Convert from a plain Table immediately after reading from a file, potentially throwing errors in the process, and only use the “parsed” value afterwards.
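A minimal sketch of that workflow, assuming the file has already been read into a plain column table (a NamedTuple of vectors) with sensor and value columns. Measurement, MeasurementList and mean_value are hypothetical names; the constructor is the one place that can throw.

```julia
struct Measurement
    sensor::String
    value::Float64
end

struct MeasurementList
    items::Vector{Measurement}
end

# Hypothetical parser from a plain column table, called right after reading a file.
function MeasurementList(table::NamedTuple)
    (hasproperty(table, :sensor) && hasproperty(table, :value)) ||
        throw(ArgumentError("expected `sensor` and `value` columns"))
    items = [Measurement(String(s), Float64(v)) for (s, v) in zip(table.sensor, table.value)]
    isempty(items) && throw(ArgumentError("no measurements"))
    all(m -> isfinite(m.value), items) || throw(ArgumentError("non-finite measurement value"))
    MeasurementList(items)
end

# Downstream code relies on the invariants: non-empty, finite values.
mean_value(ms::MeasurementList) = sum(m -> m.value, ms.items) / length(ms.items)
```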