Dispatch on content from file

This is a really open-ended question, but I hope to get some wisdom from those of you who have found a solution you’re content with.

I need to process data in very different ways depending on the content of some defining data. That is: read a special file in a folder and, depending on the content of that file, process the rest of that folder in a specific way.

What I want to know is if you found some way to do this kind of work, a way you like. I can imagine stuff that heavily uses DataFrames, NamedTuples, Singleton types, etc. but would like to hear about your solution if you have one.

Thanks in advance!

Seems like a perfect job for a good ol’ if-elseif-end construct:

function process(dir) 
    content =... # read the info in the file you need
    if content == "hello"
        process_hello(dir)
    elseif content == "world"
        process_world(dir)
    elseif... 
    end
end

I hear you, but it grieves me to do that when I could use multiple dispatch instead… :grimacing:
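
For what it’s worth, one way to keep the dispatch flavor is to lift the file’s content into the type domain with Val. This is only a minimal sketch: the marker file name kind.txt and the bodies of the process methods are made up for illustration, and the Val construction is itself a dynamic dispatch, so it buys readability and extensibility rather than speed.

```julia
# Hypothetical handlers: one method per expected file content.
process(::Val{:hello}, dir) = "processing $dir the hello way"
process(::Val{:world}, dir) = "processing $dir the world way"

# Fallback for content that has no dedicated method.
process(::Val{T}, dir) where {T} = error("no processing defined for content $(repr(T))")

# Read the defining content once, turn it into a type, then dispatch on it.
# `marker` is an assumed file name, not something from the original post.
function process(dir::AbstractString; marker = "kind.txt")
    content = strip(read(joinpath(dir, marker), String))
    return process(Val(Symbol(content)), dir)
end
```

Adding a new kind of folder then means adding one more process(::Val{:something}, dir) method instead of another elseif branch.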

Would ConfParser.jl work for you?

Many of us have been grappling with the type stability of IO for years. As far as I know, the best attempt at handling this in a generic way is DataStreams.jl, though it is implemented specifically for tabular data. In general, I would say the best practice is to make sure the result of your IO gets passed into another function quickly, rather than wallowing in a function that doesn’t know what type it is. Here’s a rather silly but illustrative example of the pattern that’s needed:

using Serialization

metadata_buff = IOBuffer()
serialize(metadata_buff, Float32)
serialize(metadata_buff, Float64)
seekstart(metadata_buff)

data_buff = IOBuffer()
write(data_buff, Float32(2.0))
write(data_buff, 3.0)
seekstart(data_buff)

g(x::Float32) = x^2
g(x::Float64) = x^3

function f(mdata::IO, data::IO)
    dtype = deserialize(mdata)
    x = read(data, dtype)
    println(x^2)
    g(x)
end

julia> f(metadata_buff, data_buff)
4.0
4.0f0

julia> f(metadata_buff, data_buff)
9.0
27.0

See the following:

julia> @code_warntype f(metadata_buff, data_buff)
Body::Union{Float32, Float64}
17 1 ── %1  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 32, 32))::Array{Any,1}                         │╻╷╷╷╷╷ deserialize
   │    %2  = %new(IdDict{Any,Any}, %1, 0, 0)::IdDict{Any,Any}                                                                                                       ││┃│││   Type
   │    %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), :(:ccall), 2, Array{Int64,1}, 0, 0))::Array{Int64,1}                     │││╻╷     Type
   │    %4  = invoke Dict{UInt64,Any}()::Dict{UInt64,Any}                                                                                                            ││││   
   │    %5  = %new(Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, mdata, 0, %2, %3, %4)::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}                         ││││   
   │    %6  = (Base.getfield)(%5, :io)::Base.GenericIOBuffer{Array{UInt8,1}}                                                                                         │││╻      getproperty
   │    %7  = (Base.getfield)(%6, :readable)::Bool                                                                                                                   ││││╻      getproperty
   └───       goto #5 if not %7                                                                                                                                      ││││   
   2 ── %9  = (Base.getfield)(%6, :ptr)::Int64                                                                                                                       ││││╻      getproperty
   │    %10 = (Base.getfield)(%6, :size)::Int64                                                                                                                      ││││╻      getproperty
   │    %11 = (Base.slt_int)(%10, %9)::Bool                                                                                                                          ││││╻╷     >
   └───       goto #4 if not %11                                                                                                                                     ││││   
   3 ──       (Base.throw)($(QuoteNode(EOFError())))                                                                                                                 ││││   
   └───       $(Expr(:unreachable))                                                                                                                                  ││││   
   4 ── %15 = (Base.getfield)(%6, :data)::Array{UInt8,1}                                                                                                             ││││╻      getproperty
   │    %16 = (Base.arrayref)(false, %15, %9)::UInt8                                                                                                                 ││││╻      getindex
   │    %17 = (Base.add_int)(%9, 1)::Int64                                                                                                                           ││││╻      +
   │          (Base.setfield!)(%6, :ptr, %17)                                                                                                                        ││││╻      setproperty!
   └───       goto #6                                                                                                                                                │││╻      read
   5 ── %20 = %new(Core.ArgumentError, "read failed, IOBuffer is not readable")::ArgumentError                                                                       ││││╻      Type
   │          (Base.throw)(%20)                                                                                                                                      ││││   
   └───       $(Expr(:unreachable))                                                                                                                                  ││││   
   6 ┄─ %23 = (Core.zext_int)(Core.Int32, %16)::Int32                                                                                                                ││││╻      toInt32
   │    %24 = invoke Serialization.handle_deserialize(%5::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, %23::Int32)::Any                                         │││    
   └───       goto #7                                                                                                                                                │││    
   7 ──       goto #8                                                                                                                                                ││     
18 8 ── %27 = (Main.read)(data, %24)::Any                                                                                                                            │      
19 │    %28 = Base.literal_pow::Core.Compiler.Const(Base.literal_pow, false)                                                                                         │      
   │    %29 = Main.:^::Core.Compiler.Const(^, false)                                                                                                                 │      
   │    %30 = (isa)(%27, Irrational{:ℯ})::Bool                                                                                                                       │      
   └───       goto #10 if not %30                                                                                                                                    │      
   9 ── %32 = π (%27, Irrational{:ℯ})                                                                                                                                │      
   │    %33 = invoke %28(%29::typeof(^), %32::Irrational{:ℯ}, $(QuoteNode(Val{2}()))::Val{2})::Any                                                                   │      
   └───       goto #11                                                                                                                                               │      
   10 ─ %35 = (Base.literal_pow)(Main.:^, %27, $(QuoteNode(Val{2}())))::Any                                                                                          │      
   └───       goto #11                                                                                                                                               │      
   11 ┄ %37 = φ (#9 => %33, #10 => %35)::Any                                                                                                                         │      
   │          (Main.println)(%37)                                                                                                                                    │      
20 │    %39 = (isa)(%27, Float64)::Bool                                                                                                                              │      
   └───       goto #13 if not %39                                                                                                                                    │      
   12 ─ %41 = π (%27, Float64)                                                                                                                                       │      
   │    %42 = (Base.mul_float)(%41, %41)::Float64                                                                                                                    │╻╷╷    g
   │    %43 = (Base.mul_float)(%42, %41)::Float64                                                                                                                    ││┃││    literal_pow
   └───       goto #16                                                                                                                                               │      
   13 ─ %45 = (isa)(%27, Float32)::Bool                                                                                                                              │      
   └───       goto #15 if not %45                                                                                                                                    │      
   14 ─ %47 = π (%27, Float32)                                                                                                                                       │      
   │    %48 = (Base.mul_float)(%47, %47)::Float32                                                                                                                    ││╻╷     literal_pow
   └───       goto #16                                                                                                                                               │      
   15 ─ %50 = (Main.g)(%27)::Union{Float32, Float64}                                                                                                                 │      
   └───       goto #16                                                                                                                                               │      
   16 ┄ %52 = φ (#12 => %43, #14 => %48, #15 => %50)::Union{Float32, Float64}                                                                                        │      
   └───       return %52           

This is a little harder to parse without the nice highlighting of the macro, so I invite you to read it in your terminal. The gist is that the compiler has absolutely no idea what type x is when you call println(x^2), but it does know that the output of g is either a Float32 or a Float64. (There is a real difference here! In the former case the compiler has no idea which ^ method it will be calling when it compiles f, while in the latter case it knows x is a float when it compiles g.) The rule of thumb is that you should always have dedicated parsing of metadata and then pass the result to functions where the types can be known. In this example you would not want lots of stuff happening within f; you want it all in g.
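
To make the rule of thumb concrete, here is a sketch of the same toy example with the work moved behind a function barrier. The names kernel and f_barrier are invented for this illustration. The reader function stays type-unstable, but it does nothing except hand the value off, so the only dynamic dispatch is the single call into kernel.

```julia
using Serialization

g(x::Float32) = x^2
g(x::Float64) = x^3

# Type-stable kernel: inside each compiled method the type of x is known,
# so both x^2 and g(x) are resolved at compile time.
function kernel(x::T) where {T<:AbstractFloat}
    println(x^2)
    return g(x)
end

# Thin reader: parse the metadata, read the value, and hand it off at once.
function f_barrier(mdata::IO, data::IO)
    dtype = deserialize(mdata)   # inferred as Any
    x = read(data, dtype)        # inferred as Any
    return kernel(x)             # one dynamic dispatch, then typed code
end

# Same buffers as in the example above.
mdata = IOBuffer()
serialize(mdata, Float32)
serialize(mdata, Float64)
seekstart(mdata)

data = IOBuffer()
write(data, Float32(2.0))
write(data, 3.0)
seekstart(data)
```

Calling f_barrier(mdata, data) twice returns the same values as f did, and running @code_warntype on kernel(2.0f0) or kernel(3.0) should show concrete types throughout.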

So, getting back to your original question (which may not have been primarily about type stability?): essentially yes. All Julia code is multiple dispatch, so you are of course dispatching on the contents of your files once you start doing anything with them, but the more you resolve ambiguities with metadata the happier you’ll be.

In my usual workflow (which has only recently started solidifying to the point where I really feel like I know what I’m doing) I dispatch on types which I use to “tag” different pieces of data such as tables, for example:

abstract type TableTag end

struct TableA <: TableTag end
struct TableB <: TableTag end

(in practice these structs usually hold some sort of metadata as well). My functions which accept dataframes actually have signatures like

f(tag::TableA, df::AbstractDataFrame) = # do stuff to table A
f(tag::TableB, df::AbstractDataFrame) = # do stuff to table B

At some point I will have a standard way of serializing the tags so that I can store the metadata of my complete dataset. For the time being what I do is that the full path and file name of each table depends on its tag, so I have something like loadraw(tag::TableA). At some point I also hope to simply have the dataframes wrapped in the structs rather than having separate tags, but for the time being I lack an appropriate AbstractTable type to inherit from.
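
A rough sketch of the tag-determines-path idea, with several assumptions: the directory layout, the year field, the rawpath helper, and the extra root argument are all invented here so that the snippet runs on its own, and loadraw just returns the raw lines where a real loader would build a dataframe.

```julia
abstract type TableTag end

# Tags may carry a bit of metadata (the `year` field is illustrative).
struct TableA <: TableTag
    year::Int
end
struct TableB <: TableTag end

# Hypothetical layout: each tag determines where its table lives.
rawpath(root, tag::TableA) = joinpath(root, "a", "table_$(tag.year).csv")
rawpath(root, ::TableB)    = joinpath(root, "b", "table.csv")

# Generic loader: the path logic is dispatched, the reading is shared.
loadraw(root, tag::TableTag) = readlines(rawpath(root, tag))
```

With this shape, supporting a new table means adding a struct and one rawpath method; loadraw itself does not change.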

Anyway, I feel like this turned into a long ramble that may not have much to do with your original question. Having IO appropriate for my workflow is something I’ve been working towards for a long time, and has touched many projects such as my re-write of Feather.jl. I’m planning on finally creating a generic package that can serve as a template for my workflow soon (the goal is to cleanly separate all of the data cleaning nonsense from the underlying abstract mathematical problem) so if you’re interested stay tuned.

Hard to say more without the details, but you could consider

  1. creating some composite types that govern processing (via a combination of types and fields),
  2. parsing the “special file” and instantiating one of these types,
  3. from then on, using generic functions that dispatch on these types.
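
The three steps above could look roughly like this. Everything specific is hypothetical: the Recipe types, the recipe.txt file name, and the one-line "<kind> <parameter>" file format are assumptions made for the sketch, and describe stands in for whatever real processing is needed.

```julia
# 1. Composite types that govern processing, via both type and fields.
abstract type Recipe end
struct Average <: Recipe
    window::Int
end
struct Threshold <: Recipe
    cutoff::Float64
end

# 2. Parse the "special file" and instantiate one of these types.
#    Assumed format: a single line "<kind> <parameter>".
function parse_recipe(path::AbstractString)
    kind, param = split(strip(read(path, String)))
    kind == "average"   && return Average(parse(Int, param))
    kind == "threshold" && return Threshold(parse(Float64, param))
    error("unknown recipe kind: $kind")
end

# 3. From then on, generic functions dispatch on the recipe.
describe(r::Average, dir)   = "averaging files in $dir over $(r.window) samples"
describe(r::Threshold, dir) = "thresholding files in $dir at $(r.cutoff)"

process_folder(dir; special = "recipe.txt") =
    describe(parse_recipe(joinpath(dir, special)), dir)
```

The string comparison does not disappear, but it is confined to parse_recipe; everything downstream only sees typed values.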

Looks cool, I think I’ve been using Mustache.jl for similar stuff.

@ExpandingMan, @Tamas_Papp I was hoping you two would answer :slight_smile:

You’re both right on. It’s good to hear that you deal with similar stuff and come to a conclusion I can understand.

I’m gonna take this in and try it out on my data and see how I fare. Please keep me updated if you do push that package out. Sounds promising.