Dispatch on content from file

yakir12 · October 19, 2018, 6:22pm

This is a really open-ended question, but I hope to get some wisdom from those of you who have found a solution you’re content with.

I need to process data in very distinct ways depending on the content of some defining data. So: read a special file in a folder and, depending on the content of that file, process the rest of that folder in a specific way.

What I want to know is if you found some way to do this kind of work, a way you like. I can imagine stuff that heavily uses DataFrames, NamedTuples, Singleton types, etc. but would like to hear about your solution if you have one.

Thanks in advance!

fredrikekre · October 19, 2018, 8:02pm

Seems like a perfect job for a good ol’ if-elseif-end construct:

function process(dir)
    content = ... # read the info in the file you need
    if content == "hello"
        process_hello(dir)
    elseif content == "world"
        process_world(dir)
    elseif ...
    end
end

yakir12 · October 19, 2018, 8:04pm

I hear you, but it grieves me to do that when I could use multiple dispatch instead…
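
For comparison, here is a minimal sketch of what a dispatch-based version of the same branching might look like, assuming the keyword is a short string stored in a file. The file name `kind.txt` and the handler bodies are hypothetical, chosen only for illustration. The keyword is turned into a `Val` type so that each case becomes its own method:

```julia
# Hypothetical handlers, one method per keyword found in the special file.
process(::Val{:hello}, dir) = "hello-style processing of $dir"
process(::Val{:world}, dir) = "world-style processing of $dir"

# Fallback for keywords that have no dedicated method.
process(::Val{K}, dir) where {K} = error("no processing defined for keyword $K")

# `keyfile` is an assumed name for the special file inside `dir`.
function process(dir::AbstractString; keyfile = "kind.txt")
    content = strip(read(joinpath(dir, keyfile), String))
    # The value-to-type conversion happens once per folder; after this call
    # the chosen method knows exactly which case it is handling.
    return process(Val(Symbol(content)), dir)
end
```

Adding a new case then means adding a method rather than editing a branch, at the cost of one dynamic dispatch per folder.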

jandehaan · October 19, 2018, 8:16pm

Would ConfParser.jl work for you?

ExpandingMan · October 19, 2018, 8:22pm

Many of us have been grappling with the type stability of IO for years. As far as I know, the best attempt at handling this in a generic way is DataStreams.jl, though it is implemented specifically for tabular data. In general, I would say the best practice is to pass the result of your IO into another function quickly, rather than lingering in a function that cannot know the type of what it just read. Here’s a rather silly but illustrative example of the pattern that’s needed:

using Serialization

metadata_buff = IOBuffer()
serialize(metadata_buff, Float32)
serialize(metadata_buff, Float64)
seekstart(metadata_buff)

data_buff = IOBuffer()
write(data_buff, Float32(2.0))
write(data_buff, 3.0)
seekstart(data_buff)

g(x::Float32) = x^2
g(x::Float64) = x^3

function f(mdata::IO, data::IO)
    dtype = deserialize(mdata)
    x = read(data, dtype)
    println(x^2)
    g(x)
end

julia> f(metadata_buff, data_buff)
4.0
4.0f0

julia> f(metadata_buff, data_buff)
9.0
27.0

See the following:

julia> @code_warntype f(metadata_buff, data_buff)
Body::Union{Float32, Float64}
17 1 ── %1  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), :(:ccall), 2, Array{Any,1}, 32, 32))::Array{Any,1}                         │╻╷╷╷╷╷ deserialize
   │    %2  = %new(IdDict{Any,Any}, %1, 0, 0)::IdDict{Any,Any}                                                                                                       ││┃│││   Type
   │    %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), :(:ccall), 2, Array{Int64,1}, 0, 0))::Array{Int64,1}                     │││╻╷     Type
   │    %4  = invoke Dict{UInt64,Any}()::Dict{UInt64,Any}                                                                                                            ││││   
   │    %5  = %new(Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, mdata, 0, %2, %3, %4)::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}                         ││││   
   │    %6  = (Base.getfield)(%5, :io)::Base.GenericIOBuffer{Array{UInt8,1}}                                                                                         │││╻      getproperty
   │    %7  = (Base.getfield)(%6, :readable)::Bool                                                                                                                   ││││╻      getproperty
   └───       goto #5 if not %7                                                                                                                                      ││││   
   2 ── %9  = (Base.getfield)(%6, :ptr)::Int64                                                                                                                       ││││╻      getproperty
   │    %10 = (Base.getfield)(%6, :size)::Int64                                                                                                                      ││││╻      getproperty
   │    %11 = (Base.slt_int)(%10, %9)::Bool                                                                                                                          ││││╻╷     >
   └───       goto #4 if not %11                                                                                                                                     ││││   
   3 ──       (Base.throw)($(QuoteNode(EOFError())))                                                                                                                 ││││   
   └───       $(Expr(:unreachable))                                                                                                                                  ││││   
   4 ── %15 = (Base.getfield)(%6, :data)::Array{UInt8,1}                                                                                                             ││││╻      getproperty
   │    %16 = (Base.arrayref)(false, %15, %9)::UInt8                                                                                                                 ││││╻      getindex
   │    %17 = (Base.add_int)(%9, 1)::Int64                                                                                                                           ││││╻      +
   │          (Base.setfield!)(%6, :ptr, %17)                                                                                                                        ││││╻      setproperty!
   └───       goto #6                                                                                                                                                │││╻      read
   5 ── %20 = %new(Core.ArgumentError, "read failed, IOBuffer is not readable")::ArgumentError                                                                       ││││╻      Type
   │          (Base.throw)(%20)                                                                                                                                      ││││   
   └───       $(Expr(:unreachable))                                                                                                                                  ││││   
   6 ┄─ %23 = (Core.zext_int)(Core.Int32, %16)::Int32                                                                                                                ││││╻      toInt32
   │    %24 = invoke Serialization.handle_deserialize(%5::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, %23::Int32)::Any                                         │││    
   └───       goto #7                                                                                                                                                │││    
   7 ──       goto #8                                                                                                                                                ││     
18 8 ── %27 = (Main.read)(data, %24)::Any                                                                                                                            │      
19 │    %28 = Base.literal_pow::Core.Compiler.Const(Base.literal_pow, false)                                                                                         │      
   │    %29 = Main.:^::Core.Compiler.Const(^, false)                                                                                                                 │      
   │    %30 = (isa)(%27, Irrational{:ℯ})::Bool                                                                                                                       │      
   └───       goto #10 if not %30                                                                                                                                    │      
   9 ── %32 = π (%27, Irrational{:ℯ})                                                                                                                                │      
   │    %33 = invoke %28(%29::typeof(^), %32::Irrational{:ℯ}, $(QuoteNode(Val{2}()))::Val{2})::Any                                                                   │      
   └───       goto #11                                                                                                                                               │      
   10 ─ %35 = (Base.literal_pow)(Main.:^, %27, $(QuoteNode(Val{2}())))::Any                                                                                          │      
   └───       goto #11                                                                                                                                               │      
   11 ┄ %37 = φ (#9 => %33, #10 => %35)::Any                                                                                                                         │      
   │          (Main.println)(%37)                                                                                                                                    │      
20 │    %39 = (isa)(%27, Float64)::Bool                                                                                                                              │      
   └───       goto #13 if not %39                                                                                                                                    │      
   12 ─ %41 = π (%27, Float64)                                                                                                                                       │      
   │    %42 = (Base.mul_float)(%41, %41)::Float64                                                                                                                    │╻╷╷    g
   │    %43 = (Base.mul_float)(%42, %41)::Float64                                                                                                                    ││┃││    literal_pow
   └───       goto #16                                                                                                                                               │      
   13 ─ %45 = (isa)(%27, Float32)::Bool                                                                                                                              │      
   └───       goto #15 if not %45                                                                                                                                    │      
   14 ─ %47 = π (%27, Float32)                                                                                                                                       │      
   │    %48 = (Base.mul_float)(%47, %47)::Float32                                                                                                                    ││╻╷     literal_pow
   └───       goto #16                                                                                                                                               │      
   15 ─ %50 = (Main.g)(%27)::Union{Float32, Float64}                                                                                                                 │      
   └───       goto #16                                                                                                                                               │      
   16 ┄ %52 = φ (#12 => %43, #14 => %48, #15 => %50)::Union{Float32, Float64}                                                                                        │      
   └───       return %52

This is a little harder to parse without the nice highlighting of the macro, so I invite you to read it in your terminal. The gist is that the compiler has absolutely no idea what type x is when you call println(x^2) (the result is inferred as Any), but it does know that the output of g is either a Float32 or a Float64. (There is a real difference here! In the former case the compiler has no idea which ^ it’ll be calling when it compiles f, while in the latter case it knows it’ll be operating on a float when it compiles g!) The rule of thumb is that you should always have dedicated parsing of metadata and then pass the result to functions where the types can be known. In this example you would not want to have lots of stuff happening within f; you want it all in g.
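
A minimal sketch of that rule of thumb (often called a function barrier), under the assumption that the metadata is a short string code. The names `ELTYPES`, `kernel` and `load_and_process` are hypothetical and not part of the example above:

```julia
# Hypothetical metadata codes mapped to concrete element types.
const ELTYPES = Dict("f32" => Float32, "f64" => Float64)

# Type-stable kernel: compiled separately for each concrete float type,
# so every operation inside is resolved at compile time.
function kernel(x::T) where {T<:AbstractFloat}
    acc = zero(T)
    for k in 1:3
        acc += x^k
    end
    return acc
end

# Thin, type-unstable wrapper: it only parses metadata and reads the value.
function load_and_process(code::AbstractString, data::IO)
    T = ELTYPES[code]   # known only at run time
    x = read(data, T)   # so the type of x cannot be inferred here
    return kernel(x)    # barrier: one dynamic dispatch, then specialized code
end
```

All the real work lives in `kernel`, so the cost of not knowing the type is paid once at the call, not on every operation.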

So, getting back to your original question (which may not have been primarily about type stability?): essentially, yes. All Julia code uses multiple dispatch, so you are of course dispatching on the contents of your files once you start doing anything with them, but the more you resolve ambiguities with metadata up front, the happier you’ll be.

In my usual workflow (which has only recently started solidifying to the point where I really feel like I know what I’m doing) I dispatch on types which I use to “tag” different pieces of data, such as tables. For example:

abstract type TableTag end

struct TableA <: TableTag end
struct TableB <: TableTag end

(in practice these structs usually hold some sort of metadata as well). My functions which accept dataframes actually have signatures like

f(tag::TableA, df::AbstractDataFrame) = # do stuff to table A
f(tag::TableB, df::AbstractDataFrame) = # do stuff to table B

At some point I will have a standard way of serializing the tags so that I can store the metadata of my complete dataset. For the time being what I do is that the full path and file name of each table depends on its tag, so I have something like loadraw(tag::TableA). At some point I also hope to simply have the dataframes wrapped in the structs rather than having separate tags, but for the time being I lack an appropriate AbstractTable type to inherit from.
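
As a rough illustration of the tag-determines-path idea, here is a dependency-free sketch. The tag names `Sales` and `Customers`, the `year` field, the directory layout and the `rawpath`/`summarize` helpers are all hypothetical, and plain split lines stand in for a dataframe:

```julia
abstract type DatasetTag end

# Tags can carry a little metadata; the `year` field is purely illustrative.
struct Sales <: DatasetTag
    year::Int
end
struct Customers <: DatasetTag end

# Each tag determines where its raw file lives (assumed layout).
rawpath(root, tag::Sales) = joinpath(root, "sales", "sales_$(tag.year).csv")
rawpath(root, ::Customers) = joinpath(root, "customers", "customers.csv")

# Generic loader: returns each line split on commas.
loadraw(root, tag::DatasetTag) =
    [split(line, ',') for line in eachline(rawpath(root, tag))]

# Tag-specific processing dispatches on the tag, not on the data container.
summarize(::Sales, rows) = length(rows)             # number of records
summarize(::Customers, rows) = sum(length, rows)    # total number of fields
```

With this layout, `loadraw(root, Sales(2018))` finds its own file, and downstream functions pick their method from the tag alone.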

Anyway, I feel like this turned into a long ramble that may not have much to do with your original question. Having IO appropriate for my workflow is something I’ve been working towards for a long time, and has touched many projects such as my re-write of Feather.jl. I’m planning on finally creating a generic package that can serve as a template for my workflow soon (the goal is to cleanly separate all of the data cleaning nonsense from the underlying abstract mathematical problem) so if you’re interested stay tuned.

Tamas_Papp · October 20, 2018, 6:26am

Hard to say more without the details, but you could consider:

1. creating some composite types that govern processing (via a combination of types and fields),
2. parsing the “special file” and instantiating one of these types,
3. from then on, using generic functions that dispatch on these types.
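
The steps above can be sketched roughly as follows. This is only an illustration under assumed details: the one-line file format ("average 3" or "threshold 0.5") and the names `Recipe`, `Average`, `Threshold`, `parse_recipe` and `apply` are all made up for the example:

```julia
# 1. Composite types that govern processing; fields hold the parameters.
abstract type Recipe end

struct Average <: Recipe
    window::Int
end
struct Threshold <: Recipe
    cutoff::Float64
end

# 2. Parse the (hypothetical) special file and instantiate one of the types.
function parse_recipe(path::AbstractString)
    kind, value = split(strip(read(path, String)))
    kind == "average" && return Average(parse(Int, value))
    kind == "threshold" && return Threshold(parse(Float64, value))
    throw(ArgumentError("unknown recipe kind: $kind"))
end

# 3. Generic function with one method per recipe type.
# Moving average over windows of length r.window.
apply(r::Average, xs) =
    [sum(xs[i:i+r.window-1]) / r.window for i in 1:length(xs)-r.window+1]
# Keep only the values above the cutoff.
apply(r::Threshold, xs) = filter(x -> x > r.cutoff, xs)
```

The string comparison happens once in `parse_recipe`; everything after that point is ordinary dispatch on the recipe type.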

yakir12 · October 20, 2018, 8:04am

Looks cool, I think I’ve been using Mustache.jl for similar stuff.

@ExpandingMan, @Tamas_Papp I was hoping you two would answer

You’re both right on. It’s good to hear that you deal with similar stuff and come to a conclusion I can understand.

I’m gonna take this in and try it out on my data and see how I fare. Please keep me updated if you do push that package out. Sounds promising.
