Better design pattern for this type of development..?

I have a lot of Julia code that goes something like this:

const data = DataFrame(CSV.File("path/to/some/file.csv"))

function foo(data)
    for row in eachrow(data)
        # do some stuff 
    end
    return # some interesting result
end

function bar(data)
    a = data[findfirst(x -> x == value, data.column_C), :column_A]
    # ... #
    return # some interesting result
end

Basically, I have some data source that I assign to a const variable in the global scope, and then I generally have numerous functions that manipulate that data in some way and return results.

Is this the best way to do this kind of work or is there a design pattern that makes more sense? The data files aren’t always CSV files and are sometimes quite large so I typically only want to load them once.

It’s hard to say without knowing what you need these functions to do, and how much flexibility you need in using them. But my first guess is that your foo and bar functions could take data as an input argument (of DataFrame type), potentially together with other options on how to operate on it (like column names and such).

An immediate advantage of this: if you need to operate on two dataframes, you can just invoke foo on each of them (and likewise bar).
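A minimal sketch of that suggestion, with a hypothetical summarize function and column name standing in for foo and its options:

```julia
using DataFrames

# Takes the data and the column to operate on as arguments,
# instead of closing over a global.
function summarize(df::DataFrame; col::Symbol)
    total = 0.0
    for row in eachrow(df)
        total += row[col]   # do some stuff with each row
    end
    return total
end

df1 = DataFrame(a = [1.0, 2.0, 3.0])
df2 = DataFrame(a = [10.0, 20.0])
summarize(df1; col = :a)   # 6.0
summarize(df2; col = :a)   # 30.0 -- same function, second table
```

The same function then works unchanged on any number of data sources.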


I should have clarified - what you are describing is often what I do, since many times I have more than one data source (I edited the OP to reflect this). What I’m really asking about is assigning the data to a variable in the global scope…

That’s how I usually do it. I generally have data_transform(data; options...) or data_transform!(data; options...) when whatever I’m doing changes the raw data in the process (e.g. adding columns).
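As a sketch of that convention (the add_double name and column are hypothetical): the ! version mutates its argument in place, and the plain version works on a copy, following the usual Base naming convention.

```julia
using DataFrames

# Mutating version: adds a column to the DataFrame it is given.
function add_double!(df::DataFrame)
    df.doubled = 2 .* df.x
    return df
end

# Non-mutating version: same transform on a copy of the data.
add_double(df::DataFrame) = add_double!(copy(df))

df = DataFrame(x = [1, 2])
out = add_double(df)   # df is untouched, out has the new column
add_double!(df)        # now df itself has the new column
```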


I’m in good company then - this makes me feel better about it 🙂

I think this seems good, but it’s important to be consistent between modifying functions (with !) and copying functions.

Do you want to do Stata or R, basically. I think both have merits.
