I have a lot of Julia code that goes something like this:
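    using CSV, DataFrames

    # load the file once and keep it in a global constant
    const data = CSV.read("path/to/some/file.csv", DataFrame)

    function foo(data)
        for row in eachrow(data)
            # do some stuff
        end
        return # some interesting result
    end

    function bar(data)
        # value of column_A in the first row where column_C equals `value`
        a = data[findfirst(x -> x == value, data.column_C), :column_A]
        # ... #
        return # some interesting result
    end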
Basically, I have some data source that I assign to a const variable in the global scope, and then numerous functions that manipulate that data in some way and return results.
Is this the best way to do this kind of work, or is there a design pattern that makes more sense? The data files aren’t always CSV files and are sometimes quite large, so I typically only want to load them once.
It’s hard to say without knowing what you need these functions to do and how much flexibility you need in using them. But my first guess is that your foo and bar functions could take data as an input argument (of DataFrame type), potentially together with other options on how to operate on it (like column names and such).
An immediate advantage of this is: if you need to operate on two dataframes you can just invoke foo on each of them (same for bar).
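To make that concrete, here is a minimal sketch of a function that takes the data and the column names as arguments and is then called on two separate dataframes. The function name lookup_first, the keywords key and target, and the two small tables are made up for illustration; they are not from the original code.

```julia
using DataFrames

# Hypothetical helper: value of column `target` in the first row where
# column `key` equals `value`, or `nothing` if no row matches.
function lookup_first(data::AbstractDataFrame, value; key::Symbol, target::Symbol)
    i = findfirst(==(value), data[!, key])
    return i === nothing ? nothing : data[i, target]
end

# Two made-up data sources standing in for separately loaded files
sales_2019 = DataFrame(id = [1, 2, 3], region = ["north", "south", "east"])
sales_2020 = DataFrame(id = [1, 2, 3], region = ["west", "north", "south"])

# The same function works on either table, with no global involved
r1 = lookup_first(sales_2019, 2; key = :id, target = :region)
r2 = lookup_first(sales_2020, 2; key = :id, target = :region)
```

Because the function only touches its arguments, it does not matter whether the dataframes themselves live in const globals, in a script, or inside another function.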
I should have clarified: what you are describing is often what I do, since I frequently have more than one data source (I edited the OP to reflect this). What I’m really asking about is assigning the data to a variable in the global scope…
That’s how I usually do it. I generally have data_transform(data; options...) or data_transform!(data; options...) when whatever I’m doing changes the raw data in the process (e.g. adding columns).
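A rough sketch of that naming convention, assuming a transform that adds one column. The names add_total! and add_total, the keywords x, y and into, and the sample data are hypothetical and only illustrate the pattern.

```julia
using DataFrames

# Hypothetical mutating transform: adds column `into` to `data` in place.
# The trailing `!` signals that the argument is modified.
function add_total!(data::DataFrame; x::Symbol = :a, y::Symbol = :b, into::Symbol = :total)
    data[!, into] = data[!, x] .+ data[!, y]
    return data
end

# Non-mutating counterpart: works on a copy and leaves the input untouched.
add_total(data::DataFrame; options...) = add_total!(copy(data); options...)

raw = DataFrame(a = [1, 2], b = [10, 20])

derived = add_total(raw)          # raw still has only columns a and b here
add_total!(raw; into = :sum_ab)   # raw itself gains the column sum_ab
```

Defining the non-mutating version in terms of the mutating one keeps the logic in one place; the copy costs memory, so for large files the `!` variant is probably the one to reach for.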