Better design pattern for this type of development..?

mthelm85 · August 7, 2020, 1:03pm

I have a lot of Julia code that goes something like this:

using CSV, DataFrames

const data = CSV.read("path/to/some/file.csv", DataFrame)

function foo(data)
    for row in eachrow(data)
        # do some stuff 
    end
    return # some interesting result
end

function bar(data)
    # `value` is assumed to be defined elsewhere
    a = data[findfirst(x -> x == value, data.column_C), :column_A]
    # ... #
    return # some interesting result
end

Basically, I have some data source that I assign to a const variable in the global scope, and then I generally have numerous functions that manipulate that data in some way and return results.

Is this the best way to do this kind of work or is there a design pattern that makes more sense? The data files aren’t always CSV files and are sometimes quite large so I typically only want to load them once.

lostella · August 7, 2020, 1:11pm

It’s hard to say without knowing what you need these functions to do, and how much flexibility you need in using them. But my first guess is that your foo and bar functions could take data as an input argument (of DataFrame type), potentially together with other options on how to operate on it (such as column names).

An immediate advantage of this is: if you need to operate on two dataframes you can just invoke foo on each of them (same for bar).
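
A minimal sketch of that suggestion, assuming DataFrames.jl is installed. The function name lookup, its keyword arguments key and target, and the two small in-memory tables are hypothetical stand-ins for illustration, not code from the original post:

```julia
using DataFrames

# Hypothetical helper: return the `target` value on the first row where
# the `key` column equals `value`, or `nothing` if there is no match.
# Column names are passed in as options instead of being hard-coded.
function lookup(data::AbstractDataFrame, value; key::Symbol, target::Symbol)
    i = findfirst(isequal(value), data[!, key])
    return i === nothing ? nothing : data[i, target]
end

# Two small tables standing in for separately loaded data sources.
df1 = DataFrame(column_A = [10, 20, 30], column_C = ["x", "y", "z"])
df2 = DataFrame(column_A = [1.5, 2.5], column_C = ["y", "x"])

# The same function is invoked on each table; nothing refers to a global.
r1 = lookup(df1, "y"; key = :column_C, target = :column_A)
r2 = lookup(df2, "y"; key = :column_C, target = :column_A)
```

Because the table is an argument, the function no longer depends on which variable the data happens to be bound to, so it can be reused and tested on small tables like the ones above.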

mthelm85 · August 7, 2020, 1:13pm

I should have clarified: what you are describing is often what I do, since I frequently have more than one data source (I edited the OP to reflect this). What I’m really asking about is assigning the data to a variable in the global scope…

nilshg · August 7, 2020, 1:13pm

That’s how I usually do it. I generally have data_transform(data; options...), or data_transform!(data; options...) when whatever I’m doing changes the raw data in the process (e.g. adding columns).
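
One way this pairing might look, as a sketch assuming DataFrames.jl. The names add_total and add_total!, the keyword options cols and name, and the sample table are hypothetical and only follow the naming pattern described above:

```julia
using DataFrames

# Hypothetical mutating version: adds a new column to `data` in place.
function add_total!(data::DataFrame; cols = [:a, :b], name = :total)
    # Element-wise sum of the selected columns.
    data[!, name] = sum(data[!, c] for c in cols)
    return data
end

# Non-mutating version: operates on a copy and leaves the input untouched.
add_total(data::DataFrame; kwargs...) = add_total!(copy(data); kwargs...)

raw = DataFrame(a = [1, 2], b = [10, 20])

out = add_total(raw)             # `raw` still has only columns a and b
add_total!(raw; name = :sum_ab)  # `raw` gains a new column sum_ab
```

Defining the copying version in terms of the mutating one keeps the two behaviours in sync, and the trailing ! tells the caller which one alters the raw data.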

mthelm85 · August 7, 2020, 1:15pm

I’m in good company then - this makes me feel better about it

pdeffebach · August 7, 2020, 1:18pm

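I think this seems good, but I think it’s important to be consistent about which functions modify their input (marked with !) and which return a copy.

Basically, do you want to work the Stata way (modifying the data in place) or the R way (returning modified copies)? I think both have merits.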
