How to make a function store data to avoid repeating computation

JackStrauss · October 15, 2022, 6:16am

I fully agree with this argument, and yes, of course only one thread should write to a global. Others can read from it, though.

In this specific case we cannot use multithreading, because we poll instruments and databases, write to dashboards, and send SMS messages, and each of those can only be done in one place at a time. It is pretty fast anyway thanks to Julia, compared with our old Python code (and so much nicer to write; thanks, Julia developers).

More philosophically: a function in a module provides a service to another function, and so does data. Both are local to a module, and both serve the functions that call them, just like a call to an instrument or a database. I see nothing wrong with viewing them all as objects that provide a service.

Performance is a separate issue. In this case the impact is small, but we are alive to it in other code.

Thanks for all the discussion, and especially for the wonder that is Julia.

js

bertschi · October 15, 2022, 8:45am

I see where you’re coming from, but let me add another philosophical view: mathematically, functions do not provide services (e.g., to other functions); they compute results from their inputs, i.e., they transform data.

Your use case appears to be an instance of the singleton pattern from OOP. In Julia, I would probably use something like the following setup:

```julia
function bigcalc(data)  # or bigcalc! if you modify data
    # do something with data
    return data  # also if modified
end

# Assigning to the argument itself would only rebind a local variable,
# so keep the shared data in a Ref and replace its contents instead.
servicecall!(service_data::Ref, service_fun, args...) =
    (service_data[] = service_fun(service_data[], args...))
```

Now, service_data could be a global `const global_data = Ref(initial_data)` or a singleton data instance, i.e., created via `get_service_data()`, which lazily creates it on the first call and then returns the same instance from an internal cache, i.e., exactly the pattern that you had asked for in this thread.
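A minimal sketch of such a lazily created singleton, assuming the hypothetical names `ServiceData`, `load_initial_data`, `SERVICE_CACHE` and `SERVICE_LOCK` (only `get_service_data` is named in the post). The expensive initialisation runs on the first call; every later call returns the cached instance. The lock is there because the thread also discusses multithreading: without it, two threads could both see an empty cache and both run the initialisation.

```julia
# Hypothetical service data; stands in for whatever the real service holds.
struct ServiceData
    table::Vector{Float64}
end

# Hypothetical stand-in for an expensive initialisation (instrument poll, DB query, ...).
load_initial_data() = ServiceData([sqrt(i) for i in 1:1000])

# `const` bindings keep the types known to the compiler; the Ref's contents can still change.
const SERVICE_CACHE = Ref{Union{Nothing,ServiceData}}(nothing)
const SERVICE_LOCK = ReentrantLock()

# Returns the single ServiceData instance, creating it on the first call only.
function get_service_data()
    lock(SERVICE_LOCK) do
        if SERVICE_CACHE[] === nothing
            SERVICE_CACHE[] = load_initial_data()   # computed once
        end
        return SERVICE_CACHE[]::ServiceData         # cached instance from now on
    end
end
```

Since the instance is cached, `get_service_data() === get_service_data()` holds. Taking the lock on every call costs a little; for this kind of read-mostly service data that is usually negligible compared with the computation being avoided.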

In any case, viewing the service functions as data transformations has at least the following advantages:

- Service functions can be tested independently of each other.
- If service functions do not modify their data in place, it is obvious that only `servicecall!` can update the service data.
- If service functions are pure, i.e., free of side effects (including modifying their input data), `servicecall!` could add additional semantics for ensuring thread-safe access, similar to Clojure atoms.
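To illustrate the first two points, here is a small sketch with a hypothetical `Readings` type and a hypothetical pure service function `add_reading`. The function is tested by simply constructing its input, with no global context, and the shared state changes in exactly one visible place: where the caller replaces the contents of the `Ref`.

```julia
# Hypothetical service data: an immutable record of readings.
struct Readings
    values::Vector{Float64}
end

# Pure service function: builds a new Readings and leaves its input untouched.
add_reading(r::Readings, x::Real) = Readings(vcat(r.values, x))

# Testing needs no global set-up: build the input, call, check.
r0 = Readings(Float64[])
r1 = add_reading(r0, 1.5)
@assert isempty(r0.values)    # the input was not modified
@assert r1.values == [1.5]

# Hypothetical shared state; this assignment is the only place it changes.
const STATE = Ref(Readings(Float64[]))
STATE[] = add_reading(STATE[], 2.0)
```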

JackStrauss · October 15, 2022, 10:22am

Yes, this is what happens when one philosophises about other fields: one gets it wrong. Mea culpa.

I will look into servicecall!; it looks like an elegant solution, thanks.

But I was contemplating the thread safety of passing a struct vs. using a global, and I don’t see them as all that different. Suppose:

```julia
function calc!(s)
    # ... heavy multithreaded calculation with s that modifies s
    return s
end
```

vs.

```julia
function calc_global!()
    # ... heavy multithreaded calculation with a global that modifies the global
end
```

Aren’t the thread-safety issues exactly the same?

Again, I should just stick to my field, but when I started I was told that the two cardinal sins of coding are goto and globals, and I have never seen either as pure evil. (A friend who writes embedded code in C uses goto all the time, and my globals in Julia work just fine.)

And in case anyone thinks I am being too critical while ill-informed: yes, sorry, but it comes from really liking Julia. At work I am now arguing against moving to Rust, which has clear advantages and disadvantages compared with Julia.

bertschi · October 15, 2022, 12:59pm

You are right: if your function modifies the struct passed to it, it won’t be thread-safe either. In this case, the main benefit of passing the argument explicitly is that you can test the function independently, i.e., without setting up a global context first. Arguably, it is also more readable, as all required arguments are visible locally, whereas the global context could be defined and modified anywhere.
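A minimal sketch of that point, assuming a hypothetical mutable `Totals` struct. Without the lock, several threads would perform the read-modify-write `t.sum += x` concurrently, which is a data race exactly as it would be with a global. The lock is needed either way; what the explicit argument buys is that each test can start from a fresh `Totals(0.0)`.

```julia
using Base.Threads

# Hypothetical mutable state; it could just as well live in a global.
mutable struct Totals
    sum::Float64
end

# Every update to t.sum happens while holding the lock, so threads cannot interleave
# their read-modify-write steps. Passing t explicitly does not remove that need.
function safe_total!(t::Totals, xs, lk::ReentrantLock = ReentrantLock())
    @threads for x in xs
        lock(lk) do
            t.sum += x
        end
    end
    return t
end

# Test with fresh, local state: no global context to set up or reset.
t = safe_total!(Totals(0.0), 1:1000)
@assert t.sum == 500500.0
```

Taking a lock per element is slow; in real code, per-thread partial results combined at the end are usually preferable. The sketch only shows that the synchronisation question is the same for a passed-in struct and a global.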

Multi-threading gets easier if the function is pure, i.e., it does not modify the passed-in arguments but creates and returns a new data structure. With mutable data baked into the language, this easily becomes inefficient, e.g., when every update has to copy the data (copy-on-write). Persistent data structures can be updated efficiently without modifying any previous version of the data. In this case, servicecall! could simply add some logic around the actual updates to the global data store to ensure that all service functions see consistent snapshots of the data, i.e.,

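```julia
# servicecall! as a compare-and-swap retry loop (needs Julia 1.7 or later for @atomic fields)
mutable struct ServiceStore{T}
    @atomic data::T
end

function servicecall!(store::ServiceStore, service_function, args...)
    while true
        orig_data = @atomic store.data                   # snapshot the current data
        new_data = service_function(orig_data, args...)  # pure: builds new data
        # install new_data only if store.data is still orig_data (=== comparison)
        _, success = @atomicreplace store.data orig_data => new_data
        success && return new_data
        # the data was updated in the meantime: retry with the fresh data
    end
end
```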
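To make the idea of a persistent data structure concrete, here is a minimal sketch of an immutable linked list (the types `PList`, `Nil`, `Cons` and the functions `pushfront`, `tolist` are hypothetical, not from any package). "Adding" an element allocates one new node whose tail is the entire old list, so every earlier version stays valid and nothing is copied. That is the property a compare-and-swap loop like the one above relies on: a snapshot taken by one thread can never be changed underneath it.

```julia
# Immutable cons list: each version shares its tail with the previous one.
abstract type PList{T} end
struct Nil{T} <: PList{T} end
struct Cons{T} <: PList{T}
    head::T
    tail::PList{T}
end

# Returns a new list; the old list l is reused as the tail, not modified.
pushfront(l::PList{T}, x) where {T} = Cons{T}(convert(T, x), l)

# Collects the elements front to back into a Vector (for inspection/tests).
function tolist(l::PList{T}) where {T}
    out = T[]
    while l isa Cons
        push!(out, l.head)
        l = l.tail
    end
    return out
end

v1 = pushfront(Nil{Int}(), 1)
v2 = pushfront(v1, 2)
@assert tolist(v1) == [1]       # the old version is unchanged
@assert tolist(v2) == [2, 1]
@assert v2.tail === v1          # structure is shared, not copied
```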

JackStrauss · October 16, 2022, 7:42am

Yes, agreed. Thanks for the code and the discussion; I will try it out.

edd26 · October 17, 2022, 12:22pm

Have a look at the produce_or_load function from DrWatson.jl, a scientific project assistant library.

produce_or_load is a function that implements memoization. The minimal working case is when you provide two arguments:

- config, “some kind of named parameter container”
- f, a function that returns a dictionary with results

As a result, the output of the function f is saved with:

- a name that is generated from the elements of config
- a path that is set to "" by default but can be changed with the keyword argument path="some/path"

As mentioned, the function f has to return a dictionary. If your function does not, you could use a do-block and call your function within that block.

Please see the link for more details on the arguments. Also, the library’s main page has a video that introduces DrWatson (presented at JuliaCon 2020).
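For readers who only want the core idea without a dependency, here is a hand-rolled sketch of the same kind of disk memoization using the Serialization standard library. It is not DrWatson's API: `cache_name` and `cached_produce` are hypothetical helpers, and DrWatson's naming and saving are more elaborate. The sketch assumes the config is a NamedTuple; the result file name is derived from it, and an existing file is loaded instead of recomputing.

```julia
using Serialization

# Hypothetical helper: file name from a NamedTuple config, e.g. (a = 3, b = 0.5) -> "a=3_b=0.5.jls"
cache_name(config::NamedTuple) =
    join(("$(k)=$(v)" for (k, v) in pairs(config)), "_") * ".jls"

# Hypothetical helper (not DrWatson's API): load the result for `config` from `path`
# if it was saved before, otherwise compute it with f(config) and save it.
function cached_produce(f, config::NamedTuple; path::AbstractString = tempdir())
    file = joinpath(path, cache_name(config))
    isfile(file) && return deserialize(file)   # cache hit: skip the computation
    result = f(config)                          # cache miss: compute...
    serialize(file, result)                     # ...and store it for next time
    return result
end

# Usage: count how often the "expensive" function actually runs.
dir = mktempdir()
calls = Ref(0)
expensive(c) = (calls[] += 1; Dict("y" => c.a^2 + c.b))
cfg = (a = 3, b = 0.5)
r1 = cached_produce(expensive, cfg; path = dir)
r2 = cached_produce(expensive, cfg; path = dir)   # read back from disk
@assert r1 == r2 && calls[] == 1
```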
