How to make a function store data to avoid repeating computation

I fully agree with this argument, and yes, of course only one thread may write to a global. Other threads can still read from it, though.

In this specific case, we cannot use multithreading because we poll instruments and databases, write to dashboards, and send SMS, and that can only be done in one place at a time. It’s pretty fast anyway thanks to Julia, compared to our old Python code (and so much nicer to write, thanks Julia developers).

More philosophically, a function in a module provides a service to another function, and data does the same. Both are local to a module, and both serve the functions that call them, just like a call to an instrument or a database. I see nothing wrong with seeing them all as just objects providing a service.

The performance is another issue. In this case it does not impact much, but we are alive to it in other code.

Thanks for all the discussion, and especially for the wonder that Julia is.

js

I see where you’re coming from, but let me add another philosophical view: mathematically, functions do not provide services, e.g., to other functions, but compute results from their inputs, i.e., transform data.

Your use case appears to be an instance of the singleton pattern in OOP. In Julia, I would probably use something like the following setup:

function bigcalc(data)  # or bigcalc! if you modify data
    # do something with data
    return data  # also if modified
end

servicecall!(service_data, service_fun, args...) = service_data = service_fun(service_data, args...)

Now, service_data could be a global const, e.g., const global_data = initial_data, or a singleton data instance obtained via get_service_data(), which lazily creates it on the first call and then returns the same instance from an internal cache, i.e., exactly the pattern that you had asked for in this thread.
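A minimal sketch of such a lazily created singleton could look like the following. The cache name SERVICE_CACHE, the Dict used as service data, and the "temperature" key are assumptions made up for illustration; only the name get_service_data comes from the text above.

```julia
# Hypothetical module-level cache; `nothing` means the data has not been created yet.
const SERVICE_CACHE = Ref{Union{Nothing, Dict{String, Float64}}}(nothing)

# Create the service data on the first call, then return the same instance.
function get_service_data()
    if SERVICE_CACHE[] === nothing
        # Stand-in for an expensive initialisation.
        SERVICE_CACHE[] = Dict{String, Float64}()
    end
    return SERVICE_CACHE[]
end

a = get_service_data()
a["temperature"] = 21.5      # store something in the cached instance
b = get_service_data()       # no new Dict is created here
```

Because both calls return the identical object, anything stored through a is visible through b. This sketch is not thread-safe on its own: two tasks calling get_service_data() for the first time at the same moment could both run the initialisation.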

In any case, viewing the service functions as data transformations has at least the following advantages:

  1. Service functions can be tested independently of each other
  2. If service functions do not modify their data in place, it is obvious that only a servicecall! can update the service data.
  3. If service functions are pure, i.e., free of side-effects (including modifying their input data), servicecall! could add additional semantics for ensuring thread-safe access – similar to Clojure atoms
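The first two points can be sketched as follows. One caveat is that assigning to an argument inside a Julia function only rebinds the local name, so this variant keeps the data in a Ref to make the update visible to callers. STORE, record, and the fields calls and last are hypothetical names chosen for illustration.

```julia
# Hypothetical service data held in a Ref so that servicecall! can replace it.
const STORE = Ref((calls = 0, last = 0.0))

# Only servicecall! writes to the store; service functions just return new data.
servicecall!(store::Ref, service_fun, args...) =
    (store[] = service_fun(store[], args...))

# A pure service function: builds a new NamedTuple and leaves its input untouched.
record(data, x) = (calls = data.calls + 1, last = x)

# The service function can be checked on its own, without any global state.
standalone = record((calls = 0, last = 0.0), 1.0)

# Updating the shared store goes through servicecall! only.
servicecall!(STORE, record, 2.5)
```

Since record never touches STORE, a test for it needs nothing more than a literal input value.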

Yes, this is what happens when one philosophises about other fields. One gets it wrong. Mea culpa.

I will look into the servicecall, it looks like an elegant solution, thanks.

But I was contemplating the thread safety of passing a struct vs. using a global, and I don’t see them as all that different. Suppose:

function f!(s)
    # ... heavy multithreaded calculation with s that modifies s
    return s
end

vs.

function g!()
    # ... heavy multithreaded calculation with a global that modifies the global
end

Aren’t the thread safety issues exactly the same?

Again, I should just stick to my field, but I was told when starting out that the two cardinal sins of coding are goto and global, and I have never seen either as such pure evil. (A friend who writes embedded code in C uses goto all the time, and my globals in Julia work just fine.)

And in case anyone thinks I am too critical whilst ill informed: yes, sorry, but it comes from really liking Julia. Right now at work I am arguing against moving to Rust, which has clear advantages and disadvantages vis-à-vis Julia.

You are right, if your function modifies the struct passed to it, it won’t be thread-safe. In this case, the main benefit of passing the argument explicitly is that you can test it independently, i.e., without setting up a global context first. Arguably, it is also more readable as all required arguments are around locally – the global context could be defined and modified anywhere.
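As a rough illustration, the sketch below mutates a struct from several threads, once passed as an argument and once as a global. Counter, count_up!, count_up_global!, and the lock names are hypothetical. Both versions need the same lock to be correct; the difference is that the explicit version can be exercised with a fresh Counter and no global setup.

```julia
using Base.Threads

# Hypothetical mutable state shared between threads.
mutable struct Counter
    n::Int
end

# Version 1: state and lock are passed in explicitly.
function count_up!(c::Counter, lk::ReentrantLock, iterations)
    @threads for i in 1:iterations
        lock(lk) do
            c.n += 1          # the read-modify-write is protected by the lock
        end
    end
    return c
end

# Version 2: the same work on global state, protected in exactly the same way.
const GLOBAL_COUNTER = Counter(0)
const GLOBAL_LOCK = ReentrantLock()

function count_up_global!(iterations)
    @threads for i in 1:iterations
        lock(GLOBAL_LOCK) do
            GLOBAL_COUNTER.n += 1
        end
    end
    return GLOBAL_COUNTER
end

local_result = count_up!(Counter(0), ReentrantLock(), 1_000)
global_result = count_up_global!(1_000)
```

Without the lock, either version could lose increments when Julia is started with more than one thread.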

Multi-threading gets easier if the function is pure, i.e., does not modify the passed-in arguments, but creates and returns a new data structure. With mutable data baked into the language, this easily becomes inefficient, e.g., when copy-on-write is used. Persistent data structures can be updated efficiently without actually modifying any previous version of the data. In this case, servicecall! could simply add some logic around the actual updates to the global data store to ensure that all service functions have seen consistent snapshots of the data, i.e.,

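# sketch of servicecall! as a compare-and-swap retry loop (needs Julia 1.7+ for @atomic)
# ServiceStore is a hypothetical container for the shared data
mutable struct ServiceStore
    @atomic data::Any
end

function servicecall!(store::ServiceStore, service_function)
    while true
        orig_data = @atomic store.data
        new_data = service_function(orig_data)  # must be pure, it may run more than once
        # swap in new_data only if store.data is still orig_data (compared with ===)
        result = @atomicreplace store.data orig_data => new_data
        result.success && return new_data
        # the data was updated in the meantime: retry the service call
    end
end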

yes. agree. Thanks for the code and discussion, will try it out.

Have a look at the produce_or_load function from DrWatson.jl, a scientific project assistant library.

produce_or_load is a function that implements memoization. The minimal working case is when you provide two arguments:

  • config is “some kind of named parameter container”
  • f is a function that returns a dictionary with results

As a result, the output of the function f is saved with:

  • name, which is generated from the elements of config
  • path, which by default is set to "", but can be changed with the keyword argument path="some/path"

As mentioned, the function f has to return a dictionary. If that is not the case for your function, you could use a do-block and call your function within that block.

Please see the link for more details on arguments. Also, on the main page of the library they have a video that introduces DrWatson (presented at JuliaCon 2020).
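To make the idea concrete without depending on DrWatson, here is a minimal sketch of the produce-or-load pattern using only the Serialization standard library. It is not DrWatson's implementation and does not reproduce its signature: produce_or_load_sketch, config_filename, cached_total, and the .jls file naming are all assumptions made up for illustration.

```julia
using Serialization  # standard library

# Hypothetical helper: derive a file name from the entries of config.
function config_filename(config)
    parts = sort(["$(k)=$(v)" for (k, v) in pairs(config)])
    return join(parts, "_") * ".jls"
end

# Load the stored result if the file exists, otherwise call f(config) and store it.
function produce_or_load_sketch(f, config; path = "")
    file = joinpath(path, config_filename(config))
    if isfile(file)
        return deserialize(file), file     # cached: f is not called
    end
    result = f(config)
    isempty(path) || mkpath(path)
    serialize(file, result)
    return result, file
end

const CALLS = Ref(0)   # counts how often the expensive part really runs

# The do-block wraps the computation so that it returns a dictionary.
function cached_total(config, dir)
    return produce_or_load_sketch(config; path = dir) do c
        CALLS[] += 1
        Dict("total" => c.scale * sum(1:c.n))
    end
end

dir = mktempdir()
r1, file1 = cached_total((n = 10, scale = 2), dir)   # computes and saves
r2, file2 = cached_total((n = 10, scale = 2), dir)   # loads from disk
```

The second call finds the file written by the first one, so the body of the do-block runs only once.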