Best performance using dictionaries, functions and modules

Hi everyone,

I’m currently working on a program and need some help getting better performance. I recently switched from MATLAB to Julia, so I don’t know whether I’m writing it in a good way. I tried to write it in MATLAB style, but it seems to take longer, and I don’t know whether that is due to Atom / Julia Pro (for some reason, Atom with Juno seems to run much slower than the standard Julia REPL) or whether it is a matter of poorly written code. In general, the code is organized as follows:

1 - example.jl declares variables (global in Main), which are of type Dict{String,Any} because both Int64 and Float64 arrays are associated with the keys;

2 - functions.jl contains a module with several functions inside it, using Base.include() to add each function. I also load some packages from within the module (I don’t know if that’s the best practice);

3 - Program_Drive.jl loads the example.jl variables (Dicts) into Main, imports the module functions, and runs a specific function from within that module. As the user may specify which function needs to be run, getfield() is used to look it up.

Here are the main files in the code:


# example.jl
df = Dict{String,Any}(some Float64 arrays, some Int64 arrays);
elem = Dict{String,Any}(some Float64 arrays, some Int64 arrays);
inf = Dict{String,Any}(some Float64 arrays, some Int64 arrays);


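# functions.jl
module functions

using LinearAlgebra, Plots, Distributions, XLSX

# (the include(...) calls, one per function file, go here; not shown in the post)

end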
# Program_Drive.jl
import .functions;

include("example.jl");  # loads the variables into Main

method = "function1";
u = Array{Float64,1};
cr = "function3";

# Pass the global variables from Main as arguments to function1.
U = getfield(Main.functions, Symbol(method))(df, elem, inf, u, cr);

I know avoiding global variables is good practice, but I only use them to load the information from example.jl, which is later passed as arguments to the functions. Writing code in MATLAB is mostly about functions and scripts, and declaring variable types often doesn’t matter much for the kind of code I’m writing. In addition, the availability of packages/functions inside modules is sometimes a little confusing to me.

I would really appreciate it if anybody could help me out with this issue. I’m definitely not deep into Julia yet, but I want to be. Thank you all in advance.

A couple of things I notice right off the bat, none of which have anything to do with performance (I think):

  1. You don’t need all those semicolons. In the REPL, semicolons suppress output, but in scripts they behave exactly the same as a newline.
  2. You seem to be including the same file 5 times in functions.jl. Is that meant to be 5 different functions?
  3. With u = Array{Float64,1}, were you intending to make the array, or just refer to the type? You’re doing the latter here. If you actually want an array, do u = Float64[]
  4. I have no idea what you’re doing with U - that looks like a really weird way to call a function. Why not just do U = function1(df,elem,inf,u,cr)?
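
To illustrate point 3, here is a minimal sketch of the difference between naming the type and constructing an array. The variable names T, u1, u2 and u3 are only for illustration and are not from the original code.

```julia
# Array{Float64,1} on its own is a type object, not an array
T = Array{Float64,1}
println(T isa DataType)         # T is a type
println(T === Vector{Float64})  # Vector{Float64} is an alias for the same type

# Either of these constructs an actual empty vector of Float64
u1 = Float64[]
u2 = Vector{Float64}()

# A preallocated vector of length 5 filled with zeros
u3 = zeros(Float64, 5)

# Only a real array can be filled with values
push!(u1, 1.5)
```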

As for performance with global dicts, it’s not ideal, but before trying to optimize that, you’re probably better off doing some profiling to determine where your slowdowns are. I’ve agonized over optimizing things only to find that they were a tiny fraction of the run time. Honestly, I’m guessing the fact that you’re using Plots is the primary source of your problem (try searching this forum for “time to first plot”).
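
For the profiling step, a minimal sketch using only the standard library could look like the following. The function work is a hypothetical stand-in for one of the real functions. The first call is made separately so that compilation time is not counted in the measurement.

```julia
using Profile  # standard library

# Hypothetical stand-in for one of the real functions
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

work(10)                        # first call includes compilation
t = @elapsed work(10_000_000)   # time a later call, in seconds
println("elapsed: ", t, " s")

Profile.clear()                 # drop any earlier samples
@profile work(10_000_000)       # collect samples while the call runs
Profile.print(maxdepth = 8)     # print a call tree of where time was spent
```

If most of the samples land in package loading or plotting rather than in your own functions, the Dicts are probably not the bottleneck.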

That said, if your Dicts aren’t too big, you might consider NamedTuples, so

d = Dict("a"=>42, "b"=>3.14)
# becomes
nt = (a=42, b=3.14)
nt.a # or nt[:a]

Declaring it as const nt = (a=42, b=3.14) could also help with the global state if you’re accessing it a lot, I think, since non-constant globals are slow in Julia.


1 - Alright, fixed;
2 - Yes, it was supposed to be 5 different functions;
3 - I intended to build an array, but I left out the arguments when I typed it here;
4 - I need to choose which function (mathematical method) is executed on the data from example.jl. So I just store the name of the function as a string and call getfield().
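
As a sketch of two alternatives to looking up the function with getfield(): keep an explicit table from names to functions, or pass the function itself instead of its name. The functions function1 and function2 below are hypothetical stand-ins for the real methods, and method_table and run_method are names invented for this example.

```julia
# Hypothetical stand-ins for the real methods
function1(x) = x .+ 1
function2(x) = x .* 2

# Option A: map names to functions once, then look up by string
method_table = Dict{String,Function}(
    "function1" => function1,
    "function2" => function2,
)

method = "function1"
f = method_table[method]   # throws a KeyError for an unknown name
U = f([1.0, 2.0])

# Option B: skip the string and pass the function itself
run_method(g, data) = g(data)
V = run_method(function2, [1.0, 2.0])
```

The table makes the set of allowed methods explicit. getfield() on the module works as well, but it will return any name defined in the module.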

As I have a significant number of physical properties for the elements, many arrays need to be created. Instead of handling all those arrays separately, I grouped them into Dicts, which makes it easier to pass them as function arguments. Moreover, I need to manipulate those arrays inside the dictionaries, so I think I cannot use tuples.
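
On the last point, a minimal sketch suggests that arrays stored in a NamedTuple can still be modified in place. Only rebinding a field to a different array is disallowed. The field names density and nodes are hypothetical.

```julia
# Hypothetical property names, for illustration only
elem = (density = [7850.0, 2700.0], nodes = Int[1, 2, 3])

# The tuple itself is immutable, but the arrays it holds are not
elem.density[1] = 8000.0         # update one entry in place
elem.nodes .= elem.nodes .* 2    # overwrite all entries in place
push!(elem.nodes, 8)             # grow the vector in place

# What does not work is rebinding a field to a new array:
# elem.density = [1.0, 2.0]      # would throw an error
```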

Thank you very much for the tips and advice. I’ll put some effort into speeding up the code.

Don’t use a Dict for this. Create a Properties struct which contains the correct fields. This will be at least 2x faster than using a dictionary, as dictionaries have a significant cost to insert and get elements due to the need to hash the keys. With a Dict{String,Any}, the compiler also cannot infer the types of the values. Since a struct’s fields are fixed and typed, reading and writing them (getproperty and setproperty!) will be simple pointer operations.


Could you give me an example? For instance, how would I group the arrays below into a Properties struct?

A = Array{Float64,2}(something)
B = Array{Int32,1}(something)
C = Int32(something)


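mutable struct Property
    A::Matrix{Float64}   # 2-D Float64 array
    B::Vector{Int32}     # 1-D Int32 array (concrete type, unlike Array{Int32})
    C::Int32
end

# The literal 2 is converted to Int32 by the default constructor
x = Property(zeros(Float64, 5, 5), zeros(Int32, 5), 2)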

You can then access, for example, x.A to get the matrix, and so on.

Perfect. Inside my functions, I need to assign new values to the arrays in my struct. Should I create a mutable struct for that? Or could I use the B[:] notation, since Arrays are always mutable, even inside structs? Which one is preferable for performance? Thank you.

If you need to update C, then the struct should be mutable. Otherwise, keep the struct immutable and just mutate B in place.
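
A minimal sketch of that distinction, using hypothetical struct names Props and PropsWithScalar: array contents can be changed in place even in an immutable struct, while reassigning a scalar field needs a mutable one.

```julia
# Immutable struct: fields cannot be rebound, but array contents can change
struct Props
    A::Matrix{Float64}
    B::Vector{Int32}
end

# Mutable struct: needed only if a field such as the scalar C is reassigned
mutable struct PropsWithScalar
    B::Vector{Int32}
    C::Int32
end

p = Props(zeros(2, 2), zeros(Int32, 3))
p.B[:] = [1, 2, 3]    # in-place update, fine on an immutable struct
p.B .= 2 .* p.B       # broadcast assignment, also in place
p.A[1, 2] = 4.5

q = PropsWithScalar(zeros(Int32, 3), 2)
q.C = 7               # rebinding a field requires `mutable`
```

Writing p.B = [1, 2, 3] instead would fail on Props, because that rebinds the field rather than modifying the existing array.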

Alright, that changes quite a lot in the code. I will do it now. Thank you.