Very slow array creation (and Benchmark iterate error)

I am a beginner Julia user and am having a hard time developing a model because it takes a very long time to initialize certain arrays. I tried using BenchmarkTools.jl to get an accurate time but it generated an iterate error that I cannot replicate. Anyway, I think the time with a smaller sample size (NumTotalSpecies) used for testing takes too long. Could I have some suggestions for my code?

time_FoodWeb = @time create_FoodWeb(BM, Ropt,γ)
82.159197 seconds (155.09 M allocations: 76.550 GiB, 12.35% gc time)
161.635944 seconds (383.93 M allocations: 195.124 GiB, 12.91% gc time)
51.302236 seconds (125.56 M allocations: 63.508 GiB, 14.34% gc time)
using Distributions, Random

const D = 0.25
const Ropt = 100
const γ= 2
const q = rand(Normal(0.5,0.2)) #Hill exponent, μ = 0.5, σ = 0.2

#const = #was a constant but removed to allow testing

NumPlantSpecies = 8 #const
NumAnimalSpecies = 12 #const
NumTotalSpecies = NumPlantSpecies + NumAnimalSpecies #const

BM_P = sort(rand(Uniform(10^0, 10^6), NumPlantSpecies)) #body mass of Plant species #const
BM_A = sort(rand(Uniform(10^2, 10^12), NumAnimalSpecies)) #body mass of Animal species #const
BM = vcat(BM_P,BM_A) #placing BM_P and BM_A together under BM as mass of all species #const

function create_FoodWeb(BM,Ropt,γ)
    function create_L(BM,Ropt,γ)
        L = @. ((BM/(BM'*Ropt))' * exp(1-(BM/(BM'*Ropt))'))^γ
        # Weak links (Lij <p) are removed
        L[:,1:NumPlantSpecies] .= 0.0
        L[L .<= 0.01] .= 0.0
        return(L)
    end

    condition = true
    while condition
        BM_P = sort(rand(Uniform(10^0, 10^6), NumPlantSpecies))
        BM_A = sort(rand(Uniform(10^2, 10^12), NumAnimalSpecies))
        BM = vcat(BM_P,BM_A)

        global L = create_L(BM,Ropt,γ)
        #take the link probabilities of the previous matrix and determine if L[i,j] > p
        global FoodWeb = zero(L)
        FoodWeb[:,1:NumPlantSpecies] .= 0.0 # keeps plants as basal species
        FoodWeb[L .> rand.(Uniform(0, 1))] .= 1.0 # Sets F[i,j] = 0  wherever  L[i,j] < F[i,j]
        
        condition = any(sum(x->x>0, FoodWeb, dims=1)  + sum(x->x>0, FoodWeb, dims=2)' .== 0.0)
        #dims=(1) for column, dims=(2) for row
    end
    return(BM, L, FoodWeb)
end

Here I wanted to use permutedims instead of ' because BM is an array but this leads to an error.

        L = @. ((BM/(BM'*Ropt))' * exp(1-(BM/(BM'*Ropt))'))^γ
ERROR: MethodError: no method matching permutedims(::Float64)

The first issue I notice is that you have a lot of non-constant global variables, and you’re creating even more non-constant globals inside your while loop. The very first Julia performance tip is “avoid global varaibles”, and we’re not kidding about that–it can have a huge impact on the performance of your code.

The first thing I would do is make all of your global parameters const and remove the global L and global FoodWeb annotations. The next thing I would do is pre-allocate those matrices outside of the loop and update them in-place, rather than constructing new ones every time.

4 Likes

(If you want to pass non-const data into a function, use function arguments. If you want to pass data out of a function, use the return value or, occasionally, mutable arguments.)

3 Likes

In fact, avoiding global variables isn’t just a Julia performance tip. In pretty much any programming language global variables are considered bad, and they should be avoided in general.

Just google “why are global variables bad”.

Could you give more details on the @btime error? I can’t reproduce using the code you posted above and running @btime create_FoodWeb(BM, Ropt,γ)

That said the suggestions given above help quite a bit with performance, the following slightly simplified version gives benchmark results between 1 and 15 seconds (trials are quite variable maybe due to the randomness in the while condition?)

I’ve added some comments as I’m not 100% sure my simplifications haven’t accidentally changed your algorithm, so do double check!

using Distributions, Random, BenchmarkTools

# note BM is not required as an input to this function
# Julia convention is to only capitalise types, not variable names
function create_foodWeb(Ropt, γ, no_plant, no_animal)

  function create_L(bm, Ropt, γ, no_plant)
      x = (bm ./ (bm' .* Ropt))'
      L = @. ( x * exp(1.0 - x) )^γ
      L[:, 1:no_plant] .= 0.0
      L[L .<= 0.01] .= 0.0
      return L # return is a keyword rather than a function
  end
  
  # preallocating arrays 
  foodWeb = zeros(no_plant + no_animal, no_plant + no_animal)
  L = zeros(size(foodWeb)...)
  bm = zeros(no_plant + no_animal) 

  condition = true
  while condition
      # assigning directly rather than creating new BM_A/BM_P arrays and vcating
      bm[1:no_plant] = sort(rand(Uniform(10^0, 10^6), no_plant))
      bm[no_plant+1:end] = sort(rand(Uniform(10^2, 10^12), no_animal))

      L = create_L(bm, Ropt, γ, no_plant)
      foodWeb .= 0.0 # setting existing array to zero
      foodWeb[L .> rand()] .= 1.0 # note rand() already samples from Uniform(0,1)
      # removed the x->x>0 as entries in foodWeb are either 0 or 1 anyway
      condition = any(sum(foodWeb, dims=1) + sum(foodWeb, dims=2)' .== 0.0)
  end

  return bm, L, foodWeb
end

Ropt = 100
γ = 2
no_plant = 8 
no_animal = 12

@benchmark create_foodWeb(Ropt, γ, no_plant, no_animal)

Thank you for your replies. I have adopted the format @nilshg suggested and have been pre-allocating arrays - your example was very helpful to me compared to the manual.

@rdeits @stevengj I would like to make L and FoodWeb, once generated, become global constants, as they are used as parameters for functions further on. Is there a way to do this? return const bm, L, foodWeb does not work.

@nilshg Regarding the @benchmark error, I cannot recreate it either. Maybe it was just being strained by my very slow code.

I have a few structural suggestions for you. You probably don’t want L and FoodWeb as global constants, it might be better to pass the data around so your code is flexible. Using globals limits future modularity, and makes your code harder to think about in isolation as you keep accessing state from outside of the current scope.

You should make a struct that contains the arrays, instead of passing around separate arrays.

struct FoodWeb{D,L,B}
    foodweb::D
    l::L
    bm::B
end

But use better names for the last two fields :). Then all your data is lumped, and you can easily pass the struct around your program, and add new fields if you need them, of write a slightly different formulation and use multiple dispatch to trigger different methods.

Make create_Foodweb a constructor FoodWeb() that does the same things it currently does, but actually builds the struct as the last step and returns it.

1 Like