For Monte Carlo simulation with the same code and the same algorithm, how fast is Julia compared with Fortran?

Note that rand(1)[1] allocates a one-element vector and then references its only element, which is a waste of memory and time. Maybe you wanted to use rand() instead?
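
For example, a quick sketch using BenchmarkTools.jl to make the difference visible:

using BenchmarkTools

@btime rand(1)[1]   # allocates a one-element Vector{Float64} on every call
@btime rand()       # allocates nothing and returns a Float64 directly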

Regarding the example, you can read The need for rand speed | Blog by Bogumił Kamiński

5 Likes

Thank you very much! What you said is all correct and I learned from your suggestions!

By the way, just a comment: yes, I did spend some effort to get the format right with the help of many great guys :sweat_smile:, because I need to display all K components of the mixing model horizontally; if it were displayed vertically, it would be too long and difficult to read. So I did make my Julia output exactly match my Fortran output.

1 Like

Oh boy! Crucial suggestion! Thank you so much! :stuck_out_tongue_winking_eye:

I don’t think anyone here has seen your full output yet, and since it seems like you’re calculating with uncertainties, here’s a link to a package that may make your life with uncertainty easier:

This package defines a number type with built-in uncertainty/error propagation. It works with standard operations like +, -, *, / etc. Might be useful to you.
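
For instance, assuming the package linked above is Measurements.jl (mentioned again later in this thread), a minimal sketch:

using Measurements

x = 5.0 ± 0.3   # ± is exported by Measurements.jl
y = 2.0 ± 0.1

x + y           # 7.0 ± 0.32; uncertainties propagate automatically
x * y           # 10.0 ± 0.78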

1 Like

The reason why Julia users who don’t usually use const tell newcomers to “Make global variables const” is to recover performance with minimal changes to the code.

In fact, fixing types with const kills a lot of benefits of using Julia.

For example, if you use const to fix the type to Float64, you will not be able to use Measurements.jl, MonteCarloMeasurements.jl and Intervals.jl.

Not only that, but it also closes off the path to introducing automatic differentiation for better performance.

Moreover, fixing the type to Array with const also makes it difficult to migrate to GPU-based computations using CuArray in CUDA.jl for higher performance.

Do not ensure type stability by using const to fix types.

A good and simple way to ensure type stability is to pass all of the information needed by a function (including global variables that contain the information shared with other functions) as arguments to the function each time.
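
For instance, a minimal sketch (the function name is made up): a function that receives everything it needs as arguments stays generic, so the very same code runs with Float64 today and with other number types later.

function mc_update!(x, σ)        # x and σ arrive as arguments; no globals involved
    for i in eachindex(x)
        x[i] += σ * randn()
    end
    return x
end

mc_update!(zeros(3), 0.1)        # Float64 now; Measurement, Dual, etc. later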

In my personal opinion, higher education around the world teaches the opposite of this point in a harmful way. It keeps producing large numbers of people who are misled by it and don’t want to do the above.

If you actually write the code to pass all the information as arguments to a function every time, you will find that the effort is not significant and the benefits you get, independent of Julia, are huge. Don’t be afraid to try it.

The point is to create constructors for data shared by multiple functions. (cf. Function depending on the global variable inside module - #10 by genkuroki)
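
Concretely, a small sketch of that pattern (all names are hypothetical): a parametric struct with a keyword constructor holds the shared data, and every function takes an instance as an argument.

struct SimParams{T}
    nsample::Int
    sigma::T      # the type is not fixed, so Float64, Measurement, etc. all work
end

SimParams(; nsample = 10_000, sigma = 0.1) = SimParams(nsample, sigma)

run_mc(p::SimParams) = sum(p.sigma * randn() for _ in 1:p.nsample) / p.nsample

p = SimParams(nsample = 1_000)
run_mc(p)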

14 Likes

cf.

2 Likes

You might try out Julia v1.7.0-beta3 - I’m using this version and am pretty happy with it. The default RNG changed and should improve the performance of your RNG-heavy code quite a bit.
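
If you want to try the new generator explicitly, a small sketch (Xoshiro is the new default RNG family in v1.7):

using Random

rng = Xoshiro(1234)   # seeded instance of the new default RNG
rand(rng)             # draw from an explicitly passed RNG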

One additional aspect that could improve the performance of your code: I saw that you included some explicit bounds checks. By default, Julia will also check bounds on every indexing operation into arrays, such as musigma[k].mu[1,1] = mu[k,1]. To avoid that, you can use @inbounds (or run Julia with the command line argument --check-bounds=no to disable bounds checks globally). As you know, Fortran will not perform bounds checks automatically.
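
A small sketch of what that looks like in practice (the function name is made up):

function colsum!(out, A)
    @inbounds for j in axes(A, 2), i in axes(A, 1)   # elides the per-index bounds checks
        out[j] += A[i, j]
    end
    return out
end

colsum!(zeros(4), rand(3, 4))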

The code that you posted is probably too long for most people to want to look at it. If you want to increase the chance of getting help, you could consider posting smaller code snippets that people can review easily on their phone. That’s basically also true for me - it’s too long to motivate me to go through it thoroughly. After performing some profiling, you will easily find the most expensive parts of your code - extract them and post them again, and I’m sure people will help you optimize them.

As others have written in this thread, I would suggest to avoid the use of global variables. Instead, I would pack them in structs or NamedTuples and pass them around in your code. This makes it explicit which variables you will use and allows you to get some library-like behavior instead of a linear program. It will also allow you to exchange types later if that might help (to increase performance or get additional features for free such as automatic differentiation, uncertainty propagation, or simply different real types).
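
For example, a sketch of the NamedTuple variant (all names are hypothetical):

params = (nsub = 100, kmix = 3, sig = 0.5)   # bundle the former globals

function em_iteration(state, params)
    (; nsub, kmix, sig) = params             # property destructuring, Julia ≥ v1.7
    # ... use nsub, kmix, sig here ...
    return state
end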

When Trixi.jl was initiated, I wasn’t part of the team (I joined a few months later). At first, it was written mimicking an existing Fortran code (with which we compared the performance in our paper mentioned above). Thus, it was a linear code written to perform a single simulation. To change something, you need to re-compile the Fortran code with different compile time options and/or set some flags in a parameter file which was parsed at first. Then, the code created some global variables, scratchspaces, constants, settings etc.
However, we decided to rewrite Trixi.jl in a completely different style (see also this announcement and our paper linked above). The gist is: Trixi.jl is now designed as a library. This allows us to instantiate multiple different parts of it at the same time, giving us much more flexibility. Additionally, we got rid of parameter files - a simulation setup is defined in pure Julia code. Everything that needs to be set up (such as different internal caches etc.) is created when starting a simulation and passed around. We don’t use any global constants anymore (except in some legacy experimental parts that will be re-designed in the future). Every parameter gets bundled in some struct that we pass around. For example, the physical equations, the geometric mesh, and the numerical solver are Julia structs bundling information; they are passed around in our code. Moreover, we create a cache as a NamedTuple and pass it around in basically every function that needs to access it. This is very convenient for us right now. Additionally, it allowed us to enable automatic differentiation through semidiscretizations or even complete simulations with Trixi.jl (and OrdinaryDiffEq.jl in time).

These macros are quite helpful. One of the reasons why I like Julia as much as I do is exactly that it’s possible to have these sophisticated code manipulation tools. Let me briefly comment on the examples that you mentioned. Feel free to ask whether you want to know more about some of them specifically.

  • @unpack: This is basically a convenience macro from UnPack.jl that makes it easier to unpack things from structs or NamedTuples. When passing those parameters around, you can access things using @unpack a, b, c = foo instead of a = foo.a; b = foo.b; c = foo.c. However, it doesn’t make a performance difference. In Julia v1.7, the same functionality will be available as (; a, b, c) = foo (see the short sketch after this list).
  • @inbounds: I mentioned that briefly above. By default, Julia will perform a bounds check before every indexing operation (unlike Fortran). If you want to avoid this and know that you will not have out-of-bounds access, use @inbounds (or disable bounds checking globally). Depending on your code, this can improve the performance significantly, in particular if it enables further optimizations such as SIMD.
  • @inline: The Julia compiler uses some heuristics to decide whether a “small” function should be inlined or not. You can nudge the compiler to do so using @inline. Depending on the use case, this will definitely improve the performance.
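
A short sketch of the first and third items (foo and twice are made-up names):

using UnPack

foo = (a = 1, b = 2.0, c = "three")
@unpack a, b, c = foo        # UnPack.jl; equivalent to (; a, b, c) = foo on v1.7

@inline twice(x) = 2x        # nudges the compiler to inline this small function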
12 Likes

It’s also very much worth noting – your code looks like it’s a statistical simulation. You might want to check out Turing.jl and try using their algorithms instead of coding a whole simulation from scratch. You should be able to get non-Bayesian output from it by just setting flat priors on everything, in which case you’ll be drawing samples from the likelihood function instead of the posterior.
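
A hedged sketch of that idea (the model and data are made up; Flat and FlatPos are exported by Turing):

using Turing

@model function flat_model(y)
    μ ~ Flat()           # improper flat prior on the real line
    σ ~ FlatPos(0.0)     # flat prior on (0, ∞)
    y .~ Normal(μ, σ)
end

chain = sample(flat_model(randn(100)), MH(), 10_000)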

3 Likes

If you are concerned about using names that start with ‘@’ on Discourse, then here’s a tip: if you wrap them in single backticks, they will not be interpreted as user handles, but as code, and you can use them freely: @inline, @inbounds, @unpack. No need to put spaces next to the ‘@’.

For example, I can write @CRquantum, and you will not be pinged because of that.

3 Likes

Just in case there’s any confusion, this is the single back tick character:

`

You can create a code block like the above by surrounding the block with three of those backticks (```)

2 Likes

Now throw that code away, and write it in Julia :wink: you will learn a lot I think.

3 Likes

Thank you very much.
What I wrote is a new Monte Carlo parametric expectation maximization method which mainly uses Metropolis. It needs to do Monte Carlo in each iteration, so the speed of the MC within each iteration is crucial. My method is fast because I can sample both discrete and continuous variables at the same time accurately. Therefore I do not need to loop over discrete variables, so it is significantly faster than other Monte Carlo parametric expectation maximization methods on the market, without sacrificing accuracy or robustness. Similar techniques have been used in quantum Monte Carlo calculations in nuclear physics and condensed matter physics, like the Ising model, I think.

Thank you very much indeed for your suggestions about Turing.jl. It looks like it is based on Hamiltonian Monte Carlo, which requires calculating gradients; therefore one needs to check whether the increase in time cost from calculating the gradients can be compensated by the decrease in the error bar. Nonetheless, it can be useful in some cases.

https://turing.ml/dev/docs/library/#Turing.Inference.MH

I’d think Metropolis is the most basic (i.e. oldest) kind of sampling…

1 Like

Metropolis is just a name; different people do it in different ways. It depends on the people who do it, and it needs to suit the problem.
It is all about designing the best possible proposed move and doing the corresponding acceptance/rejection, based on detailed balance.
Most of the ‘new’ algorithms are simply exploring different proposed moves within Metropolis.
Hamiltonian/Hybrid Monte Carlo, for example, which was first proposed by Duane et al. in lattice field calculations (which typically face a severe sign problem), Hybrid Monte Carlo - ScienceDirect,
is still based on Metropolis; its novelty is that it uses Liouville’s theorem to propose the move of a coordinate-momentum pair (x, p), instead of merely x. If I remember correctly, it uses p to guide the exploration of x. It can be useful in some cases. However, since it requires the canonical momentum p, it needs to calculate gradients, which can be expensive, and one may also face the problem of how to calculate the gradient accurately (central difference, spline, etc.). Nonetheless it can be very useful, especially, I guess, if one wants to deal with one gigantic integral.
In fact, in a real, robust Monte Carlo simulation, one typically needs to apply different kinds of Metropolis moves in order to explore the configuration space efficiently and prevent stuck configurations.
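
To make the acceptance/rejection step concrete, here is a minimal sketch of a single Metropolis update for a target with log-density logp (all names are made up):

function metropolis_step(x, logp, stepsize)
    x′ = x + stepsize * randn()             # symmetric (Gaussian) proposal
    if log(rand()) < logp(x′) - logp(x)     # accept with probability min(1, p(x′)/p(x))
        return x′, true
    else
        return x, false                     # reject: stay at x
    end
end

# e.g. sampling a standard normal: metropolis_step(0.0, x -> -x^2 / 2, 0.5)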

2 Likes

Turing already has a heavily-optimized implementation of Metropolis-Hastings. If your own Metropolis-Hastings implementation adds something not in Turing’s implementation, I’d love to see it so we can make a PR!

Also important: Turing is not HMC-based at all. It has HMC, SMC, elliptical slice sampling, MH, variational inference, Gibbs, and pretty much anything else you could ever need. If you’re looking to deal with discrete variables, Gibbs+HMC will probably be the fastest inference algorithm, but you can also try MH. Turing will also provide tons of extra useful diagnostics and plots, like effective sample sizes and R-hat. And it does this all with a big safety net: Turing’s popularity means it’s been thoroughly tested and won’t have any bugs.
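
For the mixed discrete/continuous case mentioned above, composing samplers looks roughly like this (a sketch; model and the parameter names μ and k are hypothetical):

using Turing

# assume `model` is a Turing model with a continuous parameter μ and a discrete parameter k
sampler = Gibbs(HMC(0.05, 10, :μ), MH(:k))   # HMC for μ, MH for k
chain = sample(model, sampler, 2_000)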

4 Likes

Thank you very much.

Well, it is nice. But I guess I will not be that certain to say my code “won’t have any bugs” :stuck_out_tongue_winking_eye:
Especially if it is a big package maintained by different people.

I already submitted a paper; I will let you know once it is published, if you are interested.
We have a package of our own, and we are now evaluating whether to go with Julia or with modern Fortran, or, most probably, to keep both versions. The advantage of Julia is DifferentialEquations.jl, which we may need to use.
What I am doing now is more like pumas.ai https://pumas.ai/
I am not making a general Monte Carlo package like Turing, which I believe did a very nice job.
I am now implementing the best possible Monte Carlo algorithm specialized for our pharmaceutical modeling and algorithms package. The method I am using is neither Gibbs nor HMC - not because they are not good, but because they are not very relevant to my problem at the current stage. But I will remember the name Turing.jl and will definitely keep an eye on it. Thank you very much indeed!

1 Like

Thanks, and good luck! My suggestion is to use Julia, then call Fortran if you ever need to – Julia can call Fortran very easily.
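
For example, a rough sketch of calling a Fortran subroutine from Julia (the library libmc and the subroutine mcstep are hypothetical; Fortran passes arguments by reference, and compilers typically append a trailing underscore to the symbol name):

x = Ref{Float64}(1.0)
ccall((:mcstep_, "libmc"), Cvoid, (Ref{Float64},), x)   # calls Fortran's mcstep(x)
x[]                                                     # read back the updated value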

I’m always interested in hearing about any new MCMC methods not implemented in Turing!

Let me refactor a bit

module Step
  using LinearAlgebra
  using DelimitedFiles
  using Printf
  include("Ran.jl")
  using .Ran
  include("Constants.jl")
  using .Constants
  #include("Mixture.jl")
  #using .Mixture
  include("Analysis.jl")
  using .Analysis
  
  export step_init,prep,steps,read_Yji,push_Yji,get_musigma,get_musigma_maxLL,musigma,mean_covar_init
  
  mutable struct MeanCovar # Mean_covar does not follow Julian naming conventions
      mu::Array{Float64,2}
      sigma::Array{Float64,2}
      w::Float64
  end
  
  mutable struct GlobalStateIDontNeed
    musigma::Vector{MeanCovar}
    imode::Int # in the original code imode is a global; make it a Ref to lock the type
    itermax::Int
    msample::Int
    mgauss::Int
    mi::Int
    nsub::Int
    dim_p::Int
    kmix::Int

    sig::Float64
    sig_inv::Float64
    SigV::Float64
    mu1::Float64
    sig1::Float64
    mu2::Float64
    sig2::Float64
    muV::Float64
    w1::Float64
    w2::Float64

    mitot::Int
    mgauss_tot::Int
    mgauss_max::Int
    mgauss_min::Int

    iter::Int

    sigma_factor::Float64
    D::Float64
    normpdf_factor_pYq_i::Float64
    log_normpdf_factor_pYq_i::Float64


    h_diag::Array{Float64,3}
    ymh::Array{Float64,3}
    thetas::Array{Float64,3}


    muk_iter::Matrix{Float64}
    sigk_iter::Matrix{Float64}
    muV_iter::Matrix{Float64}
    sigV_iter::Matrix{Float64}
    wk_iter::Matrix{Float64}
    mukerror_iter::Matrix{Float64}
    sigkerror_iter::Matrix{Float64}
    ar_gik_k::Matrix{Float64}
    muVerror_iter::Matrix{Float64}
    sigVerror_iter::Matrix{Float64}
    log_pYq_km::Matrix{Float64}
    nik::Matrix{Float64}
    nik_sig::Matrix{Float64}
    wknik::Matrix{Float64}
    wknik_sig::Matrix{Float64}
    tauik::Matrix{Float64}
    Yji::Matrix{Float64}

    mgauss_ik::Matrix{Int}
    isub_Metroplis_gik_all::Matrix{Int}

    LL_iter::Vector{Float64}
    sigma_iter::Vector{Float64}
    ar_gik::Vector{Float64}
    sigmaerror_iter::Vector{Float64}
    norm::Vector{Float64}
    norm_sig::Vector{Float64}
    normerr::Vector{Float64}
    log_norm::Vector{Float64}
    wnow::Vector{Float64}
    log_wnow::Vector{Float64}
    t::Vector{Float64}

    fmt_k_iter::Any
    fmt_float_iter::Any
    fmt_float_err_iter::Any
end

function step_init(mean_cov_in::Vector{MeanCovar},imodein::Int64,itermaxin::Int64,min::Int64,mgaussin::Int64,miin::Int64,nsubin::Int64,pin::Int64,kmixin::Int64
    ,mu1in::Float64,sig1in::Float64,mu2in::Float64,sig2in::Float64
    ,muVin::Float64,sigVin::Float64 
    ,w1in::Float64,w2in::Float64,sigin::Float64
    ,Din::Float64,tin::Array{Float64,1})
    @assert length(tin) == miin
    state = GlobalStateIDontNeed(
        mean_cov_in,
        imodein,
        itermaxin,
        min,
        mgaussin,
        miin,
        nsubin,
        ...
    )

    ... # proceed with initialization
    return state
end
const GLOBAL_STATE_TO_GET_RID_OF = step_init(#=some arguments=#)

voilà, now there’s only one global constant.
Of course, now you need to type

GLOBAL_STATE_TO_GET_RID_OF.imode = ...

instead of

imode[] = ...

but that makes the code much easier to understand for a person who did not write it.

Now, global Refs are not zero-cost. You get an extra indirection each time you access the data.
Having MeanCovar as a mutable struct also has costs.

You see, Julia is a pretty bad Fortran, but it is a good Julia. So you absolutely should not write Fortran in Julia, and you must not base your conclusions about overall Julia performance on the performance of Fortran code written in Julia. If you’d like to write Fortran, it’s best to use a Fortran compiler to build your applications, because doing it with the Julia compiler won’t be any more performant or easier to maintain and extend.

And, well, I have not seen anything “modern” in your Fortran code posted earlier, so your references to “modern Fortran” are way off IMO. That the code uses == instead of .eq. etc. does not make it modern; getting rid of programming antipatterns stemming from the 60s, when computers had 1000 words of memory, does. Even then, avoiding globals and using immutable state where possible are not new patterns either - they are advocated in “Structure and Interpretation of Computer Programs”, a 1985 book!

The answer to the topic title is, IMO:

  • for the same algorithm, Julia is on par with Fortran, modern or not
  • for the same Fortran code, Julia is likely to be several times slower because it has not been designed to compile Fortran code efficiently
  • for the same Julia code, Julia is likely to be orders of magnitude faster than Fortran, because you may spend years re-implementing in Fortran what can be done in days in Julia
9 Likes

@CRquantum

how long is the portion of your code that actually does the real algorithm?

You can define a macro to effortlessly unpack a large number of values at once from a single object.

Sample code:

struct Foo{A, B, C} a::A; b::B; c::C end

"""
`@unpackall_Foo(obj)` unpacks all fields of the object `obj` of type `Foo`.
"""
macro unpackall_Foo(obj)
    names = fieldnames(Foo)
    Expr(:(=),
        Expr(:tuple, names...),
        Expr(:tuple, (:($obj.$name) for name in names)...)
    ) |> esc
end

@doc @unpackall_Foo

@unpackall_Foo(obj) unpacks all fields of the object obj of type Foo.

Input:

@macroexpand @unpackall_Foo foo

Output:

:((a, b, c) = (foo.a, foo.b, foo.c))

Input:

@unpackall_Foo Foo(1, 2.0, "three")
a, b, c

Output:

(1, 2.0, "three")

With this macro, there is no need to share a large number of global constants between multiple functions.

Just pass the object foo, which contains the values shared between multiple functions, to a function as an argument each time, and unpack it in the function.

Input:

foo = Foo(1, 2.0, "three")

function f(foo::Foo)
    @unpackall_Foo foo
    @show a b c
    return
end

f(foo)

Output:

a = 1
b = 2.0
c = "three"

In the above, we unpack only a, b, and c from foo, but you can increase the number of them as much as you want.

To increase the number of fields in the style of struct Foo{A, B, C} a::A; b::B; c::C end as much as you want, you can use ConcreteStructs.jl (my example).
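
A tiny sketch of what that looks like with ConcreteStructs.jl:

using ConcreteStructs

@concrete struct Bar
    a
    b
    c
end
# equivalent to: struct Bar{A, B, C} a::A; b::B; c::C end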

Jupyter notebook: https://github.com/genkuroki/public/blob/main/0018/How%20to%20define%20unpacking%20macros.ipynb

3 Likes