Any benchmark of Julia v1.0 vs older versions

Seif_Shebl · August 16, 2018, 12:00am

@Juan, you ask about the future? I have no idea about the future, but I will show you two examples from the present where Julia easily beats C and Fortran (by C and Fortran I mean latest GCC compilers because languages themselves are irrelevant here). My dear friend, Julia toppled them all from the ring, still only one old stubborn player who refuses to die, Intel Fortran. The first example is a real-world one from A Comparison of Programming Languages in Economics and the second one is a toy example. I have many of these, but for the sake of brevity, I will show only two.

In the following examples, I will show Fortran and Julia codes. For compiling the Fortran code, all possible aggressive optimizations (for the GCC 8.1.0 compiler) were turned on via -Ofast and Julia is compiled with the -O3 --check-bounds=no options. Julia runs 2X faster and 1.5X faster in the examples below, respectively.

Here are the results first, and I will leave you with the code.
First example:

Elapsed time is   1.62500000                                      # Fortran                  
0.802962 seconds (1.11 k allocations: 353.815 MiB, 3.81% gc time) # Julia

Second example:

Time (sec): 0.5160000                       # Fortran
0.352425 seconds (4 allocations: 160 bytes) # Julia

Now Fortran code (Example 1):

!============================================================================
! Name        : RBC_F90.f90
! Description : Basic RBC model with full depreciation, Intel compiler
! Date        : July 21, 2013
!============================================================================

program RBC_F90
  !----------------------------------------------------------------
  ! 0. variables to be defined
  !----------------------------------------------------------------

  implicit none

  integer, parameter :: nGridCapital = 17820
  integer, parameter :: nGridProductivity = 5
  real(8), parameter :: tolerance = 0.0000001

  integer :: nCapital, nCapitalNextPeriod, gridCapitalNextPeriod, nProductivity, nProductivityNextPeriod
  integer :: iteration

  real    :: start_time, stop_time
  real(8) :: aalpha, bbeta, capitalSteadyState, outputSteadyState, consumptionSteadyState
  real(8) :: valueHighSoFar, valueProvisional, consumption, capitalChoice
  real(8) :: maxDifference,diff,diffHighSoFar

  real(8), dimension(nGridProductivity) :: vProductivity
  real(8), dimension(nGridProductivity,nGridProductivity) :: mTransition
  real(8), dimension(nGridCapital) :: vGridCapital
  real(8), dimension(nGridCapital,nGridProductivity) :: mOutput, mValueFunction, mValueFunctionNew, mPolicyFunction
  real(8), dimension(nGridCapital,nGridProductivity) :: expectedValueFunction

  !----------------------------------------------------------------
  ! 1. Calibration
  !----------------------------------------------------------------
  call cpu_time(start_time)

  aalpha = 1.d0/3  ! Elasticity of output w.r.t
  bbeta  = 0.95d0  ! Discount factor

  ! Productivity value
  vProductivity = (/0.9792, 0.9896, 1.0000, 1.0106, 1.0212/)

  ! Transition matrix
  mTransition = reshape( (/0.9727, 0.0273, 0.0,    0.0,    0.0, &
                           0.0041, 0.9806, 0.0153, 0.0,    0.0, &
                           0.0,    0.0082, 0.9837, 0.0082, 0.0, &
                           0.0,    0.0,    0.0153, 0.9806, 0.0041, &
                           0.0,    0.0,    0.0,    0.0273, 0.9727 /), (/5,5/))

  mTransition = transpose(mTransition)

  !----------------------------------------------------------------
  ! 2. Steady state
  !----------------------------------------------------------------
  capitalSteadyState     = (aalpha*bbeta)**(1.0/(1.0-aalpha))
  outputSteadyState      = capitalSteadyState**(aalpha)
  consumptionSteadyState = outputSteadyState-capitalSteadyState

  print *, 'Steady State values'
  print *, 'Output: ', outputSteadyState, 'Capital: ', capitalSteadyState, 'Consumption: ', consumptionSteadyState

  ! Grid for capital
  do nCapital = 1, nGridCapital
     vGridCapital(nCapital) = 0.5*capitalSteadyState + 0.00001d0*(nCapital-1)
  end do

  !----------------------------------------------------------------
  ! 3. Pre-build Output for each point in the grid
  !----------------------------------------------------------------
  do nProductivity = 1, nGridProductivity
     do nCapital = 1, nGridCapital
        mOutput(nCapital, nProductivity) = vProductivity(nProductivity)*(vGridCapital(nCapital)**aalpha)
     end do
  end do

  !----------------------------------------------------------------
  ! 4. Main Iteration
  !----------------------------------------------------------------
  maxDifference = 10.0
  iteration     = 0

  do while (maxDifference > tolerance)

     expectedValueFunction = matmul(mValueFunction,transpose(mTransition));

     do nProductivity = 1,nGridProductivity

        ! We start from previous choice (monotonicity of policy function)

        gridCapitalNextPeriod = 1

        do nCapital = 1,nGridCapital

           valueHighSoFar = -100000.0

           do nCapitalNextPeriod = gridCapitalNextPeriod,nGridCapital

              consumption = mOutput(nCapital,nProductivity)-vGridCapital(nCapitalNextPeriod)
              valueProvisional = (1.0-bbeta)*log(consumption)+bbeta*expectedValueFunction(nCapitalNextPeriod,nProductivity)

              if (valueProvisional>valueHighSoFar) then ! we break when we have achieved the max
                 valueHighSoFar = valueProvisional
                 capitalChoice = vGridCapital(nCapitalNextPeriod)
                 gridCapitalNextPeriod = nCapitalNextPeriod
              else
                 exit
              end if

           end do

           mValueFunctionNew(nCapital,nProductivity) = valueHighSoFar
           mPolicyFunction(nCapital,nProductivity)   = capitalChoice

        end do

     end do

     maxDifference = maxval((abs(mValueFunctionNew-mValueFunction)))
     mValueFunction = mValueFunctionNew

     iteration = iteration + 1
     ! if (mod(iteration,10)==0 .OR. iteration==1) then
     !    print *, 'Iteration:', iteration, 'Sup Diff:', MaxDifference
     ! end if
  end do
  !----------------------------------------------------------------
  ! 5. PRINT RESULTS
  !----------------------------------------------------------------

  print *, 'Iteration:', iteration, 'Sup Diff:', MaxDifference, char(10)
  print *, 'My check:', mPolicyFunction(1000,3)

  call cpu_time(stop_time)
  print *, 'Elapsed time is', stop_time - start_time

end program RBC_F90
! ==================================================== !
! ---------------------- OUTPUT ---------------------- ! 
! ==================================================== !
 Steady State values
 Output:   0.56273143387113778      Capital:   0.17819828739252699      Consumption:   0.38453314647861081
 Iteration:         257 Sup Diff:   9.7160058332157462E-008

 My check:  0.14654914369626351
 Elapsed time is   1.62500000

Julia code (Example 1):

function main()
    ##  1. Calibration
    α = 1/3    # Elasticity of output w.r.t. capital
    β = 0.95   # Discount factor num_triangles

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000
                     0.0041 0.9806 0.0153 0.0000 0.0000
                     0.0000 0.0082 0.9837 0.0082 0.0000
                     0.0000 0.0000 0.0153 0.9806 0.0041
                     0.0000 0.0000 0.0000 0.0273 0.9727]
    mTransition = mTransition' # CHANGE: do this here, not inside the loop
    # 2. Steady State

    capitalSteadyState = (α*β)^(1.0/(1.0-α))
    outputSteadyState = capitalSteadyState^α
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    # println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)
    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors
    mOutput           = zeros(nGridCapital,nGridProductivity)
    mValueFunction    = zeros(nGridCapital,nGridProductivity)
    mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
    mPolicyFunction   = zeros(nGridCapital,nGridProductivity)
    expectedValueFunction = zeros(nGridCapital,nGridProductivity)

    # 4. We pre-build output for each point in the grid
    mOutput = (vGridCapital.^α)*vProductivity

    # 5. Main iteration
    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    while (maxDifference > tolerance)
        expectedValueFunction = mValueFunction*mTransition

        for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1
            for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-β)*log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                         valueHighSoFar = valueProvisional
                         capitalChoice = vGridCapital[nCapitalNextPeriod]
                         gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                         break # We break when we have achieved the max
                    end                
                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        maxDifference = maximum(abs,(mValueFunctionNew-mValueFunction))
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration += 1
        # if mod(iteration,10)==0 || iteration == 1
        #     println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        # end

    end

    println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
    println(" ")
    println(" My check = ", mPolicyFunction[1000,3])
    println(" My check = ", mValueFunction[1000,3])

end

# =============================================== #
# -------------------- OUTPUT ------------------- #
# =============================================== #

julia> main()
julia> @time main()
 Iteration = 257 Sup Diff = 9.716035653806188e-8

 My check = 0.1465491436962635
 My check = -0.9714880021887345
 Iteration = 257 Sup Diff = 9.716035653806188e-8

 My check = 0.1465491436962635
 My check = -0.9714880021887345
  0.802962 seconds (1.11 k allocations: 353.815 MiB, 3.81% gc time)

Fortran code (Example 2):

program test_prod
    implicit none
    integer, parameter :: n = 10**8
    double precision, allocatable :: x(:), y(:)
    integer :: t0, count_rate, count_max, t1

    allocate ( x(n), y(n) )
    call random_number(x)

    call system_clock(t0,count_rate,count_max)
    call toarr(x,y)
    call system_clock(t1)

    write (*,'(a,f9.7,a,5f9.5)') 'Time (sec): ',real(t1-t0)/count_rate, char(10), y(1:5)

contains

    subroutine toarr(x,y)
    double precision, intent(in) :: x(:)
    double precision, intent(out):: y(:)
      y = x*x
      where (x <= 0.3) y = sqrt(x)
    end subroutine

end program test_prod

! ==================================================== !
! ---------------------- OUTPUT ---------------------- ! 
! ==================================================== !

Time (sec): 0.5160000
  0.19949  0.94330  0.11662  0.37159  0.15961

Julia code (Example 2):

function toarr!(x,y)
  v = x[1]
  for (i,v) in enumerate(x)
    y[i] = ifelse(v > 0.3, v*v, sqrt(v))
  end  
end

# =============================================== #
# -------------------- OUTPUT ------------------- #
# =============================================== #
julia> x = rand(10^8);
julia> y = zeros(10^8);
julia> @time toarr!(x,y)
  0.352425 seconds (4 allocations: 160 bytes)
julia> println(y[1:5])
[0.193422, 0.176242, 0.520951, 0.63757, 0.658653]

Elrod · August 16, 2018, 1:02am

I think so. Julia already has libraries like https://github.com/JuliaMath/IterativeSolvers.jl
There’s also a lot of work done in Julia with tensors, eg
For Loops 2.0: Index Notation And The Future Of Tensor Compilers | Peter Ahrens - YouTube
so I think it is reasonable to expect one of these projects to eventually begin implementing their own BLAS.

For what it’s worth, I’ll likely keep adding features to my own project as I want them until something faster comes along. But that wont lead to a very complete library any time soon, so I wouldn’t count on it becoming a practical solution.

Juan · August 16, 2018, 1:22am

I know we don’t have to reinvent the wheel but having BLAS and MKL in pure Julia would make Julia completely independent from other languages.

Elrod · August 16, 2018, 1:35am

I think it would be great to have building blocks set up with a convenient API, so that we can compose more algorithms than just those offered by BLAS. For example, Strassen’s algorithm has operations like (A-B)*(X+Y), and we also get similar operations from inverting an affine transformation A^-1(X-Y) ,which is a pretty common operation in statistics. Being able to fuse those operations would be much faster. Classic gem only gives us D += A * X.
It could also let us interweave other calculations while we already have certain parts of the matrix in memory.

More interesting, is that perhaps tools like StructOfArrays can be combined with it to allow for more efficient multiplication of complex matrices; we could easily load separate SIMD vectors of real and complex numbers.
I’d also like to see something like that with ForwardDiff’s dual numbers too. But I’m also hoping that later in Julia 1.x, when some necessary compiler optimizations get implemented and Capstan gets released, it just blows ForwardDiff away. Not sure what to expect on that front.

RoyiAvital · August 16, 2018, 5:52am

John_Gibson:

FWIW, here are microbenchmark results for julia-0.6.4, 0.7.0, and 1.0.0. There’s some improvement in matrix_statistics and recursion_fibonacci and some degradation in parse_integers and print_to_file.

0.6.4 0.7.0 1.0.0

iteration_pi_sum 27.37 27.67 27.66

matrix_multiply 70.24 70.22 70.32

matrix_statistics 8.513 7.286 7.323

parse_integers 0.132 0.221 0.218

print_to_file 6.860 10.833 10.870

recursion_fibonacci 0.0406 0.0302 0.0302

recursion_quicksort 0.248 0.261 0.259

userfunc_mandelbrot 0.0565 0.0527 0.0527

We need to expand this test and include at least ~10 more micro benchmarks.
Few ideas:

Newton iterations to find a root.
Projecting an array to the unit simplex.
Creating the Dynamic Time Warping Table between 2 vectors (We could also add few other Dynamic Programming cases).

Probably people with CS background could few more (Searching in a tree? Implementing some data structure?).

ChrisRackauckas · August 16, 2018, 7:09am

We don’t need more benchmark ideas. People have many of those. We need more benchmarks which are implemented in every single language we’re testing in since otherwise it won’t be shown. This latter part, getting someone to implement it in language x != Julia is the hard part where most ideas stay.

RoyiAvital · August 16, 2018, 7:34am

I would start with putting a Julia reference.
Then others will fill other languages.

I will be able to submit MATLAB / Python (Numba as well) and if the benchmark is simple I will be able to handle C.
I guess finding people for LuaJit and R will be simple as well.

Those benchmark aren’t only for advertising, they are reference point for those who are responsible keeping Julia fast.

But each starts with some Julia reference code and tests which makes sense though are simple and compact.

kristoffer.carlsson · August 16, 2018, 7:39am

We use https://github.com/JuliaCI/BaseBenchmarks.jl/tree/master/src for that. The plot is just for marketing purposes.

RoyiAvital · August 16, 2018, 7:55am

First, this is great to see there is a broad reference point, didn’t expect less.
But those are only implemented in Julia.
They don’t give you a reference versus other competitors.

It’s like being a professional sprinter.
You both check your performance against yourself (In house training) and against others (Competitions).

Juan · August 16, 2018, 5:57pm

Hello.

Where can we see the list of packages compatible with Julia v1.0?

When I check any package I can only see if works with v0.6 and v0.7 but nothing about v1.0.

And where is the “Julia Language Documentation” pdf for Julia v1.0?

ufechner7 · August 16, 2018, 6:51pm

First of all, your question do not belong into this thread.
Nevertheless two answers:

there is currently no list of packages compatible with 1.0
there is currently no documentation available in pdf format.
Please search for other threads where these issues are discussed.
Uwe

kristoffer.carlsson · August 16, 2018, 6:53pm

Here’s a pdf version http://mortenpi.eu/julia/julia-1.0.0-manual.pdf

ufechner7 · August 16, 2018, 6:57pm

Nice! Did not expect this so fast.

StefanKarpinski · August 16, 2018, 7:31pm

Choosing to benchmark operations on small matrices was intentional since it’s actually hard for the language layer as compared to large matmul which just measures BLAS speed, which is pretty uninteresting from a language perspective. There’s also a large matmul benchmark for those who really insist on seeing that. Of course, the only thing interesting that it shows is that some languages manage to make matmul slow even though they are calling BLAS.

StefanKarpinski · August 16, 2018, 7:40pm

This is optimistic . But I guess it might happen…

It’s very useful because we’re interested in the changes to our performance over time which is the primary purpose of these benchmarks. Comparing to a bunch of other languages is mostly only useful, as @kristoffer.carlsson says, for marketing. For ourselves we don’t need to compare against all of these other languages languages, we just need to make sure we’re as fast as we can be, which is generally well estimated by comparison to to C/Fortran speed.

NiclasMattsson · August 16, 2018, 8:31pm

Just out of curiosity: what is the worst case performance of Julia 1.0 compared to C? Suppose we were in a masochistic mood and decided to design a new microbenchmark specifically to make Julia look as bad as possible compared to C. What would that benchmark test, and how bad do you think that relative performance would be?

Juan · August 16, 2018, 9:49pm

Technically speaking…
What Julia’s areas have more room for improvement in the near future?
What Julia’s areas are the subject of more research and involves more people?

Elrod · August 16, 2018, 10:26pm

Code where type inference fails.

Juan · August 16, 2018, 11:08pm

but apart from that,
What’s the roadmap?

kristoffer.carlsson · August 17, 2018, 8:44am

Better multithreading model.

Topic		Replies	Views
Benchmark MATLAB & Julia for Matrix Operations Performance	148	19813	October 15, 2019
OpenBLAS is faster than Intel MKL on AMD Hardware (Ryzen) Performance blas , lapack	40	36480	June 19, 2020
Benchmark for latest julia? Community question	126	14823	April 1, 2019
Intel C/C++ compiler performance versus Julia Offtopic	20	6253	August 11, 2021
OpenBLAS: Julia slower than R Performance linearalgebra	41	7839	March 26, 2019

	0.6.4	0.7.0	1.0.0
iteration_pi_sum	27.37	27.67	27.66
matrix_multiply	70.24	70.22	70.32
matrix_statistics	8.513	7.286	7.323
parse_integers	0.132	0.221	0.218
print_to_file	6.860	10.833	10.870
recursion_fibonacci	0.0406	0.0302	0.0302
recursion_quicksort	0.248	0.261	0.259
userfunc_mandelbrot	0.0565	0.0527	0.0527

Any benchmark of Julia v1.0 vs older versions

Related topics