Any benchmark of Julia v1.0 vs older versions

@Juan, you ask about the future? I have no idea about the future, but I will show you two examples from the present where Julia easily beats C and Fortran (by C and Fortran I mean latest GCC compilers because languages themselves are irrelevant here). My dear friend, Julia toppled them all from the ring, still only one old stubborn player who refuses to die, Intel Fortran. The first example is a real-world one from A Comparison of Programming Languages in Economics and the second one is a toy example. I have many of these, but for the sake of brevity, I will show only two.

In the following examples, I will show Fortran and Julia codes. For compiling the Fortran code, all possible aggressive optimizations (for the GCC 8.1.0 compiler) were turned on via -Ofast and Julia is compiled with the -O3 --check-bounds=no options. Julia runs 2X faster and 1.5X faster in the examples below, respectively.

Here are the results first, and I will leave you with the code.
First example:

Elapsed time is   1.62500000                                      # Fortran                  
0.802962 seconds (1.11 k allocations: 353.815 MiB, 3.81% gc time) # Julia

Second example:

Time (sec): 0.5160000                       # Fortran
0.352425 seconds (4 allocations: 160 bytes) # Julia

Now Fortran code (Example 1):

!============================================================================
! Name        : RBC_F90.f90
! Description : Basic RBC model with full depreciation, Intel compiler
! Date        : July 21, 2013
!============================================================================

program RBC_F90
  !----------------------------------------------------------------
  ! 0. variables to be defined
  !----------------------------------------------------------------

  implicit none

  integer, parameter :: nGridCapital = 17820
  integer, parameter :: nGridProductivity = 5
  real(8), parameter :: tolerance = 0.0000001

  integer :: nCapital, nCapitalNextPeriod, gridCapitalNextPeriod, nProductivity, nProductivityNextPeriod
  integer :: iteration

  real    :: start_time, stop_time
  real(8) :: aalpha, bbeta, capitalSteadyState, outputSteadyState, consumptionSteadyState
  real(8) :: valueHighSoFar, valueProvisional, consumption, capitalChoice
  real(8) :: maxDifference,diff,diffHighSoFar

  real(8), dimension(nGridProductivity) :: vProductivity
  real(8), dimension(nGridProductivity,nGridProductivity) :: mTransition
  real(8), dimension(nGridCapital) :: vGridCapital
  real(8), dimension(nGridCapital,nGridProductivity) :: mOutput, mValueFunction, mValueFunctionNew, mPolicyFunction
  real(8), dimension(nGridCapital,nGridProductivity) :: expectedValueFunction

  !----------------------------------------------------------------
  ! 1. Calibration
  !----------------------------------------------------------------
  call cpu_time(start_time)

  aalpha = 1.d0/3  ! Elasticity of output w.r.t
  bbeta  = 0.95d0  ! Discount factor

  ! Productivity value
  vProductivity = (/0.9792, 0.9896, 1.0000, 1.0106, 1.0212/)

  ! Transition matrix
  mTransition = reshape( (/0.9727, 0.0273, 0.0,    0.0,    0.0, &
                           0.0041, 0.9806, 0.0153, 0.0,    0.0, &
                           0.0,    0.0082, 0.9837, 0.0082, 0.0, &
                           0.0,    0.0,    0.0153, 0.9806, 0.0041, &
                           0.0,    0.0,    0.0,    0.0273, 0.9727 /), (/5,5/))

  mTransition = transpose(mTransition)

  !----------------------------------------------------------------
  ! 2. Steady state
  !----------------------------------------------------------------
  capitalSteadyState     = (aalpha*bbeta)**(1.0/(1.0-aalpha))
  outputSteadyState      = capitalSteadyState**(aalpha)
  consumptionSteadyState = outputSteadyState-capitalSteadyState

  print *, 'Steady State values'
  print *, 'Output: ', outputSteadyState, 'Capital: ', capitalSteadyState, 'Consumption: ', consumptionSteadyState

  ! Grid for capital
  do nCapital = 1, nGridCapital
     vGridCapital(nCapital) = 0.5*capitalSteadyState + 0.00001d0*(nCapital-1)
  end do

  !----------------------------------------------------------------
  ! 3. Pre-build Output for each point in the grid
  !----------------------------------------------------------------
  do nProductivity = 1, nGridProductivity
     do nCapital = 1, nGridCapital
        mOutput(nCapital, nProductivity) = vProductivity(nProductivity)*(vGridCapital(nCapital)**aalpha)
     end do
  end do

  !----------------------------------------------------------------
  ! 4. Main Iteration
  !----------------------------------------------------------------
  maxDifference = 10.0
  iteration     = 0

  do while (maxDifference > tolerance)

     expectedValueFunction = matmul(mValueFunction,transpose(mTransition));

     do nProductivity = 1,nGridProductivity

        ! We start from previous choice (monotonicity of policy function)

        gridCapitalNextPeriod = 1

        do nCapital = 1,nGridCapital

           valueHighSoFar = -100000.0

           do nCapitalNextPeriod = gridCapitalNextPeriod,nGridCapital

              consumption = mOutput(nCapital,nProductivity)-vGridCapital(nCapitalNextPeriod)
              valueProvisional = (1.0-bbeta)*log(consumption)+bbeta*expectedValueFunction(nCapitalNextPeriod,nProductivity)

              if (valueProvisional>valueHighSoFar) then ! we break when we have achieved the max
                 valueHighSoFar = valueProvisional
                 capitalChoice = vGridCapital(nCapitalNextPeriod)
                 gridCapitalNextPeriod = nCapitalNextPeriod
              else
                 exit
              end if

           end do

           mValueFunctionNew(nCapital,nProductivity) = valueHighSoFar
           mPolicyFunction(nCapital,nProductivity)   = capitalChoice

        end do

     end do

     maxDifference = maxval((abs(mValueFunctionNew-mValueFunction)))
     mValueFunction = mValueFunctionNew

     iteration = iteration + 1
     ! if (mod(iteration,10)==0 .OR. iteration==1) then
     !    print *, 'Iteration:', iteration, 'Sup Diff:', MaxDifference
     ! end if
  end do
  !----------------------------------------------------------------
  ! 5. PRINT RESULTS
  !----------------------------------------------------------------

  print *, 'Iteration:', iteration, 'Sup Diff:', MaxDifference, char(10)
  print *, 'My check:', mPolicyFunction(1000,3)

  call cpu_time(stop_time)
  print *, 'Elapsed time is', stop_time - start_time

end program RBC_F90
! ==================================================== !
! ---------------------- OUTPUT ---------------------- ! 
! ==================================================== !
 Steady State values
 Output:   0.56273143387113778      Capital:   0.17819828739252699      Consumption:   0.38453314647861081
 Iteration:         257 Sup Diff:   9.7160058332157462E-008

 My check:  0.14654914369626351
 Elapsed time is   1.62500000

Julia code (Example 1):

function main()
    ##  1. Calibration
    Ī± = 1/3    # Elasticity of output w.r.t. capital
    Ī² = 0.95   # Discount factor num_triangles

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000
                     0.0041 0.9806 0.0153 0.0000 0.0000
                     0.0000 0.0082 0.9837 0.0082 0.0000
                     0.0000 0.0000 0.0153 0.9806 0.0041
                     0.0000 0.0000 0.0000 0.0273 0.9727]
    mTransition = mTransition' # CHANGE: do this here, not inside the loop
    # 2. Steady State

    capitalSteadyState = (Ī±*Ī²)^(1.0/(1.0-Ī±))
    outputSteadyState = capitalSteadyState^Ī±
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    # println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)
    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors
    mOutput           = zeros(nGridCapital,nGridProductivity)
    mValueFunction    = zeros(nGridCapital,nGridProductivity)
    mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
    mPolicyFunction   = zeros(nGridCapital,nGridProductivity)
    expectedValueFunction = zeros(nGridCapital,nGridProductivity)

    # 4. We pre-build output for each point in the grid
    mOutput = (vGridCapital.^Ī±)*vProductivity

    # 5. Main iteration
    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    while (maxDifference > tolerance)
        expectedValueFunction = mValueFunction*mTransition

        for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1
            for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-Ī²)*log(consumption)+Ī²*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                         valueHighSoFar = valueProvisional
                         capitalChoice = vGridCapital[nCapitalNextPeriod]
                         gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                         break # We break when we have achieved the max
                    end                
                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        maxDifference = maximum(abs,(mValueFunctionNew-mValueFunction))
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration += 1
        # if mod(iteration,10)==0 || iteration == 1
        #     println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        # end

    end

    println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
    println(" ")
    println(" My check = ", mPolicyFunction[1000,3])
    println(" My check = ", mValueFunction[1000,3])

end

# =============================================== #
# -------------------- OUTPUT ------------------- #
# =============================================== #

julia> main()
julia> @time main()
 Iteration = 257 Sup Diff = 9.716035653806188e-8

 My check = 0.1465491436962635
 My check = -0.9714880021887345
 Iteration = 257 Sup Diff = 9.716035653806188e-8

 My check = 0.1465491436962635
 My check = -0.9714880021887345
  0.802962 seconds (1.11 k allocations: 353.815 MiB, 3.81% gc time)

Fortran code (Example 2):

program test_prod
    implicit none
    integer, parameter :: n = 10**8
    double precision, allocatable :: x(:), y(:)
    integer :: t0, count_rate, count_max, t1

    allocate ( x(n), y(n) )
    call random_number(x)

    call system_clock(t0,count_rate,count_max)
    call toarr(x,y)
    call system_clock(t1)

    write (*,'(a,f9.7,a,5f9.5)') 'Time (sec): ',real(t1-t0)/count_rate, char(10), y(1:5)

contains

    subroutine toarr(x,y)
    double precision, intent(in) :: x(:)
    double precision, intent(out):: y(:)
      y = x*x
      where (x <= 0.3) y = sqrt(x)
    end subroutine

end program test_prod

! ==================================================== !
! ---------------------- OUTPUT ---------------------- ! 
! ==================================================== !

Time (sec): 0.5160000
  0.19949  0.94330  0.11662  0.37159  0.15961

Julia code (Example 2):

function toarr!(x,y)
  v = x[1]
  for (i,v) in enumerate(x)
    y[i] = ifelse(v > 0.3, v*v, sqrt(v))
  end  
end

# =============================================== #
# -------------------- OUTPUT ------------------- #
# =============================================== #
julia> x = rand(10^8);
julia> y = zeros(10^8);
julia> @time toarr!(x,y)
  0.352425 seconds (4 allocations: 160 bytes)
julia> println(y[1:5])
[0.193422, 0.176242, 0.520951, 0.63757, 0.658653]
5 Likes

I think so. Julia already has libraries like https://github.com/JuliaMath/IterativeSolvers.jl
Thereā€™s also a lot of work done in Julia with tensors, eg
For Loops 2.0: Index Notation And The Future Of Tensor Compilers | Peter Ahrens - YouTube
so I think it is reasonable to expect one of these projects to eventually begin implementing their own BLAS.

For what itā€™s worth, Iā€™ll likely keep adding features to my own project as I want them until something faster comes along. But that wont lead to a very complete library any time soon, so I wouldnā€™t count on it becoming a practical solution.

I know we donā€™t have to reinvent the wheel but having BLAS and MKL in pure Julia would make Julia completely independent from other languages.

I think it would be great to have building blocks set up with a convenient API, so that we can compose more algorithms than just those offered by BLAS. For example, Strassenā€™s algorithm has operations like (A-B)*(X+Y), and we also get similar operations from inverting an affine transformation A^-1(X-Y) ,which is a pretty common operation in statistics. Being able to fuse those operations would be much faster. Classic gem only gives us D += A * X.
It could also let us interweave other calculations while we already have certain parts of the matrix in memory.

More interesting, is that perhaps tools like StructOfArrays can be combined with it to allow for more efficient multiplication of complex matrices; we could easily load separate SIMD vectors of real and complex numbers.
Iā€™d also like to see something like that with ForwardDiffā€™s dual numbers too. But Iā€™m also hoping that later in Julia 1.x, when some necessary compiler optimizations get implemented and Capstan gets released, it just blows ForwardDiff away. Not sure what to expect on that front.

1 Like

We need to expand this test and include at least ~10 more micro benchmarks.
Few ideas:

  1. Newton iterations to find a root.
  2. Projecting an array to the unit simplex.
  3. Creating the Dynamic Time Warping Table between 2 vectors (We could also add few other Dynamic Programming cases).

Probably people with CS background could few more (Searching in a tree? Implementing some data structure?).

We donā€™t need more benchmark ideas. People have many of those. We need more benchmarks which are implemented in every single language weā€™re testing in since otherwise it wonā€™t be shown. This latter part, getting someone to implement it in language x != Julia is the hard part where most ideas stay.

2 Likes

I would start with putting a Julia reference.
Then others will fill other languages.

I will be able to submit MATLAB / Python (Numba as well) and if the benchmark is simple I will be able to handle C.
I guess finding people for LuaJit and R will be simple as well.

Those benchmark arenā€™t only for advertising, they are reference point for those who are responsible keeping Julia fast.

But each starts with some Julia reference code and tests which makes sense though are simple and compact.

We use https://github.com/JuliaCI/BaseBenchmarks.jl/tree/master/src for that. The plot is just for marketing purposes.

First, this is great to see there is a broad reference point, didnā€™t expect less.
But those are only implemented in Julia.
They donā€™t give you a reference versus other competitors.

Itā€™s like being a professional sprinter.
You both check your performance against yourself (In house training) and against others (Competitions).

Hello.

Where can we see the list of packages compatible with Julia v1.0?

When I check any package I can only see if works with v0.6 and v0.7 but nothing about v1.0.

And where is the ā€œJulia Language Documentationā€ pdf for Julia v1.0?

First of all, your question do not belong into this thread.
Nevertheless two answers:

  1. there is currently no list of packages compatible with 1.0
  2. there is currently no documentation available in pdf format.
    Please search for other threads where these issues are discussed.
    Uwe

Hereā€™s a pdf version http://mortenpi.eu/julia/julia-1.0.0-manual.pdf

9 Likes

Nice! Did not expect this so fast. :slight_smile:

Choosing to benchmark operations on small matrices was intentional since itā€™s actually hard for the language layer as compared to large matmul which just measures BLAS speed, which is pretty uninteresting from a language perspective. Thereā€™s also a large matmul benchmark for those who really insist on seeing that. Of course, the only thing interesting that it shows is that some languages manage to make matmul slow even though they are calling BLAS.

1 Like

This is optimistic :smile:. But I guess it might happenā€¦

Itā€™s very useful because weā€™re interested in the changes to our performance over time which is the primary purpose of these benchmarks. Comparing to a bunch of other languages is mostly only useful, as @kristoffer.carlsson says, for marketing. For ourselves we donā€™t need to compare against all of these other languages languages, we just need to make sure weā€™re as fast as we can be, which is generally well estimated by comparison to to C/Fortran speed.

7 Likes

Just out of curiosity: what is the worst case performance of Julia 1.0 compared to C? Suppose we were in a masochistic mood and decided to design a new microbenchmark specifically to make Julia look as bad as possible compared to C. What would that benchmark test, and how bad do you think that relative performance would be?

Technically speakingā€¦
What Juliaā€™s areas have more room for improvement in the near future?
What Juliaā€™s areas are the subject of more research and involves more people?

Code where type inference fails.

but apart from that,
Whatā€™s the roadmap?

Better multithreading model.