Julia's assignment behavior differs from Fortran?

Maybe this will help. Note that y1 is just a pointer to the M object, and so is y2. Consequently, when you modify the inside of y2 using the name y2, since the name y1 points to the same thing, y1 also looks like it's been modified.

In contrast, you can't modify x2.x because it's not a mutable struct. The "two names for the same object" logic doesn't work.

julia> mutable struct M
       x
       end

julia> struct P 
       x 
       end

julia> y1 = M(1);

julia> y2 = y1
M(1)

julia> y2.x = 5;

julia> y1
M(5)

julia> x1 = P(1)
P(1)

julia> x2 = x1;

julia> x2.x = 5
ERROR: setfield! immutable struct of type P cannot be changed

Ok thanks for the example pdeffebach, I will check it out.

Hunh? It definitely still works. You still have two names for the same object — but the object cannot change.

Mixing up mutation and naming is part of why I think this is so confusing to newcomers. They’re totally independent concepts. Naming works exactly the same, regardless of mutability.
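For example, a minimal REPL sketch reusing the M and P definitions from the example above — two names refer to the same object in both cases, only the ability to mutate differs:

julia> x1 = P(1); x2 = x1;

julia> x1 === x2   # same object, two names (immutable)
true

julia> y1 = M(1); y2 = y1;

julia> y1 === y2   # same object, two names (mutable)
true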

8 Likes

Yes, for sure. By "this logic" I only meant the thinking "if I change this variable, it shows up in y1".

This reminds me of something I read about Python years ago.

Someone compared a python object to a cat that travels between two houses. In one house the cat is called Tigger and in the other it’s called Fluffy. The cat doesn’t know its name.

10 Likes

I may have read this wrong, but I think some part of the confusion may be an expectation that code looks and works more like math. Let’s say this little bit here was just math on paper:

xNew = xOld + 1
xOld = xNew

This would indeed imply an impossible equation, xOld = xOld + 1. But = does not mean equality in Julia (or Python, as mentioned earlier); xOld = xNew is an assignment, not an equation. I don't know Fortran so I can't say for sure, but maybe Fortran was designed to look and work a lot more like math.
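In Julia the same two lines are perfectly fine, because = just rebinds a name to a new value (a minimal sketch):

julia> xOld = 1;

julia> xNew = xOld + 1;

julia> xOld = xNew;   # no equation is being solved; the name xOld now refers to 2

julia> xOld
2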
A lot of these concepts cropping up (assignment, references, mutability) may be confusing at first but are good to learn. And people here are very willing to help out.

In Fortran you would have declared xOld and xNew at the beginning of the code, giving the sensation that each is a variable with a specific position in memory, without any ambiguity. Therefore,

xNew = xOld

copies the value of xOld into xNew.
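For comparison, a small Julia sketch (my own toy example): with arrays, xNew = xOld does not copy anything, both names refer to the same array, and an explicit copy is needed to get the Fortran-like behavior:

julia> xOld = [1, 2, 3];

julia> xNew = xOld;           # no copy: both names now refer to the same array

julia> xNew[1] = 99;

julia> println(xOld)          # the change is visible through the other name
[99, 2, 3]

julia> xNew = copy(xOld);     # an explicit copy gives the Fortran-like behavior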

Now that I am immersed in this world of labels, etc., I understand that what we think is going on might not be what the program actually does when it is compiled and executed (the position in memory of xNew might not exist in practice), but that is not relevant for the user.

Concerning the mutable vs. immutable thing, from the perspective of a former Fortran user: in Fortran everything seems to have its place in memory, as I said, and everything seems to be mutable, although that might not be true in practice. Thus, there is an abstraction layer there that must be overcome. I hope what I say in what follows is not too wrong.

One learns in this process that values can "exist" in different ways. A variable may occupy a place in one type of memory, the type we understood to exist, which is called the "heap". Variables on the heap have an address for the position in memory they occupy and, thus, the value they hold can be changed by modifying the content of that memory position. This is where mutable variables are stored.

In Fortran, from a user's perspective, everything seems to be on the "heap" (although that might not be true; the compiler will decide), such that one can program as if every variable had an address in memory and its value could be modified by changing the content of that position in memory. Thus, everything seems to be mutable in Fortran. Additionally, labels are assigned forever to the same positions in memory.

Now we learn that some values might live in other types of memory, the "stack" and (I guess) the processor cache. These are much faster to work with than the "heap", and if a value can be placed there your code will be faster. However, values occupying these memory positions do not have an address in the usual sense. You cannot change the value associated with such a position because it is not persistent; it will be discarded as soon as possible. Even if the value will be used later, it might be copied somewhere else on the stack without our knowledge if that turns out to be the best strategy for performance. We do not control where these values are stored and, therefore, we cannot assign different values to these "variables", because that does not really make sense: they are only values stored somewhere in fast-access memory.

Thus we learn that in a loop like:

s = 0.
for i in 1:3
   x = 2*i
   s = s + x
end

x might not be allocated in memory at all. It might occupy a temporary place in the fast stack memory or even only in the processor cache. In general we don't know what is going on with x, and we should not care: the people who implemented the compilers are much smarter than us and most likely made the best choice. Perhaps it will be stored in the slow "heap" memory, with an address, particularly if it were a huge array instead of a simple scalar, but it doesn't matter. (In this case it is probably just inlined, but the idea is the same.)

A Fortran user is surprised that a loop like that does not allocate anything. We learned that everything has its place in memory, even the loop counter, so that code should at least allocate some values. Yet now we discover that these allocations "do not count", because they are fast allocations in these non-addressed types of memory.

But we have to learn that, for the compiler to have the freedom to choose what to do with x, the content of x cannot change. Thus, it must be immutable. In the loop above, it doesn't even make sense to call x the same variable at each loop iteration. It is just a temporary value assigned to some fast memory position that will be used and discarded.

Therefore, if we write

x = 1 
x = 2

the two x are probably just two completely independent values stored in these fast memories. Both the "first x" and the "second x" are immutable. Actually, what is immutable are the integer values 1 and 2, and x is only a temporary label for one or the other of these values. The first x will be discarded when convenient, without our knowing where it was stored, if it was stored at all.

Yet, if we write

x = Vector{Int}(undef,1)
x[1] = 1
x[1] = 2

we are saying that, for some reason, we want to access the same position in memory repeatedly, so this must be stored in the slower heap memory, where things have real addresses. This x is a mutable object; you can explicitly change the content associated with the position it occupies in memory.
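One can even ask Julia directly whether a value is mutable (a small check I find helpful; ismutable needs Julia ≥ 1.5, older versions have isimmutable instead):

julia> ismutable(1)                       # an Int is an immutable value
false

julia> ismutable(Vector{Int}(undef, 1))   # a Vector is a mutable, heap-backed container
true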

Later we learn that vectors can also be immutable (why not? If a number can be stored in these fast memories, why not a bunch of numbers?). We can use StaticArrays, whose small vectors behave the same as any other immutable value, like a number. This means that:

julia> using StaticArrays

julia> function f()
         s = 0.
         for i in 1:1000
           x = SVector{3,Float64}(i, sqrt(i), i^2)
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
f (generic function with 1 method)

julia> f()
3.343550974558874e8

julia> @allocated f()
0

Wait, that function, which generates 1000 vectors of dimension 3, does not allocate anything? Indeed, it doesn't, because these static arrays are immutable, so they only exist in the fast, temporary memory positions. Knowing this allows a bunch of very cool code optimizations, and a very pretty syntax if you are dealing with particle simulations. For instance, you can do:

julia> x = [ SVector{3,Float64}(1,1,1) for i in 1:3 ]; # positions of 3 particles

julia> function update_positions!(x)
         for i in 1:length(x)
           y = 2*x[i] # whatever needed
           x[i] = y 
         end
       end
update_positions! (generic function with 1 method)

julia> update_positions!(x)

julia> x
3-element Array{SArray{Tuple{3},Float64,1,3},1}:
 [2.0, 2.0, 2.0]
 [2.0, 2.0, 2.0]
 [2.0, 2.0, 2.0]

julia> @allocated update_positions!(x)
0

The update_positions! function is mutating the elements of x (x is mutable), but the elements of x are themselves immutable static vectors. That is, the line y = 2*x[i] is just creating a new static vector, and x[i] = y is not necessarily modifying the values in the memory positions of the elements of x as we would think (or it might be; that is just not your problem), and all of that does not involve any access to the slow heap memory of the computer. Thus you can deal with a small vector as fast as you can deal with a scalar.

The possibilities for improving the performance of a numerical code increase. I have been able to write faster code in Julia than in Fortran now, but that depends on adapting to this new way of thinking and to the new possibilities involved.

10 Likes

Just to add one thing. There is nothing mysterious about StaticArrays. They are convenient immutable structures, which you could have defined yourself, with the same allocation results:

julia> struct P
         x :: Float64
         y :: Float64
         z :: Float64
       end

julia> function update_positions!(x)
         for i in 1:length(x)
           y = P( 2*x[i].x, 2*x[i].y, 2*x[i].z )
           x[i] = y   
         end
       end
update_positions! (generic function with 1 method)

julia> x = [ P(1.0,1.0,1.0) for i in 1:100 ];

julia> update_positions!(x);

julia> @allocated update_positions!(x)
0

3 Likes

The fact that x = , x .= and x[i] = mean totally different things is quite unintuitive, and I remember being confused by this when I started Julia. This doesn't really seem to be covered in the docs, at least not in a place a beginner is going to look. Might it be a good idea to add a small section on arrays? The docs on "noteworthy differences from MATLAB" say "Julia arrays are not copied when assigned to another variable. After A = B, changing elements of B will modify A as well.", which is good but might be a bit more explicit? Also, should there be a "noteworthy differences from Fortran"?
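Something like this minimal sketch (my own example) is what I wish the docs spelled out:

A = [1, 2, 3]
B = A           # `=` only binds another name to the same array
A .= 0          # `.=` mutates the existing array in place, so B sees [0, 0, 0] too
A[1] = 7        # `A[i] =` mutates one element in place, so B sees [7, 0, 0]
A = [4, 5, 6]   # `=` rebinds the name A to a brand new array; B still sees [7, 0, 0]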

6 Likes

I think I understand what is going on under the hood based on this discussion. Can someone confirm that my comments are accurate?

x = [2]  # x points to memory location m1
y = x    # y points to memory location m1
x = [3]  # x points to memory location m2; y still points to m1
z = x    # z points to memory location m2
x[1] = 4 # m2 changes value in place, affecting all variables that point there

julia> println(x, y, z)
[4][2][4]
6 Likes

There is an attempt to build that list here: Noteworthy differences from Fortran

1 Like

= is not an equality test, but an assignment. Confusing, perhaps. But entirely consistent across many languages.

1 Like

When I learned that the syntax x[i] = v is converted to setindex!(x, v, i) in Julia, I didn't need to think too much about what is mutable or not; it is pretty clear that something is being set to a new value. The syntax x = v is a change of name, as others have already pointed out.
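For instance (a minimal sketch):

julia> x = [10, 20];

julia> setindex!(x, 99, 1);   # this is what x[1] = 99 calls under the hood

julia> println(x)
[99, 20]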

2 Likes

FWIW, Fortran compilers normally don’t have a problem putting all the arrays you’re mutating on the stack. For example, see the gfortran flag -fstack-arrays.
In Julia, StaticArrays.MArrays will also generally be placed on the stack, so long as they do not escape the function creating them or get passed as an argument to a function that isn’t inlined.

A feature request I have in Julia (maybe I’ll make a PR) is support for alloca, so it’s easier to stack allocate memory that’s freed when the function returns, but that you can pass as an argument to non-inlined functions. llvmcall doesn’t work for this, because the memory is immediately freed when the llvmcall returns, not when the Julia function calling llvmcall returns.

julia> using StaticArrays, BenchmarkTools

julia> function f()
         s = 0.
         for i in 1:1000
           x = SVector{3,Float64}(i, sqrt(i), i^2)
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
f (generic function with 1 method)

julia> function g()
         s = 0.
         for i in 1:1000
           x = MVector{3,Float64}(undef)
           x[1] = i
           x[2] = sqrt(i)
           x[3] = i^2
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
g (generic function with 1 method)

julia> @btime f()
  2.605 μs (0 allocations: 0 bytes)
3.343550974558874e8

julia> @btime g()
  2.605 μs (0 allocations: 0 bytes)
3.343550974558874e8

However, SVectors are easier for the compiler; using the MVector can prevent some optimizations; see what happens when we add @fastmath:

julia> @fastmath function f()
         s = 0.
         for i in 1:1000
           x = SVector{3,Float64}(i, sqrt(i), i^2)
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
f (generic function with 1 method)

julia> @fastmath function g()
         s = 0.
         for i in 1:1000
           x = MVector{3,Float64}(undef)
           x[1] = i
           x[2] = sqrt(i)
           x[3] = i^2
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
g (generic function with 1 method)

julia> @btime f()
  744.343 ns (0 allocations: 0 bytes)
3.3435509745588744e8

julia> @btime g()
  2.604 μs (0 allocations: 0 bytes)
3.343550974558874e8

Suddenly f (using SVectors) is much faster, but the speed of g() is almost unchanged.

But my point here is that just because the MVector is mutable, and getindex and setindex! are defined using loads and stores to/from its pointer, doesn't mean it has to be heap allocated.
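Here is a small sketch of the escape condition mentioned above (my own toy example; the exact allocation counts can depend on the Julia version and on inlining):

using StaticArrays

function mvec_no_escape()
    x = MVector{3,Float64}(undef)
    x[1] = 1.0; x[2] = 2.0; x[3] = 3.0
    x[1] + x[2] + x[3]        # only the sum leaves the function
end

function mvec_escapes()
    x = MVector{3,Float64}(undef)
    x[1] = 1.0; x[2] = 2.0; x[3] = 3.0
    x                         # the MVector itself is returned, so it escapes
end

# run each once first so compilation is not counted by @allocated
@allocated mvec_no_escape()   # expected 0: the MVector can stay on the stack
@allocated mvec_escapes()     # expected > 0: an escaping MVector must be heap allocated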

EDIT:
The simplest way to speed up g is to use @inbounds:

julia> @fastmath function h()
         s = 0.
         @inbounds for i in 1:1000
           x = MVector{3,Float64}(undef)
           x[1] = i
           x[2] = sqrt(i)
           x[3] = i^2
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
h (generic function with 1 method)

julia> @btime h()
  744.317 ns (0 allocations: 0 bytes)
3.3435509745588744e8

A less simple way is to manually unroll the inner loop:

julia> @fastmath function g()
         s = 0.
         for i in 1:1000
           x = MVector{3,Float64}(undef)
           x[1] = i
           x[2] = sqrt(i)
           x[3] = i^2
           Base.Cartesian.@nexprs 3 j -> begin
             s = s + x[j]
           end
         end
         s
       end
g (generic function with 1 method)

julia> @btime g()
  744.324 ns (0 allocations: 0 bytes)
3.3435509745588744e8

I guess it’s close to a limit where the compiler gives up optimizing it.

EDIT:
It just occurred to me that AVX512 is probably needed for the @fastmath versions of this function to be fast, because AVX512 is needed for SIMD Int64 -> Float64 conversion.
If you don’t have AVX512, I’d recommend

julia> @fastmath function h2()
         s = 0.; i = 0.0
         @inbounds for _ in 1:1000
           x = MVector{3,Float64}(undef);
           i += 1
           x[1] = i
           x[2] = sqrt(i)
           x[3] = i^2
           for j in 1:3
             s = s + x[j]
           end
         end
         s
       end
h2 (generic function with 1 method)

julia> @btime h2()
  711.417 ns (0 allocations: 0 bytes)
3.3435509745588744e8
5 Likes

I will post an example of the use of immutable types to compare with a Fortran routine. I do this to learn things myself, so why not share… I think this is an example of how the Julia syntax is nice once we get used to it:

Problem: Given the positions of N particles in 3D, update the positions of a subset of size M of these particles defined in a list.

Using Julia, I will first define an immutable struct to contain the coordinates, and a function that builds an updated particle with the desired transformation (in this case, multiplying the coordinates by 2):

struct P{T}
  x :: T
  y :: T
  z :: T
end
update_position( x :: P ) = P( 2*x.x, 2*x.y, 2*x.z )  

Let us generate a vector of these particles:

n = 1_000_000
x = [ P(1.,1.,1.) for i in 1:n ]

For the Fortran routine, the natural way will be to create a matrix, so we do:

y = ones(3,n)

We will create a list of indexes of the particles that should have their positions updated, with:

m = 1_000
list = rand(1:n,m)

Then, the Julia function that updates the positions of these m particles is:

function f1!(x,list)
  for i in list
    x[i] = update_position(x[i])
  end
end    

The Fortran routines to do the same job are:

subroutine update_position(i,n,x)
  implicit none
  integer :: j, n
  integer*8 i
  double precision :: x(3,n)
  do j = 1, 3
    x(j,i) = 2*x(j,i)
  end do
end subroutine update_position

subroutine f1(n,x,m,list)
  implicit none
  integer :: i, j, n, m
  integer*8 :: list(m)
  double precision :: x(3,n)
  do i = 1, m
    call update_position(list(i),n,x)
  end do
end subroutine f1  

I think it is pretty clear that the Julia code is much simpler and clearer. Bugs turn out to be easier to find in the end, even with the more permissive syntax of Julia. Of course we can suggest changes to both codes, but in particular if we want to keep the function that updates the position of a single particle separate from the one that updates the list, the Fortran code will not be much simpler than that.

Finally, the Julia function turns out to be even slightly faster than the Fortran routine:

 Julia with P:   1.869 μs (0 allocations: 0 bytes)
 Fortran:   2.292 μs (0 allocations: 0 bytes)

(although I know that benchmarks are always complicated, and a disclaimer: if we wanted to update the positions of all particles, which are contiguous in memory, then we should use a matrix of particles in row order, allowing loop vectorization. The case here is that the positions to be updated are not contiguous in memory).

The complete code is here:

Summary
using Test
using BenchmarkTools

# Using user defined immutable type 
struct P{T}
  x :: T
  y :: T
  z :: T
end
update_position( x :: P ) = P( 2*x.x, 2*x.y, 2*x.z )

function f1!(x,list)
  for i in list
    x[i] = update_position(x[i])
  end
end

# Calling Fortran routine:
"""
subroutine update_position(i,n,x)
  implicit none
  integer :: j, n
  integer*8 i
  double precision :: x(3,n)
  do j = 1, 3
    x(j,i) = 2*x(j,i)
  end do
end subroutine update_position

subroutine f1(n,x,m,list)
  implicit none
  integer :: i, j, n, m
  integer*8 :: list(m)
  double precision :: x(3,n)
  do i = 1, m
    call update_position(list(i),n,x)
  end do
end subroutine f1   

Compiled with:
gfortran -O3 mutate.f90 -shared -o mutate.so
"""
f2!(x,list) = ccall((:f1_,"mutate.so"),Nothing,
                    (Ref{Int32},         Ref{Float64}, Ref{Int32},          Ref{Int64}),
                     Int32(size(x,2)),   x,            Int32(size(list,1)), list)     


# Running

n = 1_000_000
m = 1_000
list = rand(1:n,m)

# With julia immutable struct
x = [ P(1.,1.,1.) for i in 1:n ]
f1!(x,list)

# Fortran Matrix
y = ones(3,n) 
f2!(y,list)

# Testing if the fortran code returns the same thing
@test begin
  t = true
  for i in 1:n
    if x[i].x != y[1,i] || x[i].y != y[2,i] || x[i].z != y[3,i] 
      t = false
    end
  end
  t
end

setup = ( x = [ P(1.,1.,1.) for i in 1:n ] )
print(" Julia with P: "); @btime f1!(x,$list) setup = setup evals = 1

setup = ( y = ones(3,n) )
print(" Fortran: "); @btime f2!(y,$list) setup=setup evals=1

                                                 
3 Likes

On computers with AVX, you can improve performance by padding out P to be 256 bits, i.e., make it 4D but just ignore the last component.
Additionally, gfortran generally needs the compiler flag -fno-semantic-interposition for performance. Without it, it will not inline functions.

I also added @inbounds to the Julia version to make it equivalent to the Fortran.

Code
module mutate

  use ISO_C_BINDING

  contains
    subroutine update_position(x)
      implicit none
      real(C_double), dimension(:), intent(inout) :: x
      x = 2*x
    end subroutine update_position

    subroutine f1(n,x,m,list) BIND(C, name="f1")
      implicit none
      integer(C_long) :: i, n, m
      integer(C_long) :: list(m)
      real(C_double) :: x(3,n)
      do i = 1, m
         call update_position(x(:,list(i)))
      end do
    end subroutine f1

    subroutine f2(n,x,m,list) BIND(C, name="f2")
      implicit none
      integer(C_long) :: i, n, m
      integer(C_long) :: list(m)
      real(C_double) :: x(4,n)
      do i = 1, m
         call update_position(x(:,list(i)))
      end do
    end subroutine f2

end module mutate

I compiled with
gfortran -Ofast -march=native -Wall -fno-semantic-interposition -shared -fPIC mutate.f90 -o libmutate.so

using Test, BenchmarkTools

struct P{T}
  x :: T
  y :: T
  z :: T
end
update_position( x :: P ) = P( 2*x.x, 2*x.y, 2*x.z )

struct Ppad{T}
  x :: T
  y :: T
  z :: T
  p :: T
end
function Ppad(_x, _y, _z)
    x, y, z = promote(_x, _y, _z)
    Ppad(x, y, z, zero(x))
end
update_position( x :: Ppad ) = Ppad( 2x.x, 2x.y, 2x.z, 2x.p )

function f1!(x,list)
  @inbounds for i in list
    x[i] = update_position(x[i])
  end
end

const LIBMUTATE = "libmutate.so"
f2_3!(x,list) = ccall((:f1, LIBMUTATE), Nothing,
                    (Ref{Clong},  Ref{Float64}, Ref{Clong}, Ref{Clong}),
                     size(x,2), x, size(list,1), list)     
f2_4!(x,list) = ccall((:f2, LIBMUTATE), Nothing,
                    (Ref{Clong},  Ref{Float64}, Ref{Clong}, Ref{Clong}),
                     size(x,2), x, size(list,1), list)     
f2!(x, list) = size(x,1) == 3 ? f2_3!(x, list) : f2_4!(x, list)

n = 1_000_000
m = 1_000
list = rand(1:n,m);

setup = ( x = [ P(1.,1.,1.) for i in 1:n ] );
print(" Julia with P: "); @btime f1!(x,$list) setup = setup evals = 1
setup = ( x = [ Ppad(1.,1.,1.) for i in 1:n ] );
print(" Julia with Ppad: "); @btime f1!(x,$list) setup = setup evals = 1

setup = ( y = ones(3,n) );
print(" Fortran: "); @btime f2!(y,$list) setup=setup evals=1
setup = ( y = ones(4,n) );
print(" Fortran: "); @btime f2!(y,$list) setup=setup evals=1

This yields:

julia> setup = ( x = [ P(1.,1.,1.) for i in 1:n ] );

julia> print(" Julia with P: "); @btime f1!(x,$list) setup = setup evals = 1
 Julia with P:   978.000 ns (0 allocations: 0 bytes)

julia> setup = ( x = [ Ppad(1.,1.,1.) for i in 1:n ] );

julia> print(" Julia with Ppad: "); @btime f1!(x,$list) setup = setup evals = 1
 Julia with Ppad:   752.000 ns (0 allocations: 0 bytes)

julia> setup = ( y = ones(3,n) );

julia> print(" Fortran: "); @btime f2!(y,$list) setup=setup evals=1
 Fortran:   983.000 ns (0 allocations: 0 bytes)

julia> setup = ( y = ones(4,n) );

julia> print(" Fortran: "); @btime f2!(y,$list) setup=setup evals=1
 Fortran:   852.000 ns (0 allocations: 0 bytes)
2 Likes

Interesting, that padding business is advanced stuff for me :slight_smile:

Concerning the compiler flags, I get the best performance (the same as that with your flags) with

-O3 -march=native

Another thing: the -Ofast option, as I checked, includes fast math. Probably one should compare with the Julia loop using @fastmath, but I didn't see any important difference here.

By the way, thank you very much for the example of how to call Fortran routines from Julia. I struggled a lot to find good examples of that; the manual is quite short on examples showing both codes.

Does this mean you get better performance without -fno-semantic-interposition?
I guess it depends on compiler options. Godbolt shows the functions inlining regardless of -fno-semantic-interposition, but when I compile locally (checking asm with -S -masm=intel), the functions do not inline unless I add -fno-semantic-interposition.

Fast math makes no difference there; associativity isn’t needed for optimizations here.
I just didn’t bother editing it when jumping up in my history to the last gfortran invocation.

SIMD (Single Instruction Multiple Data) lets the CPU operate on multiple numbers per instruction.
With P,

julia> xp = [ Ppad(1.,1.,1.) for i in 1:n ];

julia> x = [ P(1.,1.,1.) for i in 1:n ];

julia> @code_llvm debuginfo=:none f1!(x, list)

julia> @code_llvm debuginfo=:none f1!(xp, list)

This yields, without padding:

L9:                                               ; preds = %L9, %L9.lr.ph
  %value_phi14 = phi i64 [ 0, %L9.lr.ph ], [ %21, %L9 ]
  %14 = getelementptr inbounds i64, i64* %12, i64 %value_phi14
  %15 = load i64, i64* %14, align 8
  %16 = add i64 %15, -1
  %.elt = getelementptr inbounds [3 x double], [3 x double]* %13, i64 %16, i64 0
  %17 = bitcast double* %.elt to <2 x double>*
  %18 = load <2 x double>, <2 x double>* %17, align 8
  %.elt7 = getelementptr inbounds [3 x double], [3 x double]* %13, i64 %16, i64 2
  %.unpack8 = load double, double* %.elt7, align 8
  %19 = fmul <2 x double> %18, <double 2.000000e+00, double 2.000000e+00>
  %20 = fmul double %.unpack8, 2.000000e+00
  store <2 x double> %19, <2 x double>* %17, align 8
  store double %20, double* %.elt7, align 8
  %21 = add nuw nsw i64 %value_phi14, 1
  %exitcond.not = icmp eq i64 %21, %8
  br i1 %exitcond.not, label %L28, label %L9

It performs 2 loads:

  %18 = load <2 x double>, <2 x double>* %17, align 8
  %.unpack8 = load double, double* %.elt7, align 8

Two multiplies (these later get turned into adds):

  %19 = fmul <2 x double> %18, <double 2.000000e+00, double 2.000000e+00>
  %20 = fmul double %.unpack8, 2.000000e+00

and two stores:

  store <2 x double> %19, <2 x double>* %17, align 8
  store double %20, double* %.elt7, align 8

One of these is of <2 x double>, and the other a single double.

With padding:

L9:                                               ; preds = %L9, %L9.lr.ph
  %value_phi19 = phi i64 [ 0, %L9.lr.ph ], [ %20, %L9 ]
  %14 = getelementptr inbounds i64, i64* %12, i64 %value_phi19
  %15 = load i64, i64* %14, align 8
  %16 = add i64 %15, -1
  %.elt = getelementptr inbounds [4 x double], [4 x double]* %13, i64 %16, i64 0
  %17 = bitcast double* %.elt to <4 x double>*
  %18 = load <4 x double>, <4 x double>* %17, align 8
  %19 = fmul <4 x double> %18, <double 2.000000e+00, double 2.000000e+00, double 2.000000e+00, double 2.000000e+00>
  store <4 x double> %19, <4 x double>* %17, align 8
  %20 = add nuw nsw i64 %value_phi19, 1
  %exitcond.not = icmp eq i64 %20, %8
  br i1 %exitcond.not, label %L31, label %L9

We now have just a single load:

  %18 = load <4 x double>, <4 x double>* %17, align 8

A single multiply:

  %19 = fmul <4 x double> %18, <double 2.000000e+00, double 2.000000e+00, double 2.000000e+00, double 2.000000e+00>

and a single store:

  store <4 x double> %19, <4 x double>* %17, align 8

So on CPUs with the AVX instruction set (which provides 256 bit vectors, letting them operate on <4 x double>s at a time), padding to add an extra operation will actually decrease the number of instructions that must be performed: instead of 2 + 1 = 3, we can do 4 at a time.
Of course, there are a lot of other instructions aside from just these, so the actual performance difference is smaller.

6 Likes

No, sorry, I get roughly the same performance. I could not see any difference. I didn’t know about that tool, very interesting.

ps. By chance, do you teach all this stuff somewhere? The way you approach these problems is a completely new world to me.

2 Likes

No, but I’ll work on some material eventually. I’ll probably write some for LoopVectorization’s eventual v1.0 release (which is months away).

9 Likes