Utilizing @turbo / @tturbo in Performance Critical Code

nosewitz · September 8, 2022, 7:22pm

Incredible! The runtime is down to about 20-24 seconds now. So, as I understand it, when we use @turbo we have to write granular code? Also, by transposing r1 and r2, do you mean keeping them in their original form?

This is how the test matrix looks:

julia> X_test
20000×6 Matrix{Float64}:
 3295.0  4044.0  1072.0   27.0  103.0   42.0
 2959.0  1825.0  2980.0   89.0   42.0  323.0
 2934.0  2779.0  1236.0  176.0  349.0  594.0
 2808.0  1423.0  1734.0   80.0  147.0  180.0
 3164.0  1338.0  2271.0   32.0   57.0  350.0
 3085.0  2598.0  1712.0   90.0  299.0  324.0
 3358.0  1332.0  1655.0   32.0   98.0  175.0
    ⋮                                    ⋮
 2742.0  2493.0  1753.0   64.0   26.0  446.0
 2735.0  1578.0  1879.0   19.0   65.0  457.0
 3091.0  2239.0  2431.0   25.0  323.0  258.0
 3278.0  1931.0  2439.0   63.0  115.0  470.0
 3178.0  4577.0  2154.0   42.0   93.0  551.0
 3262.0   390.0   807.0   28.0  261.0  408.0

I then rotate it 90 degrees to the left with rotl90, so that the rows become columns.

julia> X_test |> rotl90
6×20000 Matrix{Float64}:
   42.0   323.0   594.0   180.0   350.0   324.0   175.0   150.0  …   218.0   324.0   446.0   457.0   258.0   470.0   551.0   408.0      
  103.0    42.0   349.0   147.0    57.0   299.0    98.0   316.0      193.0   354.0    26.0    65.0   323.0   115.0    93.0   261.0
   27.0    89.0   176.0    80.0    32.0    90.0    32.0    39.0       37.0    43.0    64.0    19.0    25.0    63.0    42.0    28.0      
 1072.0  2980.0  1236.0  1734.0  2271.0  1712.0  1655.0   966.0     1410.0  1976.0  1753.0  1879.0  2431.0  2439.0  2154.0   807.0      
 4044.0  1825.0  2779.0  1423.0  1338.0  2598.0  1332.0  5486.0     3208.0  4094.0  2493.0  1578.0  2239.0  1931.0  4577.0   390.0
 3295.0  2959.0  2934.0  2808.0  3164.0  3085.0  3358.0  3307.0  …  3296.0  3112.0  2742.0  2735.0  3091.0  3278.0  3178.0  3262.0

Is this the wrong approach? I thought that Julia performs better when we iterate down columns rather than across rows, since matrices are stored column-major. So by rotating the matrix, each original row becomes a column and should be faster to access.
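To make the question concrete, here is a minimal sketch using only Base functions. The 3×6 matrix is just the first three rows of X_test shown above, and `sumsq_by_column` is a hypothetical helper made up for illustration, not something from this thread. As far as I can tell, rotl90 does turn each sample into a contiguous column, but it also reverses the feature order compared with a plain transpose (permutedims):

```julia
# Small stand-in for X_test: the first three rows from the post.
X = [3295.0 4044.0 1072.0  27.0 103.0  42.0;
     2959.0 1825.0 2980.0  89.0  42.0 323.0;
     2934.0 2779.0 1236.0 176.0 349.0 594.0]

R = rotl90(X)        # 6×3: samples become columns, feature order is reversed
P = permutedims(X)   # 6×3: materialized transpose, feature order is kept

# Column 1 of each result is sample 1; rotl90 lists its features backwards.
@assert P[:, 1] == X[1, :]
@assert R[:, 1] == reverse(X[1, :])
@assert R == reverse(P, dims=1)

# Hypothetical helper: sum of squares of each column.
# The inner loop walks down a column, which is contiguous in
# Julia's column-major layout.
function sumsq_by_column(A::AbstractMatrix{Float64})
    out = zeros(size(A, 2))
    for j in axes(A, 2)          # outer loop: one column (sample) at a time
        s = 0.0
        for i in axes(A, 1)      # inner loop: consecutive memory addresses
            s += A[i, j]^2
        end
        out[j] = s
    end
    return out
end

# Feature order does not matter for a plain reduction like this one,
# but it would matter for anything that indexes features by position.
@assert sumsq_by_column(R) ≈ sumsq_by_column(P)
```

So the column-access reasoning applies to both layouts; the only difference is that code indexing features by position would see them in reverse order after rotl90.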

I’m learning a lot from this : )
