I have created a toy package, TicTacToes.jl, to practice writing and packaging some performant Julia code.
While writing an example of how to make a (bad) agent play itself, I ran into a performance result I don't understand:
using TicTacToes

function self_play()
    g = TicTacToe()
    while g.state == RUN
        ai!(g)
    end
    g.state
end

n = 100_000_000
@time res = [self_play() for i in 1:n]  # ~23 s on my PC

struct GameVec <: AbstractVector{Int}
    len::Int
end
Base.size(g::GameVec) = (g.len,)
Base.getindex(::GameVec, i::Int) = self_play()

@time res2 = GameVec(n)  # near instant (??)
Question: Why is GameVec so much faster than a simple comprehension? Where can I learn more on what is happening under the hood?
Where I’m stuck: calling code_native and friends on GameVec just shows that it returns its argument. I can’t find where or how the logic in self_play is run.
It looks like you are comparing two very different things: the first timing covers numerous calls of self_play(), while the second merely constructs an instance of type GameVec, which does not call self_play() at all.
I assume the performance here is dominated by the runtime of self_play itself, and calling it 100 million times adds up, even for a very fast function.
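To see that construction alone does no element work, here is a minimal self-contained sketch; expensive and LazyVec are made-up names standing in for self_play and GameVec, so TicTacToes.jl is not needed:

```julia
# Hypothetical stand-in for self_play(): any expensive call would do.
expensive() = sum(rand() for _ in 1:1_000)

struct LazyVec <: AbstractVector{Float64}
    len::Int
end
Base.size(v::LazyVec) = (v.len,)
Base.getindex(::LazyVec, i::Int) = expensive()

v = LazyVec(100_000_000)  # near-instant: only a single Int is stored
length(v)                 # 100_000_000, still zero calls to expensive()
v[1]                      # the work only happens on access
```

Constructing the vector just records a length; each element is recomputed on every access.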
Definitely comparing two different things here. I am quite confused about what GameVec actually creates, though; judging by the @time, it doesn’t do much. My confusion comes from the results when I print res2.
When res2 prints, it does show game states (integers from 1 to 3). The state field of a TicTacToe is initially always 0, so something happens, but I don’t know how to find out what.
res2
#> 100000000-element GameVec:
#> X::GameState = 1 # these may change when `res2` is printed
#> O::GameState = 2
#> X::GameState = 1
res2 # recalling the same thing, directly afterward
#> 100000000-element GameVec:
#> DRAW::GameState = 3
#> X::GameState = 1
#> X::GameState = 1
However, what res2 prints changes between prints. Each access, i.e. res2[1], comes with the same computational cost as one run of self_play(), plus some overhead.
Then the question, I suppose, is what those results in the GameVec are: something similar to undef in other vectors?
GameVec(n) is more analogous to the 1:n part, so a fair comparison is game1 = GameVec(n); @time res2 = [game1[i] for i in eachindex(game1)]. Here game1[i] calls getindex, and eachindex calls size.
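One way to convince yourself of this is to count the getindex calls; CountingVec and CALLS are hypothetical names for this sketch:

```julia
const CALLS = Ref(0)  # counts getindex calls (illustrative only)

struct CountingVec <: AbstractVector{Int}
    len::Int
end
Base.size(v::CountingVec) = (v.len,)
Base.getindex(v::CountingVec, i::Int) = (CALLS[] += 1; i)

cv = CountingVec(5);  # semicolon: avoid display-triggered indexing
CALLS[]               # 0: constructing cv indexes nothing
res = [cv[i] for i in eachindex(cv)];
CALLS[]               # 5: one getindex call per element
```

Construction never touches an element; the comprehension over eachindex does the actual indexing.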
Thanks - that makes sense, and answers the performance part of the question.
I’m still a bit confused how/why SomeVec(5) in this code block prints “I’m running!” 40 times (i.e. something is still run), but at least I know not to treat these as values equal to the ones in the regular loop/comprehension.
using Random

mutable struct A
    a::Float64
end
step!(x::A) = x.a += rand()

function run()
    println("I'm running!")
    t = A(1.0)
    while t.a < 2.0
        step!(t)
    end
    t.a
end

Random.seed!(1)
@time res = [run() for i in 1:5]

struct SomeVec <: AbstractVector{Float64}
    len::Int
end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = run()

init = SomeVec(5)  # run() prints stuff 40x

Random.seed!(1)
@time res2 = [init[i] for i in eachindex(init)]
res == res2  # true
Because getindex is used when displaying your SomeVec object in the REPL.
You have defined SomeVec as a subtype of AbstractVector, so it “inherits” lots of functionality, including how it is displayed.
You can use a semicolon ; to suppress printing.
Or you can provide explicit show methods for your type:
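For example, a minimal show method for REPL display that never touches the elements might look like this (a sketch, not the only way to write it):

```julia
struct SomeVec <: AbstractVector{Float64}
    len::Int
end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = (println("I'm running!"); rand())

# Override the 3-argument show the REPL uses for display,
# so displaying a SomeVec never calls getindex:
Base.show(io::IO, ::MIME"text/plain", s::SomeVec) =
    print(io, length(s), "-element SomeVec (elements computed on access)")
```

With this method, init = SomeVec(5) at the REPL prints a one-line summary and run-like side effects never fire during display.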
The more exact reason is that Base.alignment iterates the rows of a vector or matrix to find the space needed to print all the elements in an aligned column, and then Base.print_matrix_row iterates again to actually print them. Each iteration indexes a few times (the exact number varies by Julia version), so it’s not a good idea to print while indexing. I’d separate the printing step from the element computation, and write a functor method (function (s::SomeVec)(i::Int) ... end) for SomeVec instances, so that the just-as-concise call init(i) prints while indexing with init[i] doesn’t.
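One possible reading of that suggestion as code; run_quiet is a made-up, print-free variant of the run function from the example above:

```julia
mutable struct A
    a::Float64
end
step!(x::A) = x.a += rand()

# The computation from run(), without the println:
function run_quiet()
    t = A(1.0)
    while t.a < 2.0
        step!(t)
    end
    t.a
end

struct SomeVec <: AbstractVector{Float64}
    len::Int
end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = run_quiet()  # silent indexing

# Functor: calling the instance prints, then does the same work.
(s::SomeVec)(i::Int) = (println("I'm running!"); s[i])

init = SomeVec(5);
init[3]  # computes silently
init(3)  # prints "I'm running!" and computes
```

This way, displaying the vector (which indexes repeatedly) stays quiet, and the printing side effect is opt-in via the call syntax.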