Understanding performance of vectors

I have created a toy package, TicTacToes.jl to try and write and package some performant Julia code.

When trying to write an example on how to use make a (bad) agent play itself, I didn’t understand the performance of one approach:

using TicTacToes
function self_play()
    g = TicTacToe()
    while g.state == RUN ai!(g) end
    g.state
end

n = 100_000_000
@time res = [self_play() for i in 1:n]     # +- 23 sec on my PC

struct GameVec <: AbstractVector{Int} len::Int end
Base.size(g::GameVec) = (g.len,)
Base.getindex(::GameVec, i::Int) = self_play()
@time res2 = GameVec(n)                   # near instant (??)

Question: Why is GameVec so much faster than a simple comprehension? Where can I learn more on what is happening under the hood?

Where I’m stuck: calling code_native and friends on GameVec just shows it returns its argument. I can’t find where / how the logic in self_play is run.

It looks like you are comparing two very different things, e.g. the first timing is for numerous calls of self_play() and the second timing merely constructs an instance of type GameVec, which does not call self_play() at all.

I assume that the performance here is due to the runtime of self_play itself, and calling it 100 million times is also quite a bit, even for a very fast function.

For general performance tips, consider the links in Optimizing your code (modernjuliaworkflows.github.io), in particular the performance tips section and maybe profiling in VS code.

Definitely comparing two different things here. I am quite confused what thing GameVec creates here, by the @time it takes, it doesn’t do much. My confusion comes from the results as I print res2.

res
#> 10000000-element Vector{GameState}:
#>  X::GameState = 1
#>  DRAW::GameState = 3
#>  O::GameState = 2

Yet, when it res2 prints, it does print a game state (an integer from 1 to 3). The state field of the TicTacToe is initially always 0, so something happens, but I don’t know how to find out what happens.

res2
#> 10000000-element GameVec:
#>  X::GameState = 1 # these may change when `res2` is printed
#>  O::GameState = 2 
#>  X::GameState = 1

res2 # recalling the same thing, directly afterward
#> 10000000-element GameVec:
#> DRAW::GameState = 3
#> X::GameState = 1
#> X::GameState = 1

However, what res2 prints, changes on different prints. Each access, i.e. res2[1] comes with the same computational cost as 1 run of self_play() + some extra.

Then the question, I suppose, is what those results in the GameVec are, something similar to undef for other vectors?

1 Like

GameVec(n) is more analogous to the 1:n part, so a fair comparison is game1 = GameVec(n); @time res2 = [game1[i] for i in eachindex(game1)]. game1[i] calls getindex; eachindex calls size.

1 Like

Thanks - that makes sense, and answers the performance part of the question.

I’m still a bit confused how/why SomeVec(5) in this codeblock prints 40x “I’m running!” (i.e. something is still ran), but at least I know to not see these as values as equal to the ones in the regular loop/comprehension.

using Random
mutable struct A a::Float64 end
step!(x::A) = x.a += rand()

function run()
    println("I'm running!")
    t = A(1.0)
    while t.a < 2.0 step!(t) end
    t.a
end

Random.seed!(1)
@time res = [run() for i in 1:5]

struct SomeVec <: AbstractVector{Float64} len::Int end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = run()

init = SomeVec(5) # run() prints stuff 40x
Random.seed!(1)
@time res2 = [init[i] for i in eachindex(init)]

res == res2  # true

Because getindex is used in displaying your SomeVec object in the REPL.
You have defined SomeVec as a subtype of AbstractVector, so it “inherits” lots of functionality including how it is displayed.

You can use a semi-colon ; to suppress printing.

Or you can provide explicit show methods for your type:

Base.show(io::IO, s::SomeVec) = print(io, typeof(s), "(", s.len, ")")
Base.show(io::IO, ::MIME"text/plain", s::SomeVec) = print(io, typeof(s), "(", s.len, ")")

See manual pages:
I/O and Network · The Julia Language
I/O and Network · The Julia Language

2 Likes

More exact reason is that Base.alignment iterates the rows of a vector or matrix to find the space needed to print all the elements in an aligned column, then Base.print_matrix_row iterates again to print. Each iteration indexes a few times, the exact number varying by version, so it’s not good to print while indexing. I’d separate the printing step from the element computation, and write a method (function (s::SomeVec)(i::Int)...) for SomeVec instances so a just-as-concise call init(i) prints while indexing init[i] doesn’t.