I have created a toy package, TicTacToes.jl, to practice writing and packaging some performant Julia code.
While writing an example of how to make a (bad) agent play itself, I ran into a performance result I don't understand:
using TicTacToes

function self_play()
    g = TicTacToe()
    while g.state == RUN
        ai!(g)
    end
    g.state
end

n = 100_000_000
@time res = [self_play() for i in 1:n]  # ~23 s on my PC

struct GameVec <: AbstractVector{Int}
    len::Int
end
Base.size(g::GameVec) = (g.len,)
Base.getindex(::GameVec, i::Int) = self_play()

@time res2 = GameVec(n)  # near instant (??)
Question: Why is GameVec so much faster than a simple comprehension? Where can I learn more on what is happening under the hood?
Where I’m stuck: calling code_native and friends on GameVec just shows that it returns its argument. I can’t find where or how the logic in self_play is run.
It looks like you are comparing two very different things: the first timing covers numerous calls of self_play(), while the second merely constructs an instance of type GameVec, which does not call self_play() at all.
I assume the performance here is dominated by the runtime of self_play itself, and calling it 100 million times adds up, even for a very fast function.
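To see that construction alone does no element work, here is a minimal self-contained sketch; expensive and LazyVec are made-up names standing in for self_play and GameVec, so TicTacToes.jl is not needed:

```julia
# Hypothetical stand-in for self_play(): any expensive call would do.
expensive() = sum(rand() for _ in 1:1_000)

struct LazyVec <: AbstractVector{Float64}
    len::Int
end
Base.size(v::LazyVec) = (v.len,)
Base.getindex(::LazyVec, i::Int) = expensive()

v = LazyVec(100_000_000)  # near-instant: only a single Int is stored
length(v)                 # 100_000_000, still zero calls to expensive()
v[1]                      # the work only happens on access
```

Constructing the vector just records a length; each element is recomputed on every access.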
Definitely comparing two different things here. I am quite confused about what GameVec actually creates, though; judging by the @time, it doesn’t do much. My confusion comes from the results when I print res2.
When res2 prints, it does show game states (integers from 1 to 3). The state field of a TicTacToe is initially always 0, so something happens, but I don’t know how to find out what.
res2
#> 100000000-element GameVec:
#> X::GameState = 1 # these may change when `res2` is printed
#> O::GameState = 2
#> X::GameState = 1
res2 # recalling the same thing, directly afterward
#> 100000000-element GameVec:
#> DRAW::GameState = 3
#> X::GameState = 1
#> X::GameState = 1
However, what res2 prints changes between prints. Each access, i.e. res2[1], comes with the same computational cost as one run of self_play(), plus some overhead.
Then the question, I suppose, is what those results in the GameVec are: something similar to undef in other vectors?
GameVec(n) is more analogous to the 1:n part, so a fair comparison is game1 = GameVec(n); @time res2 = [game1[i] for i in eachindex(game1)]. Here game1[i] calls getindex, and eachindex calls size.
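One way to convince yourself of this is to count the getindex calls; CountingVec and CALLS are hypothetical names for this sketch:

```julia
const CALLS = Ref(0)  # counts getindex calls (illustrative only)

struct CountingVec <: AbstractVector{Int}
    len::Int
end
Base.size(v::CountingVec) = (v.len,)
Base.getindex(v::CountingVec, i::Int) = (CALLS[] += 1; i)

cv = CountingVec(5);  # semicolon: avoid display-triggered indexing
CALLS[]               # 0: constructing cv indexes nothing
res = [cv[i] for i in eachindex(cv)];
CALLS[]               # 5: one getindex call per element
```

Construction never touches an element; the comprehension over eachindex does the actual indexing.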
Thanks - that makes sense, and answers the performance part of the question.
I’m still a bit confused how/why SomeVec(5) in this code block prints “I’m running!” 40 times (i.e. something is still run), but at least I know not to treat these as values equal to the ones in the regular loop/comprehension.
using Random

mutable struct A
    a::Float64
end
step!(x::A) = x.a += rand()

function run()
    println("I'm running!")
    t = A(1.0)
    while t.a < 2.0
        step!(t)
    end
    t.a
end

Random.seed!(1)
@time res = [run() for i in 1:5]

struct SomeVec <: AbstractVector{Float64}
    len::Int
end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = run()

init = SomeVec(5)  # run() prints stuff 40x

Random.seed!(1)
@time res2 = [init[i] for i in eachindex(init)]
res == res2  # true
Because getindex is used when displaying your SomeVec object in the REPL.
You have defined SomeVec as a subtype of AbstractVector, so it “inherits” lots of functionality, including how it is displayed.
You can use a semicolon ; to suppress printing.
Or you can provide explicit show methods for your type:
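For example, a minimal show method for REPL display that never touches the elements might look like this (a sketch, not the only way to write it):

```julia
struct SomeVec <: AbstractVector{Float64}
    len::Int
end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = (println("I'm running!"); rand())

# Override the 3-argument show the REPL uses for display,
# so displaying a SomeVec never calls getindex:
Base.show(io::IO, ::MIME"text/plain", s::SomeVec) =
    print(io, length(s), "-element SomeVec (elements computed on access)")
```

With this method, init = SomeVec(5) at the REPL prints a one-line summary and run-like side effects never fire during display.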
The more exact reason is that Base.alignment iterates the rows of a vector or matrix to find the space needed to print all the elements in an aligned column, and then Base.print_matrix_row iterates again to actually print them. Each iteration indexes a few times (the exact number varies by Julia version), so it’s not a good idea to print while indexing. I’d separate the printing step from the element computation, and write a functor method (function (s::SomeVec)(i::Int) ... end) for SomeVec instances, so that the just-as-concise call init(i) prints while indexing with init[i] doesn’t.
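One possible reading of that suggestion as code; run_quiet is a made-up, print-free variant of the run function from the example above:

```julia
mutable struct A
    a::Float64
end
step!(x::A) = x.a += rand()

# The computation from run(), without the println:
function run_quiet()
    t = A(1.0)
    while t.a < 2.0
        step!(t)
    end
    t.a
end

struct SomeVec <: AbstractVector{Float64}
    len::Int
end
Base.size(s::SomeVec) = (s.len,)
Base.getindex(::SomeVec, i::Int) = run_quiet()  # silent indexing

# Functor: calling the instance prints, then does the same work.
(s::SomeVec)(i::Int) = (println("I'm running!"); s[i])

init = SomeVec(5);
init[3]  # computes silently
init(3)  # prints "I'm running!" and computes
```

This way, displaying the vector (which indexes repeatedly) stays quiet, and the printing side effect is opt-in via the call syntax.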