Thanks.
I realize now, there is actually a ton of literature and SO questions on those topics.
http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx
Thanks.
I realize now, there is actually a ton of literature and SO questions on those topics.
http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx
Actually, the base library does this in two steps: generate a random integer i
in 1:length(231:33:423)
, and then indexing this StepRange
at i
, i.e. (231:33:423)[i]
. So your approach generalizes very easily too. But as @bkamins nicely explained, this is not as perfectly distributed as the base method.
This is a consequence of how floating-point numbers work, not of how rand
is implemented. A Float64
is a sign bit, 11 exponent bits, and a 52-bit significand (plus an implicit 1 at the beginning). For normalized values in [0,1), the sign bit and the exponent bits are fixed. So, it is only the significand bits that are random.
What was surprising to me is that even rand(UInt64)
will get 52 random bits twice, then ⊻
them. I thought it would get 64 random bits directly.
Yes. But in [0,1] they are not distributed uniformly. They are distributed uniformly when you consider Close1Open2
interval.
You can see it by running:
[mean(bits(rand())[60]-'0' for i in 1:10^6) for i in 13:64]
and
[mean(bits(rand(Base.Random.Close1Open2))[60]-'0' for i in 1:10^6) for i in 13:64]
It’s because the dSFMT library produces natively 52 random bits at a time. Some time ago I had experimented using SFMT instead, which produces directly 64 bits (or 128), but the benefits were not substantial compared to base’s approach (of course I may have had my benchmark wrong).
Actually, nothing stops you in Julia from achieving 1X
C speed. You just need to be more careful and limit the amount of dynamism in your code. As others pointed out, SVectors
are your best shoot here. In the following code, I used SVectors
and only modified two lines; line 73 now reads trunc(Int, 1+rand()*game.nbFreeMoves)
and line 80 contains a ternary condition which evaluates in both cases to the same function, I removed the condition. I also allowed inlining of most functions via the @inline
macro.
The following code executes at the same speed as the C++ version on my machine.
const NB_GAMES = 1000
using StaticArrays
const EYE9 = eye(SMatrix{9, 9, Int})
const BOARD_LUT = SVector(EYE9, 2*EYE9)
struct Tictactoe
board::SVector{9,Int}
nbFreeMoves :: Int
currentPlayer :: Int
nextPlayer :: Int
end
Tictactoe() = Tictactoe( [0, 0, 0, 0, 0, 0, 0, 0, 0] , 9, 1, 2)
@inline function computeFinish(b::SVector{9,Int}, nbFreeMoves::Int)
if b[1] != 0 && b[1] == b[2] == b[3]
true, b[1]
elseif b[4] != 0 && b[4] == b[5] == b[6]
true, b[4]
elseif b[7] != 0 && b[7] == b[8] == b[9]
true, b[7]
elseif b[1] != 0 && b[1] == b[4] == b[7]
true, b[1]
elseif b[2] != 0 && b[2] == b[5] == b[8]
true, b[2]
elseif b[3] != 0 && b[3] == b[6] == b[9]
true, b[3]
elseif b[1] != 0 && b[1] == b[5] == b[9]
true, b[1]
elseif b[3] != 0 && b[3] == b[5] == b[7]
true, b[1]
elseif nbFreeMoves == 0 # no winner, no empty cell -> draw
true, 0
else # no winner, empty cell -> game not finished
false, 0
end
end
@inline function findIndex(board::SVector{9,Int}, moveIndex::Int)
nbFree = 0
k = 1
while k<=9
if board[k] == 0
nbFree += 1
if nbFree == moveIndex
break
end
end
k += 1
end
k
end
@inline function playIndex(game::Tictactoe, moveIndex::Int)
k = findIndex(game.board, moveIndex)
if k <= 9
board = game.board + BOARD_LUT[game.currentPlayer][k,:]
Tictactoe(board, game.nbFreeMoves-1, game.nextPlayer, game.currentPlayer)
else
game
end
end
abstract type Bot end
@inline function runGame(game::Tictactoe, bot1::Bot, bot2::Bot)
while true
finished, winner = computeFinish(game.board, game.nbFreeMoves)
if finished
return winner
end
moveIndex = genmove(game, bot1) # game.currentPlayer == 1, genmove(game, bot1), genmove(game, bot2)
game = playIndex(game, moveIndex)
end
end
struct RandomBot <: Bot end
@inline genmove(game::Tictactoe, bot::RandomBot) = trunc(Int, 1+rand()*game.nbFreeMoves) # ceil(Int,rand()*game.nbFreeMoves)
struct McBot <: Bot
maxSims::Int
end
function evalMove(game::Tictactoe, moveIndex::Int, nbSims::Int)
bot = RandomBot()
player = game.currentPlayer
game1 = playIndex(game, moveIndex)
evalMove1(unused) = runGame(game1, bot, bot) == player ? 1 : 0
mapreduce(evalMove1, (+), 0, 1:nbSims)
end
function genmove(game::Tictactoe, bot::McBot)
nbMoves = game.nbFreeMoves
nbMoveSims = max(1, div(bot.maxSims, nbMoves))
simulate(moveIndex) = evalMove(game, moveIndex, nbMoveSims)
wins = map(simulate, [moveIndex for moveIndex in 1:nbMoves])
indmax(wins)
end
@inline function runGames(game::Tictactoe, bot1::Bot, bot2::Bot, nbGames::Int)
res = [0, 0, 0]
for _ in 1:nbGames
winner = 1 + runGame(game, bot1, bot2)
res[winner] += 1
end
res
end
function runXp(nbSims::Int)
game = Tictactoe()
botA = RandomBot()
botB = McBot(nbSims)
resAB = runGames(game, botA, botB, NB_GAMES)
resBA = runGames(game, botB, botA, NB_GAMES)
println("RandomBot McBot ", nbSims, " ", NB_GAMES, " ",
resAB[1], " ", resAB[2], " ", resAB[3], " ",
resBA[1], " ", resBA[2], " ", resBA[3], " ",
resAB[1]+resBA[1], " ", resAB[2]+resBA[3], " ", resAB[3]+resBA[2])
end
println("botA botB nbSims nbGames drawsA1B2 winsA1 winsB2 drawsB1A2 winsB1 winsA2 draws winsA winsB")
vecNbSims = [1, 10, 100, 1000, 10000]
@time map(runXp, vecNbSims)
Line 80, the condition is needed because runGame
is also used in runGames
(with different bots). Inlining works great: the SVector version is almost as fast as the C++ version. Thanks.
Is there a way to predict which function will be interesting to inline ?
Oh, yes I missed this.
Well, you can still beat C++ using only SVector
without struct
. The following version is 5%
faster than C++. Enjoy.
const NB_boardS = 1000
using StaticArrays
const EYE9 = eye(SMatrix{9, 9, Int})
const BOARD_LUT = SVector(EYE9, 2*EYE9)
const vec9 = SVector{9,Int}
const vec3 = SVector{3,Int}
@inline function computeFinish(b::SVector{9,Int}, nbFreeMoves::Int)
if b[1] != 0 && b[1] == b[2] == b[3]
true, b[1]
elseif b[4] != 0 && b[4] == b[5] == b[6]
true, b[4]
elseif b[7] != 0 && b[7] == b[8] == b[9]
true, b[7]
elseif b[1] != 0 && b[1] == b[4] == b[7]
true, b[1]
elseif b[2] != 0 && b[2] == b[5] == b[8]
true, b[2]
elseif b[3] != 0 && b[3] == b[6] == b[9]
true, b[3]
elseif b[1] != 0 && b[1] == b[5] == b[9]
true, b[1]
elseif b[3] != 0 && b[3] == b[5] == b[7]
true, b[1]
elseif nbFreeMoves == 0 # no winner, no empty cell -> draw
true, 0
else # no winner, empty cell -> game not finished
false, 0
end
end
@inline function findIndex(board::SVector{9,Int}, moveIndex::Int)
nbFree = 0
k = 1
while k<=9
if board[k] == 0
nbFree += 1
if nbFree == moveIndex
break
end
end
k += 1
end
k
end
@inline function playIndex(board, last3, moveIndex::Int)
k = findIndex(board, moveIndex)
if k <= 9
board = board + BOARD_LUT[last3[2]][k,:]
board, vec3(last3[1]-1,last3[3],last3[2])
else
board, last3
end
end
abstract type Bot end
@inline function runGame(board, last3, bot1::Bot, bot2::Bot)
while true
finished, winner = computeFinish(board, last3[1])
if finished
return winner
end
moveIndex = last3[2] == 1 ? genmove(board, last3, bot1) : genmove(board, last3, bot2)
board, last3 = playIndex(board, last3, moveIndex)
end
end
struct RandomBot <: Bot end
@inline genmove(board, last3, bot::RandomBot) = trunc(Int, 1+rand()*last3[1])
struct McBot <: Bot
maxSims::Int
end
function evalMove(board, last3, moveIndex::Int, nbSims::Int)
bot = RandomBot()
player = last3[2]
board1,vvv3 = playIndex(board, last3, moveIndex)
evalMove1(unused) = runGame(board1, vvv3, bot, bot) == player ? 1 : 0
mapreduce(evalMove1, (+), 0, 1:nbSims)
end
function genmove(board, last3, bot::McBot)
nbMoves = last3[1]
nbMoveSims = max(1, div(bot.maxSims, nbMoves))
simulate(moveIndex) = evalMove(board, last3, moveIndex, nbMoveSims)
wins = map(simulate, [moveIndex for moveIndex in 1:nbMoves]) #
indmax(wins)
end
function runGames(board, last3, bot1::Bot, bot2::Bot, nbboards::Int)
res = [0, 0, 0]
for _ in 1:nbboards
winner = 1 + runGame(board, last3, bot1, bot2) #
res[winner] += 1
end
res
end
function runXp(nbSims::Int)
board = vec9(0,0,0,0,0,0,0,0,0)
last3 = vec3(9,1,2)
botA = RandomBot()
botB = McBot(nbSims)
resAB = runGames(board, last3, botA, botB, NB_boardS)
resBA = runGames(board, last3, botB, botA, NB_boardS)
println("RandomBot McBot ", nbSims, " ", NB_boardS, " ",
resAB[1], " ", resAB[2], " ", resAB[3], " ",
resBA[1], " ", resBA[2], " ", resBA[3], " ",
resAB[1]+resBA[1], " ", resAB[2]+resBA[3], " ", resAB[3]+resBA[2])
end
println("botA botB nbSims nbboards drawsA1B2 winsA1 winsB2 drawsB1A2 winsB1 winsA2 draws winsA winsB")
vecNbSims = [1, 10, 100, 1000, 10000]
@time map(runXp, vecNbSims)
Awesome work!
I remember in the benchmark thread, many said there’s more variability between compilers than between Julia and C/Fortran/&Co.
So, testing your code (following compiling via runXp(1)
) on my machine:
julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e* (2017-10-24 22:15 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
WORD_SIZE: 64
BLAS: libopenblas (ZEN)
LAPACK: liblapack
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, generic)
julia> @time map(runXp, vecNbSims);
RandomBot McBot 1 1000 25 301 674 33 861 106 58 407 1535
RandomBot McBot 10 1000 22 265 713 24 934 42 46 307 1647
RandomBot McBot 100 1000 7 115 878 7 982 11 14 126 1860
RandomBot McBot 1000 1000 3 66 931 0 1000 0 3 66 1931
RandomBot McBot 10000 1000 3 55 942 0 1000 0 3 55 1942
7.797082 seconds (100.84 k allocations: 12.569 MiB, 0.11% gc time)
Testing clang++-4.0
:
$ clang++-4.0 -o no_lam_clang tictactoe_cmp.cpp -march=native -Ofast -std=c++11
$ time ./no_lam_clang
botA botB nbSims nbGames drawsA1B2 winsA1 winsB2 drawsB1A2 winsB1 winsA2 draws winsA winsB
RandomBot McBot 1 1000 84 441 475 9 922 69 93 510 1397
RandomBot McBot 10 1000 90 399 511 9 938 53 99 452 1449
RandomBot McBot 100 1000 49 217 734 5 982 13 54 230 1716
RandomBot McBot 1000 1000 28 163 809 3 990 7 31 170 1799
RandomBot McBot 10000 1000 22 153 825 4 988 8 26 161 1813
real 0m27.158s
user 0m27.155s
sys 0m0.000s
Makes me wonder if I did something wrong with clang???
If not: that is quite the trouncing by Julia as an llvm frontend!
$ g++ -o no_lam_5_4 tictactoe_cmp.cpp -march=native -Ofast -std=c++11
$ time ./no_lam_5_4
botA botB nbSims nbGames drawsA1B2 winsA1 winsB2 drawsB1A2 winsB1 winsA2 draws winsA winsB
RandomBot McBot 1 1000 111 413 476 1 917 82 112 495 1393
RandomBot McBot 10 1000 87 416 497 6 948 46 93 462 1445
RandomBot McBot 100 1000 53 247 700 4 986 10 57 257 1686
RandomBot McBot 1000 1000 28 154 818 6 991 3 34 157 1809
RandomBot McBot 10000 1000 35 132 833 4 991 5 39 137 1824
real 0m8.165s
user 0m8.159s
sys 0m0.004s
$ g++-7.2 -o no_lam tictactoe_cmp.cpp -march=native -Ofast
$ time ./no_lam
botA botB nbSims nbGames drawsA1B2 winsA1 winsB2 drawsB1A2 winsB1 winsA2 draws winsA winsB
RandomBot McBot 1 1000 85 419 496 6 927 67 91 486 1423
RandomBot McBot 10 1000 81 432 487 8 935 57 89 489 1422
RandomBot McBot 100 1000 58 243 699 7 990 3 65 246 1689
RandomBot McBot 1000 1000 42 148 810 6 987 7 48 155 1797
RandomBot McBot 10000 1000 29 129 842 8 984 8 37 137 1826
real 0m8.028s
user 0m8.026s
sys 0m0.000s
gcc (versions 5.4 and 7.2) did much better, but still not as fast as Julia.
Did I do something wrong with clang – taking about 3.5 times longer than Julia?
Does that imply anything about the future of Julia with newer versions of llvm?
@Seif_Shebl: yes, but to be fair, we should implement this in the C++ code too.
@Elrod: same problem on my machine (linux, gcc 7.2 = 12s, clang 4.0.1 = 36s). I don’t know why…