The box problem

mthelm85 · October 1, 2024, 6:08pm

I just read this article:

and I am not arriving at the same conclusion. Here’s the quick and dirty code I came up with to test it:

    using StatsBase

	function row_wise()
		boxes = zeros(3,5)
		
		for _ in 1:2
			boxes[rand(1:3),rand(1:5)] = 1.0
		end
		
		counter = 0
		
		for i in 1:size(boxes, 1), j in 1:size(boxes, 2)

			counter += 1

			if boxes[i,j] == 1.0
				return counter
			end
		end
	end

	row_results = summarystats([row_wise() for _ in 1:5000])

	function col_wise()
		boxes = zeros(3,5)
		
		for _ in 1:2
			boxes[rand(1:3),rand(1:5)] = 1.0
		end
		
		counter = 0
		
		for j in 1:size(boxes, 2), i in 1:size(boxes, 1)

			counter += 1

			if boxes[i,j] == 1.0
				return counter
			end
		end
	end

	col_results = summarystats([col_wise() for _ in 1:5000])

I get the exact same distribution for both the row wise search and the column wise search. It takes 5.5 steps on average to find the first prize. What am I missing?

nilshg · October 1, 2024, 6:25pm

Not at a computer but from a quick read your code might end up having only one box (as you could get the same location from your rand calls)?

mthelm85 · October 1, 2024, 6:31pm

Good catch…here’s updated code:

	function generate_prizes()
	    prize_positions = Set()
		
	    while length(prize_positions) < 2
	        push!(prize_positions, (rand(1:3), rand(1:5)))
	    end
		
	    return prize_positions
	end
	
	function row_wise()
		prize_positions = generate_prizes()
		
		counter = 0
		
		for i in 1:3, j in 1:5
			counter += 1
			if (i,j) in prize_positions
				return counter
			end
		end
	end

	row_results = summarystats([row_wise() for _ in 1:10000])

	function col_wise()
		prize_positions = generate_prizes()
		
		counter = 0
		
		for j in 1:5, i in 1:3
			counter += 1
			if (i,j) in prize_positions
				return counter
			end
		end
	end

	col_results = summarystats([col_wise() for _ in 1:10000])

I still get the same distribution for both search methods, but now the average steps has decreased slightly (as expected) to 5.3ish.

bertschi · October 1, 2024, 6:34pm

Interesting problem …
Your program seems to just calculate the average number of steps each needs individually (and independently). Instead, the two search orders need to be compared on each (fixed) box in order to determine who would win head-to-head.
My code get’s the opposite result though:

function newbox()
    b = fill(false, 5, 3)
    # Drawing two indices in order to ensure that there are exactly two boxes (your code might end up with one box)
    b[rand(eachindex(b), 2)] .= true
    b
end

numsteps = Base.Fix1(findfirst, identity) ∘ vec

# Julia is column-major, so vec(b') is rowwise
compete(b) = (; andrew = numsteps(b'), barbara = numsteps(b))

julia> tournament = compete.(newbox() for _ in 1:100_000);

# Probability of Andrew winning
julia> sum(res -> res.andrew < res.barbara, tournament) / length(tournament)
0.37276

# Probability of tie
julia> sum(res -> res.andrew == res.barbara, tournament) / length(tournament)
0.21727

# Probability of Barbara winning
julia> sum(res -> res.andrew > res.barbara, tournament) / length(tournament)
0.40997

nilshg · October 1, 2024, 6:37pm

I should really just find a computer but doesn’t rand(eachindex, 2) still have the same problem? I would have thought you need sample from StatsBase without replacement

bertschi · October 1, 2024, 6:41pm

Yes it does, thanks. Fixing it via sample also has Andrew winning now!

mbauman · October 1, 2024, 6:43pm

I’m similarly seeing the slight edge to row-wise, encoding 0 as a row win and 1 as a column win and 0.5 as a tie. It’s fascinating to see how changing the dimensions changes these results.

julia> using Statistics, Random, StatsBase

julia> function firstwinner(boxes)
           bycol = CartesianIndices(boxes)
           byrow = permutedims(CartesianIndices(boxes))
           for i in 1:length(boxes)
               boxes[bycol[i]] && boxes[byrow[i]] && return 0.5 # tie
               boxes[bycol[i]] && return 1.0 # column-wise
               boxes[byrow[i]] && return 0.0 # row-wise
           end
       end
firstwinner (generic function with 1 method)

julia> function meanwinner(n)
           boxes = falses(3,5)
           mean(begin
               boxes .= false
               boxes[sample(begin:end, 2, replace=false)] .= true
               firstwinner(boxes)
           end for i in 1:n)
       end
meanwinner (generic function with 1 method)

julia> meanwinner(100000)
0.48151

julia> meanwinner(1000000)
0.480625

julia> meanwinner(10000000)
0.48096015

mthelm85 · October 1, 2024, 6:50pm

I tried to use sample to generate the prize positions and my results now align with those of @bertschi:

	function generate_prizes()
	    all_positions = [(i, j) for i in 1:3, j in 1:5] 
	    prize_positions = sample(all_positions, 2; replace=false)
	    return prize_positions
	end

	function competition()
	    prize_positions = generate_prizes()
	    andrew_count = 0
	    barbara_count = 0

	    found_prize = false
	    for i in 1:3
	        for j in 1:5
	            andrew_count += 1
	            if (i, j) in prize_positions
	                found_prize = true
	                break
	            end
	        end
	        if found_prize
	            break  
	        end
	    end
	
	    found_prize = false
	    for j in 1:5
	        for i in 1:3
	            barbara_count += 1
	            if (i, j) in prize_positions
	                found_prize = true
	                break
	            end
	        end
	        if found_prize
	            break 
	        end
	    end
	
	    if andrew_count == barbara_count
	        return "tie"
	    elseif andrew_count < barbara_count
	        return "andrew"
	    else
	        return "barbara"
	    end
	end

	results = [competition() for _ in 1:100_000]
	countmap(results)

Dict{String, Int64} with 3 entries:
  "barbara" => 37118
  "andrew"  => 41036
  "tie"     => 21846

That’s bizarre. I think I see it now, though. If you look at the picture in the article:

Wouldn’t it be true that, after the first box, every 3rd box that Barbara gets to is one that Andrew has already opened? And for Andrew, every 5th box is one that Barbara has already opened? So Andrew is looking at more unopened boxes and therefore has a higher probability of finding the prizes…??

I wonder what the analytical solution to this looks like…

mbauman · October 1, 2024, 7:50pm

That seems compelling, but it can’t be the whole story because if there’s only one prize there’s no edge anymore. And if the grid is 4×5 the edge goes to columnwise. It’s gotta be about the asymmetry of the linearization of the upper/lower triangular regions as the Guardian discusses.

Fun problem.

adienes · October 1, 2024, 8:07pm

well, the flippant answer is that you’re missing the fact that these distributions are not independent (despite the fact that the expected hitting time is the same)

this feels to me like a similar category of counterintuitive result as the intransitive dice

curtcorum · October 8, 2024, 10:43pm

I guess theory is still needed!

Topic		Replies	Views
[Nerdsnipe warning] Speed up short vector comparisons to beat R Performance	37	1710	April 30, 2024
Performance difference when accessing square matrix rows-first or cols-first Performance	14	1729	April 13, 2021
Draw two distinct random integers from `1:n` Performance statistics , performance , random	24	3206	December 3, 2021
Count occurrences of columns in a 2d array (using countmap) Performance question , performance , arrays	13	2087	December 3, 2021
The birthday paradox: help to be faster than Fortran Performance fortran , unrolling	16	1839	August 13, 2020

The box problem

Related topics