Just for the fun of it: if you really wanted to strong-arm the compiler into unrolling the iteration, you can use Unrolled.jl. Unrolling is almost essential here, since a different method is called on almost every iteration (as Tim described).
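(For completeness, the code below assumes something along these lines from earlier in the thread. These are not the original definitions, just hypothetical stand-ins with the same shape: two singleton types, instances a and b, and a function g whose methods dispatch on the argument types and return an Int.)

struct A end
struct B end
a, b = A(), B()

# Hypothetical g: dispatch on the leading argument and recurse, so different
# input combinations exercise different methods/specializations, all returning an Int.
g(::A, rest...) = 1 + g(rest...)
g(::B, rest...) = 2 + g(rest...)
g() = 0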
You do need to re-engineer the function a bit (tuples are needed because their length is encoded in their type, unlike vectors):
using Unrolled   # provides @unroll

IP  = Iterators.product((a, b), (a, b), (a, b), (a, b)) |> collect
IPt = tuple(IP...)   # splat into a tuple, whose length is encoded in its type

@unroll function main_unroll(IP)
    @unroll for i in IP
        res = g(i...)
    end
end
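(As a side note on the tuple-versus-vector remark above, the difference is visible directly in the types; a small sketch using the same A/B placeholders:)

typeof((A(), B(), A()))   # Tuple{A, B, A}: length and per-element types are part of the type
typeof([A(), B(), A()])   # some Array type: neither the length nor the per-slot types are known statically
length((A(), B(), A()))   # computable from the type alone, which is what @unroll relies on

Inspecting the unrolled function with @code_warntype then gives: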
julia> @code_warntype main_unroll(IPt)
Variables
#self#::Core.Compiler.Const(main_unroll, false)
IP::Core.Compiler.Const(((A(), A(), A(), A()), (B(), A(), A(), A()), (A(), B(), A(), A()), (B(), B(), A(), A()), (A(), A(), B(), A()), (B(), A(), B(), A()), (A(), B(), B(), A()), (B(), B(), B(), A()), (A(), A(), A(), B()), (B(), A(), A(), B()), (A(), B(), A(), B()), (B(), B(), A(), B()), (A(), A(), B(), B()), (B(), A(), B(), B()), (A(), B(), B(), B()), (B(), B(), B(), B())), false)
res@_3::Int64
i@_4::NTuple{4,A}
res@_5::Int64
i@_6::Tuple{B,A,A,A}
res@_7::Int64
i@_8::Tuple{A,B,A,A}
res@_9::Int64
i@_10::Tuple{B,B,A,A}
res@_11::Int64
i@_12::Tuple{A,A,B,A}
res@_13::Int64
i@_14::Tuple{B,A,B,A}
res@_15::Int64
i@_16::Tuple{A,B,B,A}
res@_17::Int64
i@_18::Tuple{B,B,B,A}
res@_19::Int64
i@_20::Tuple{A,A,A,B}
res@_21::Int64
i@_22::Tuple{B,A,A,B}
res@_23::Int64
i@_24::Tuple{A,B,A,B}
res@_25::Int64
i@_26::Tuple{B,B,A,B}
res@_27::Int64
i@_28::Tuple{A,A,B,B}
res@_29::Int64
i@_30::Tuple{B,A,B,B}
res@_31::Int64
i@_32::Tuple{A,B,B,B}
res@_33::Int64
i@_34::NTuple{4,B}
Body::Nothing
1 ─ (i@_4 = Base.getindex(IP, 1))
│ (res@_3 = Core._apply_iterate(Base.iterate, Main.g, i@_4))
│ (i@_6 = Base.getindex(IP, 2))
│ (res@_5 = Core._apply_iterate(Base.iterate, Main.g, i@_6))
│ (i@_8 = Base.getindex(IP, 3))
│ (res@_7 = Core._apply_iterate(Base.iterate, Main.g, i@_8))
│ (i@_10 = Base.getindex(IP, 4))
│ (res@_9 = Core._apply_iterate(Base.iterate, Main.g, i@_10))
│ (i@_12 = Base.getindex(IP, 5))
│ (res@_11 = Core._apply_iterate(Base.iterate, Main.g, i@_12))
│ (i@_14 = Base.getindex(IP, 6))
│ (res@_13 = Core._apply_iterate(Base.iterate, Main.g, i@_14))
│ (i@_16 = Base.getindex(IP, 7))
│ (res@_15 = Core._apply_iterate(Base.iterate, Main.g, i@_16))
│ (i@_18 = Base.getindex(IP, 8))
│ (res@_17 = Core._apply_iterate(Base.iterate, Main.g, i@_18))
│ (i@_20 = Base.getindex(IP, 9))
│ (res@_19 = Core._apply_iterate(Base.iterate, Main.g, i@_20))
│ (i@_22 = Base.getindex(IP, 10))
│ (res@_21 = Core._apply_iterate(Base.iterate, Main.g, i@_22))
│ (i@_24 = Base.getindex(IP, 11))
│ (res@_23 = Core._apply_iterate(Base.iterate, Main.g, i@_24))
│ (i@_26 = Base.getindex(IP, 12))
│ (res@_25 = Core._apply_iterate(Base.iterate, Main.g, i@_26))
│ (i@_28 = Base.getindex(IP, 13))
│ (res@_27 = Core._apply_iterate(Base.iterate, Main.g, i@_28))
│ (i@_30 = Base.getindex(IP, 14))
│ (res@_29 = Core._apply_iterate(Base.iterate, Main.g, i@_30))
│ (i@_32 = Base.getindex(IP, 15))
│ (res@_31 = Core._apply_iterate(Base.iterate, Main.g, i@_32))
│ (i@_34 = Base.getindex(IP, 16))
│ (res@_33 = Core._apply_iterate(Base.iterate, Main.g, i@_34))
│ %33 = Main.nothing::Core.Compiler.Const(nothing, false)
└── return %33
And you can see that it does infer res and the calls. Of course, this puts all the effort into compilation time: if you benchmark, you can see that the computation is mostly done at compile time, not at runtime. General performance-sensitive code does not normally follow this pattern, hence the “default” choices made in the language. But as you see, there are constructs that can remediate such corner cases.
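A rough comparison along these lines shows it (a sketch assuming the definitions above; main_dynamic is just a hypothetical non-unrolled counterpart operating on the collected array):

using BenchmarkTools

function main_dynamic(IP)
    for i in IP
        res = g(i...)   # resolved by dynamic dispatch on every iteration
    end
end

@btime main_unroll($IPt)    # essentially free: the work happened during compilation
@btime main_dynamic($IP)    # pays the dynamic-dispatch cost at runtime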
Hope this helps…