TLDR: It looks like the compiler decides to inline `sin` and `exp` for your second version but not for the combined version.

I used `$` to interpolate `A` into the benchmarked expressions (so the benchmark doesn't measure access to a non-constant global) and get a bit more reliable results. Your code is basically equivalent to

```
julia> using BenchmarkTools
julia> A = [1.0, 2.0, 3.0, 4.0, 5.0];
julia> f(x) = sin.(exp.(x))
julia> h(x) = begin tmp = exp.(x); sin.(tmp) end
julia> @benchmark f($A)
BenchmarkTools.Trial:
memory estimate: 128 bytes
allocs estimate: 1
--------------
minimum time: 128.500 ns (0.00% GC)
median time: 137.732 ns (0.00% GC)
mean time: 147.684 ns (0.80% GC)
maximum time: 859.902 ns (74.62% GC)
--------------
samples: 10000
evals/sample: 864
julia> @benchmark h($A)
BenchmarkTools.Trial:
memory estimate: 256 bytes
allocs estimate: 2
--------------
minimum time: 105.619 ns (0.00% GC)
median time: 112.411 ns (0.00% GC)
mean time: 119.303 ns (1.69% GC)
maximum time: 809.603 ns (78.31% GC)
--------------
samples: 10000
evals/sample: 927
```
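
To see why the allocation counts differ, note what broadcast fusion does: the fused version runs a single kernel composed of both functions, while the two-step version runs two separate broadcasts. Roughly (a sketch of the semantics with illustrative names, not the exact lowered code):

```
f_equiv(x) = broadcast(xi -> sin(exp(xi)), x)   # one fused kernel, one result array
h_equiv(x) = broadcast(sin, broadcast(exp, x))  # two kernels, temporary + result array
```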

Since a temporary array is created in `h`, the number of allocations is twice as big as for `f`, where only one array needs to be allocated. However, the compiler seems to decide that it's good to inline `sin` and `exp` when they are not fused, but to perform real function calls when they are fused.

```
julia> sin_exp(x) = sin(exp(x))
sin_exp (generic function with 1 method)
julia> @code_llvm sin_exp(1.0)
; @ REPL[24]:1 within `sin_exp'
define double @julia_sin_exp_584(double) {
top:
%1 = call double @j_exp_585(double %0)
%2 = call double @j_sin_586(double %1)
ret double %2
}
julia> @code_llvm sin(exp(1.0))
[...]
```

I also tried to use `@inline` and `Base.@_inline_meta` when defining `sin_exp`, but that didn't help.
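
Concretely, I mean variants along these lines (a sketch with illustrative names; `Base.@_inline_meta` is an internal macro, so it may change between Julia versions):

```
@inline sin_exp2(x) = sin(exp(x))                # hint the compiler to inline the whole body
sin_exp3(x) = (Base.@_inline_meta; sin(exp(x)))  # internal inlining hint used in Base
```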

Note that you can get a nice speedup using LoopVectorization.jl for your example.

```
julia> using BenchmarkTools, LoopVectorization
julia> f(x) = sin.(exp.(x))
f (generic function with 1 method)
julia> f_avx(x) = @avx sin.(exp.(x))
f_avx (generic function with 1 method)
julia> h_avx(x) = begin @avx tmp = exp.(x); @avx sin.(tmp) end
h_avx (generic function with 1 method)
julia> A = [1.0, 2.0, 3.0, 4.0, 5.0];
julia> f(A) ≈ f_avx(A) ≈ h_avx(A)
true
julia> @benchmark f_avx($A)
BenchmarkTools.Trial:
memory estimate: 128 bytes
allocs estimate: 1
--------------
minimum time: 46.067 ns (0.00% GC)
median time: 49.266 ns (0.00% GC)
mean time: 52.956 ns (2.50% GC)
maximum time: 881.744 ns (84.21% GC)
--------------
samples: 10000
evals/sample: 987
julia> @benchmark h_avx($A)
BenchmarkTools.Trial:
memory estimate: 256 bytes
allocs estimate: 2
--------------
minimum time: 66.805 ns (0.00% GC)
median time: 69.883 ns (0.00% GC)
mean time: 76.609 ns (3.52% GC)
maximum time: 1.106 μs (91.89% GC)
--------------
samples: 10000
evals/sample: 975
```

That’s more like what I would have expected in this case.
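
As an aside, if this runs in a hot loop you can also preallocate the output and broadcast in place, which should remove the remaining allocations (a sketch, assuming the same session as above; `f_avx!` is an illustrative name):

```
f_avx!(y, x) = @avx y .= sin.(exp.(x))  # write into a preallocated y, no new arrays

y = similar(A);
f_avx!(y, A)
```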