Optimization allowed in C++14, not done in Julia, but could it be done?

I tried something similar to here:

julia> function demo()
         for i in 0:9
           v = [1, 2, 3]
           push!(v, i)
           println(v)
         end
       end
demo (generic function with 1 method)

julia> @code_native demo()

In Julia v is allocated, but not in similar code in C++ (optimized away).

Could Julia also do that? It seems it could, and it doesn’t need permission from a spec. While the C++14 spec allows this optimization, it’s not clear why it needs to be in the spec at all, since it’s “allowed”, not “required”.

Does Julia exploit information about newly created objects in any way? (With garbage collection you cannot, in general, rely on there being no other pointers to an object, but here you could.)

I also tried without the println.


Did you try it with a static vector (SVector from StaticArrays.jl), which is probably the better comparison to that particular C++ example?

I think this cannot be done because of side effects. Pushing a value to a vector can result in an out-of-memory error, and many operations with side effects, like throwing errors, suppress further simplifications.


Yeah, not sure. I thought there was a chance, since it would just be creating new tuples from each other, but you are probably right.

I was talking about regular vectors, not SVector — but an SVector is not a direct comparison to heap-allocated vectors.

The way it currently works, this is unlikely to happen soon.

Array is actually not a “real” Julia type/object. It is implemented by the C runtime and mostly used via the foreign function interface. The relevant exceptions are getindex, setindex! and arraysize, which are special-cased by codegen.

In other words, LLVM cannot peek into Array internals for optimizations.

More relevant than allocation: push! / array resizing is not special-cased either, which is why push! is so damn slow (it cannot be inlined; ~10 cycles).

I think the most realistic way is to compile some subset of the runtime library into both machine code and LLVM bitcode (clang -fembed-bitcode / -lto-embed-bitcode style), and use some variant of link-time optimization when loading a dynamic library with bundled bitcode. The same applies to BigInt and String.

This is a quite general FFI problem:

Take some nice C / Rust / Fortran code and try to use it from Julia; this works very nicely and conveniently, with OK performance, via the current method (compile the foreign code into a dynamic library, load the library, use a foreign function call).

When you want to get rid of the remaining overhead, e.g. you want some foreign functions inlined, then you need to embed bitcode into the dynamic library (by now super common and convenient on Apple platforms; not too hard to do on Linux; no idea about Windows; but fundamentally this can only work with LLVM-based compilers like clang, flang and rustc, and won’t work with the likes of gcc, icc or msvc).

Then you need some ugly dance with Clang.jl that I forgot the details of. I’d prefer it if this were part of Julia Core, just like loading of dynamic libraries is; especially since embedded bitcode is becoming so prevalent in the Apple world, and a future of ThinLTO dynamic libraries for JIT and debugging purposes would be awesome.


I think this cannot be done because of side effects.

But why can C++ do that? An out-of-memory error doesn’t cause much of a problem in this case. Consider this Julia code:

function f(x)
    y = 1
    if x > 0
        y = 2
    else
        error()
    end
    return y
end

Julia (or LLVM) actually figures out that this function can only return 2:

julia> @code_llvm debuginfo=:none f(1)
define i64 @julia_f_211(i64 signext %0) {
  %1 = icmp slt i64 %0, 1
  br i1 %1, label %L6, label %L3

L3:                                               ; preds = %top
  ret i64 2

L6:                                               ; preds = %top
  %2 = call nonnull {}* @j_error_213()
  call void @llvm.trap()
  unreachable
}
Because the standard explicitly says it can; see the C++14 section in new expression - cppreference.com.
