Creating a new value using `jl_new_struct_uninit`

barche · April 18, 2020, 11:51pm

In CxxWrap, C++ objects are wrapped in a struct like this:

struct CxxPtr{T} <: CxxBaseRef{T}
  cpp_object::Ptr{T}
end

Completely new objects are stored in a mutable version of the above struct, with exactly the same layout. On the C++ side, I use jl_new_struct to allocate the new struct, with as field argument the boxed pointer to store. However, this requires two allocations, so I tried to work around that using jl_new_struct_uninit, as seen here:

https://github.com/JuliaInterop/libcxxwrap-julia/pull/53/commits/5118cd9b07064b85a943f9835fceacb64ab714ce

Unfortunately, this appears to trigger a segfault. So my question is: what is the correct way to allocate a struct like the above one, using only a single allocation (if possible), for both the mutable and immutable version (immutable is for boxing when needed)?

Ronis_BR · April 19, 2020, 1:23am

Do you have a MWE of the segfault? Many times I saw unexplainable segfault was because I was forgetting to protect a variable from the garbage collector.

barche · April 19, 2020, 9:55am

No MWE unfortunately, I can’t even reproduce the problem anywhere except on Travis Linux (Mac is fine).

Checking for a GC problem is indeed always a good idea. Unfortunately running the tests without GC gives exactly the same error.

The log is here, but it doesn’t really say much: Travis CI - Test and Deploy Your Code with Confidence

barche · April 19, 2020, 8:41pm

Update: it seems that using memcpy to the directly dereferenced the pointer to the Julia object fixes it. Not sure what the difference is or if it is correct now, so having someone who is familiar with the Julia C code look at this would reassure me:

https://github.com/JuliaInterop/libcxxwrap-julia/pull/53/commits/733f403350a7c5ed477905e97328d41bfc4122d5

barche · April 30, 2020, 12:14pm

Another update: I think this commit makes it more equivalent with the way jl_new_struct behaves: https://github.com/JuliaInterop/libcxxwrap-julia/pull/53/commits/40db46874de18e8f09603d81f8eb18a011051400

However, testing what avoiding the extra allocation actually brings in speed gain is making me question the usefulness of the change: the original version takes 0.0066 s to allocate 100000 C++ objects (2 Julia allocations per C++ object), while the new version takes … 0.0058 s, at indeed only one Julia allocation per C++ object.

@yuyichao Do you have any thoughts on this, please? Is the new version safe, or is tinkering at this low level simply not worth it?

yuyichao · April 30, 2020, 4:30pm

I can only think of any issue here to be caused by aliasing and if that’s the case using memcpy should be enough.

I’m pretty sure travis support remote login that you can use to debug and download the binary from there.

barche · April 30, 2020, 6:39pm

I think previous crashes were caused by some kind of alignment-related UB, which I don’t understand. In Julia they seem to cast to the integer types because of that, or at least the git comments led me to this:

https://github.com/JuliaLang/julia/issues/26512

yuyichao · April 30, 2020, 7:04pm

What make you think it is related to alignment and no the julia commit has nothing to do with integer typeps.

barche · April 30, 2020, 8:21pm

The crash was fixed by using (on 64 bit)

uint64_t val = *(uint64_t*)(&cpp_ptr);
memcpy(dest, &val, 8);

instead of:

memcpy(dest, &cpp_ptr, sizeof(T*));

That change was inspired by the implementation of jl_store_unaligned_i64, which is called by set_nth_field in jl_new_struct, and which were modified in the commit that fixed the issue I cited.

What troubles me is that I don’t understand the difference between the crashing and the working implementation, so I have no idea why this fixes it (or even if it fixes it properly and always).

yuyichao · April 30, 2020, 10:28pm

Again, the only think I can think of is that you are triggering some UB (or compiler bug) and the best way to debug it would still be having a debugger attached to it one way or another.

And FYI, your “fixed” version made MORE assumptions about both alignment and aliasing, which is exactly the opposite of the julia commit you linked.

barche · April 30, 2020, 11:13pm

Possibly, part of the problem is that I don’t understand what those assumptions are and why they are needed, though. But the same casting to uint happens in Julia, why is that done?

github.com

JuliaLang/julia/blob/master/src/datatype.c#L693-L708


      
          
          
JL_DLLEXPORT jl_datatype_t *jl_new_primitivetype(jl_value_t *name, jl_module_t *module,
                                                           jl_datatype_t *super,
                                                           jl_svec_t *parameters, size_t nbits)
          {
              jl_datatype_t *bt = jl_new_datatype((jl_sym_t*)name, module, super, parameters,
                                                  jl_emptysvec, jl_emptysvec, jl_emptysvec, 0, 0, 0);
              uint32_t nbytes = (nbits + 7) / 8;
              uint32_t alignm = next_power_of_two(nbytes);
              if (alignm > MAX_ALIGN)
                  alignm = MAX_ALIGN;
              bt->isbitstype = (parameters == jl_emptysvec);
              bt->size = nbytes;
              bt->layout = jl_get_layout(0, 0, alignm, 0, NULL, NULL);
              bt->instance = NULL;
              return bt;

When I had similar crashes locally, valgrind and the debugger didn’t point to anything special, except that accessing the created Julia object results in a crash, and creating it using the higher level jl_new_struct fixed it (or at least the crash went away), but introduced an extra allocation for boxing the pointer.

yuyichao · April 30, 2020, 11:28pm

Because it is making the assumption that the alignment and type are correct. Also, there’s no casting to uint anywhere, the cast is on the pointer. These are just sligntly more optimized case of the default branch. They are there for the sole purpose of giving the compiler some hint to produce better code for the most likely usecase. It will be completely correct and should not crash without any branchs other than the default one.

Well, I’m not sure what do you mean by “debugger didn’t point to anything special”. I assume the debugger will point to where it crashes which will tell you what instruction and point it is and should allow you to inspect the condition it happens, the normal thing you’ll use a debugger to do.

barche · May 1, 2020, 12:35pm

Thanks for the clarification, and indeed after switching to the default branch implementation and also exactly the same implementation that crashed before, it doesn’t crash anymore. Which is actually bad news, because it means there is probably some kind of UB elsewhere.

What I mean with “nothng special” is that when I did manage to reproduce the crash in a debugger, it always segfaulted on this seemingly innocent line, and valgrind didn’t show anything invalid up to that point:

https://github.com/JuliaInterop/libcxxwrap-julia/blob/fix-double-alloc/include/jlcxx/const_array.hpp#L94

Here, ptr is a Julia object created by the code we discussed.

And while writing this, it crashed again on this change, which should actually be no change: Comparing 035d575df6b2e30d03d1d8142222aef1e035b39f..a4dce9720f32ba9bb4d06efe0c54deeb65f0a6c2 · JuliaInterop/libcxxwrap-julia · GitHub

(log here)

yuyichao · May 1, 2020, 1:06pm

The debugger has way more functions than just line numbers. As I said above, you should look at the instructions and variables.

barche · May 4, 2020, 1:16pm

OK, I have inspected the value returned by both implementations, and it is always the same, I haven’t been able to reproduce the crash on Travis either anymore. My best guess at this point is that our discussion properly aligned the planets so it works now

I’ve merged the change, since this part of the code was probably never the cause of the crashes in the first place. Thanks for the help!

Topic		Replies	Views
Segfault on struct creation; gc involved New to Julia question	6	525	March 17, 2020
C struct garbage collection not run frequently enough General Usage garbage-collection , mutable-structure , c , gc	28	443	July 14, 2024
Pass (to Julia) a C struct which contains a pointer to another struct General Usage array , struct	11	946	August 7, 2020
Why this code does a segfault? General Usage ccall , c	21	1220	August 15, 2021
Why mutable structs are allocated on the heap? General Usage question	36	7928	September 26, 2024

Creating a new value using `jl_new_struct_uninit`

Related topics