Garbage collection of Julia code in C++

embedding

#1

Hello everyone,
I have a trivial question regarding garbage collection in the C API of Julia. I am currently embedding Julia in a C++ class, and I need to access several mutable structs, which I create in the C++ class constructor, from all of the class’s methods. To do so, I store jl_value_t* pointers inside the class and assign them to Julia objects in the constructor, sort of like this:

class Component
{
public:
	Component()
	{
		jl_init();

		/* some code to include functions from an external .jl file.... */

		jl_function_t* mutable_struct_constructor = jl_get_function(jl_current_module, "MyMutableStruct");

		/* should I worry about the fact that pointer_to_mutable struct, that now holds a pointer to an allocated
		mutable struct, can be deleted and not used in other methods of this class? */

		pointer_to_mutable_struct = jl_call0(mutable_struct_constructor);
	}

	~Component()
	{
		jl_atexit_hook(0);
	}

	void uselessFunction()
	{
		/* Can I use the mutable struct with other functions in here? It looks to be working in my code, but does it break? */
	}
private:
	jl_value_t* pointer_to_mutable_struct;
};

I was thinking of a workaround, which is to make the allocated mutable struct a global variable, like so:

class Component
{
public:
	Component()
	{
		jl_init();

		/* some code to include functions from an external .jl file.... */

		jl_function_t* mutable_struct_constructor = jl_get_function(jl_current_module, "MyMutableStruct");

		/* should I worry about the fact that pointer_to_mutable struct, that now holds a pointer to an allocated
		mutable struct, can be deleted and not used in other methods of this class? */

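		/* store the constructed object (not the constructor) in the global; jl_get_global also takes the module */
		jl_value_t* mutable_struct = jl_call0(mutable_struct_constructor);
		jl_set_global(jl_current_module, jl_symbol("MutableStruct"), mutable_struct);
		pointer_to_mutable_struct = jl_get_global(jl_current_module, jl_symbol("MutableStruct"));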
	}

	~Component()
	{
		jl_atexit_hook(0);
	}

	void uselessFunction()
	{
		/* This seems to work as well. */
	}
private:
	jl_value_t* pointer_to_mutable_struct;
};

Would the JL_GC_PUSH and JL_GC_POP macros work across the class, if I were to push the variable in the constructor and pop it in the destructor? I feel like they wouldn’t.
My question then is: what would be the best way to make sure that objects are not deleted by the GC? Also, if I were to disable the GC, how could I free some allocated data (e.g., the data that a jl_value_t* is pointing at) myself? Sorry if these are trivial questions!

Thanks


#2

Make sure you don’t actually do this unless there will only ever be one Component constructed during the whole lifetime of the program.

Yes.

No.

Don’t.

In general, the liveness of an object has to be bound to either a local scope or a global variable. The binding can also be indirect, through one or more parent objects.

If the object you want to keep alive is not directly global or easily tied to a scope, you need to store it in a parent object during the time you want to keep it alive. You can add or remove the reference to your object from this parent object to realize the lifetime you want. Such an object can be anything that you can store your references in and unless you have some special requirement you could just use an ObjectIdDict.

Now the only question left is how to keep this parent object alive. Unless you want to do something fancy, you should just keep it alive either in a global (store it to a global variable in Julia) or in a local scope (JL_GC_PUSH…). A global variable is more general, while a local frame could be faster. Which one to use depends on your use case.
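
To make the reachability rule above concrete, here is a minimal toy model in C++. It is not Julia’s collector and uses no Julia API: ToyObject, ToyHeap, roots and mark() are hypothetical names made up for this sketch. The roots stand in for Julia globals and JL_GC_PUSH frames, and an object counts as live only if some chain of references leads to it from a root.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy heap object: refers to other heap objects by index.
struct ToyObject {
    std::vector<std::size_t> refs;
};

// Toy heap plus a root set. The roots stand in for Julia globals
// and JL_GC_PUSH frames; nothing here calls into Julia.
struct ToyHeap {
    std::vector<ToyObject> objects;
    std::unordered_set<std::size_t> roots;

    // Allocate a new object and return its index.
    std::size_t allocate() {
        objects.push_back(ToyObject{});
        return objects.size() - 1;
    }

    // Mark phase: an object is live only if it can be reached from a root.
    std::vector<bool> mark() const {
        std::vector<bool> live(objects.size(), false);
        std::vector<std::size_t> worklist(roots.begin(), roots.end());
        while (!worklist.empty()) {
            std::size_t id = worklist.back();
            worklist.pop_back();
            if (live[id]) continue;  // already visited
            live[id] = true;
            for (std::size_t child : objects[id].refs) worklist.push_back(child);
        }
        return live;
    }
};
```

In this model, an object whose index is held only by the caller (the analogue of a jl_value_t* kept in a C++ member) is reported as dead by mark(), while an object stored in a rooted parent stays live until its entry is removed from that parent.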


#3

What if the scope of the mutable struct is the C++ class, with all its methods? How should I go about it? Should I make the allocated mutable struct object global, as I have done here?

jl_value_t* mutable_struct = jl_call0(mutable_struct_constructor);
jl_set_global(jl_current_module, jl_symbol("MutableStruct"), mutable_struct);
pointer_to_mutable_struct = jl_get_global(jl_current_module, jl_symbol("MutableStruct"));

Also, what should I do to deallocate it, or to let the GC know that the object can be freed? Should I do something like this in the destructor of my C++ class?

jl_set_global(jl_current_module, jl_symbol("MutableStruct"), nullptr);
pointer_to_mutable_struct = nullptr;

Would this approach work to dynamically handle what I want the GC to collect or not? Would there be more elegant ways? Thanks again


#4

… I thought the decision process I gave was pretty unambiguous…

It works if you never create multiple objects at the same time. (Also, you don’t need the jl_get_global.) If you want to create more than one object at the same time, you should use what I mentioned above.


#5

I am sorry that I didn’t understand what you meant. Looking back at it, would this make sense then?
In the private space of the C++ class:

std::vector<uintptr_t> this_object_table_of_references;

In the constructor:

//create object
pointer_to_mutable_struct = jl_call0(jl_get_function(jl_current_module, "MyMutableStruct"));

//add its id to the vector
this_object_table_of_references.push_back(jl_object_id(pointer_to_mutable_struct));

//add other eventual objects to the vector.....
//
//

In the destructor:

this_object_table_of_references.clear();

Sorry again for my misunderstanding, I am slowly learning how to use Julia and how to link it to my C++ projects.


#6

No. Julia’s GC can only scan Julia objects. Therefore, you need to store the reference in a Julia object. From the way you describe things, and since you are coming from C++, I’m guessing you’ve never used a tracing GC. You may want to have a look at https://en.wikipedia.org/wiki/Tracing_garbage_collection (only the first part, which should be pretty easy to understand) to learn what a tracing GC is. You don’t need to know all the details, but knowing how it finds all live objects will hopefully make what I said make more sense.

Other than having to use a Julia object, what you have is roughly correct. The few other differences from what I said are that you should probably use a hash table (ObjectIdDict is one; unordered_map is the C++ analogue) or a sorted data structure if you don’t want a linear scan in the destructor to pop the object. You probably also don’t want to simply clear the table, since that will remove other objects from it too.

I couldn’t tell if you have this right, but another thing you might have gotten wrong is that the table needs to be rooted either in a C++ local scope or as a global variable. (It must not be a C++ non-static member but something managed externally.) If there are multiple objects you want to keep alive that belong to the same C++ object, though, you can obviously store them in a Julia container that’s private to the C++ object and root this container globally, so it’s easier to pop them out of the global dict all at once.
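
The per-object bookkeeping described above could be wrapped in a small RAII handle. The sketch below is a toy model with no Julia calls: ToyValue, ToyRootTable and RootedHandle are hypothetical names, and the table stands in for a single Julia container (an ObjectIdDict, renamed IdDict in Julia 0.7 and later) that is itself bound to a Julia global. In a real embedding, add() and remove() would presumably call setindex! and delete! on that container.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for a Julia value pointer (jl_value_t* in a real embedding).
using ToyValue = const void*;

// Hypothetical stand-in for one globally rooted Julia table.
// It lives outside every Component-like object, so it outlives all of them.
class ToyRootTable {
public:
    void add(ToyValue v) { entries_.insert(v); }
    void remove(ToyValue v) { entries_.erase(v); }
    bool contains(ToyValue v) const { return entries_.count(v) != 0; }
    std::size_t size() const { return entries_.size(); }

private:
    std::unordered_set<ToyValue> entries_;
};

// RAII handle: registers its value on construction and removes only
// that one entry on destruction, leaving other owners' entries alone.
class RootedHandle {
public:
    RootedHandle(ToyRootTable& table, ToyValue v) : table_(table), value_(v) {
        table_.add(value_);
    }
    ~RootedHandle() { table_.remove(value_); }

    // Non-copyable, so one entry is never removed twice.
    RootedHandle(const RootedHandle&) = delete;
    RootedHandle& operator=(const RootedHandle&) = delete;

    ToyValue get() const { return value_; }

private:
    ToyRootTable& table_;
    ToyValue value_;
};
```

Each handle removes only its own entry, so several C++ objects can share one table without clearing each other’s references, and lookup by key avoids a linear scan in the destructor.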


#7

Thank you very much for your help and the explanations, it is much clearer in my head now! I think I got it working. Is this right?

class Component
{
public:
	Component()
	{
		object_id_dict = jl_call0(jl_get_function(jl_current_module, "ObjectIdDict"));
		jl_function_t* add_index_to_dict = jl_get_function(jl_current_module, "setindex!");

		pointer_to_mutable_struct = jl_call0(jl_get_function(jl_current_module, "MyMutableStruct"));

		//add the object to the references of the ObjectIdDict

		jl_call3(add_index_to_dict, object_id_dict, pointer_to_mutable_struct, pointer_to_mutable_struct);
	}

	~Component()
	{
		//remove all the references from the ObjectIdDict of this class
		jl_function_t* empty_object_id_dict = jl_get_function(jl_current_module, "empty!");
		jl_call1(empty_object_id_dict, object_id_dict);
	}

	/* ... all functions that use pointer_to_mutable_struct here... */

private:
	jl_value_t* object_id_dict;
	jl_value_t* pointer_to_mutable_struct;

};

#8

Actually no…

Creating an object, ObjectIdDict or others, will NOT make julia keep it alive. You still need to store it either as a global variable or as a local root.


#9

So, if I only store the ObjectIdDict as a global variable, would it also keep the objects it is pointing to alive? Or would they need to be global as well? Also, how could I go for the local root approach? Do you have any hints about it? Sorry if I am being annoying.
Thanks a lot


#10

Yes.

It’s the JL_GC_PUSH/POP macros. They can only be used within the same C++ scope.
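
One way to picture why the macros are tied to a single scope: JL_GC_PUSH builds a small frame on the current C stack and links it into a list, and JL_GC_POP unlinks it again. The toy model below imitates that idea with hypothetical names (ToyFrame, ToyRootStack, depth_inside_call); it is only a sketch of the principle, not the real macro expansion.

```cpp
#include <cstddef>

// Toy model of a GC root frame. It records the address of a local
// variable, so it is only valid while that local is still in scope.
struct ToyFrame {
    const void** slot;  // address of the local variable being rooted
    ToyFrame* prev;     // frame that was on top before this one
};

// Toy stand-in for the runtime's "top of the root stack" pointer.
struct ToyRootStack {
    ToyFrame* top = nullptr;

    void push(ToyFrame& frame) {
        frame.prev = top;
        top = &frame;
    }
    void pop() { top = top->prev; }

    // Number of frames currently linked into the stack.
    std::size_t depth() const {
        std::size_t n = 0;
        for (const ToyFrame* f = top; f != nullptr; f = f->prev) ++n;
        return n;
    }
};

// Push and pop happen in the same function, so the frame is unlinked
// before the stack memory holding it (and the rooted local) goes away.
std::size_t depth_inside_call(ToyRootStack& stack) {
    const void* local = nullptr;
    ToyFrame frame{&local, nullptr};
    stack.push(frame);
    std::size_t depth = stack.depth();
    stack.pop();
    return depth;
}
```

If the pop were deferred to a destructor instead, the stack would keep pointing at a frame whose memory was released when the constructor returned, which is the situation the same-scope rule prevents.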

Also note that you still need to answer,

  1. do you want to create more than one object? (If yes, you should not create a new object and store it to the same global variable each time, since they will overwrite each other.)
  2. do you need to keep multiple Julia objects alive per C++ object? (If not, you shouldn’t create a Julia container local to each C++ object.)

#11

Thanks again. It is so much clearer now. Should I worry about setting the global variable that is associated with my ObjectIdDict to nullptr before exiting the code? Or would that be taken care of in the final cleanup of the Julia GC at jl_atexit_hook() anyway?

I still have to figure out how my code will work; I am just prototyping and learning the API at the moment.


#12

That should not be needed. In fact, unsetting the variable could be dangerous (there’s no julia code that can do this so the compiler makes assumptions that it cannot happen). If you are really paranoid you can just set that global variable to nothing.


#13

I have two last questions:

  1. Could I make an ObjectIdDict of ObjectIdDicts? I may have some parent class which needs to know about the ObjectIdDicts of several child classes.
  2. To remove a reference to an object from a global ObjectIdDict, should I just delete the entry with the delete! method? Would this then make the object that I was referencing available to the GC, whenever it feels like collecting? Would this approach also work if I were to remove a child ObjectIdDict entry (and all the object references that it was storing) from a global parent one?

#14

ObjectIdDict is just a very basic hash table. You don’t need to use it unless you need to push and pop from it randomly. It can obviously hold any objects (though if that’s your question you should probably familiarize yourself with containers in julia).

Yes. And you can even just store nothing as the value, which is likely better in every way. You don’t need an ObjectIdDict for each object unless you want to be constantly popping from it. (An array would do just fine, and be much faster, if you just keep pushing to it.)

There might have been a bug, recently fixed, that caused the keys or the values of an ObjectIdDict not to be freed. Other than that, any unreachable objects are dead to the GC, so they will be freed when the GC feels like it, assuming there are no other references to them.


#15

Would it then also make sense to create a global Array{Ref} to store all the references to my Julia objects? Would the array keep them alive as long as the reference inside the array exists?


#16

Don’t. Use Array{Any} (or more accurately Vector{Any}). It’s fine, but it will be hard to pop from. As mentioned above, it is a good choice if you don’t need to pop.


#17

There are also functions in CxxWrap.jl for protecting/unprotecting:

Note that if you use this directly, the array keeping the objects alive will live in the CxxWrap module. Maybe I should change that behavior, now that the C++ part of the package is a separate library.