Using cppyy for julia/C++ interoperability

Without an already-deployed Python package, this is going to be very difficult to get into the hands of Julia users via the Python route. I’m also not feeling very inspired to wade through the Python packaging process to figure out how to get this to work. Perhaps the conda-forge folks would be able to assist you with this.

I’m also working on my Windows machine at the moment, so it’s going to be difficult to install an Ubuntu dependency.

I would much rather target the shared library produced by this makefile:
https://bitbucket.org/kfj/python-vspline/src/master/bspline/makefile

We could probably get the dependencies packaged up via binarybuilder.org.

@barche , what would your approach be for providing a Julia interface for this C++ code?

https://bitbucket.org/kfj/python-vspline/src/master/bspline/config.py

__init__.py


@mkitti

Without an already-deployed Python package, this is going to be very difficult to get into the hands of Julia users via the Python route. I’m also not feeling very inspired to wade through the Python packaging process to figure out how to get this to work. Perhaps the conda-forge folks would be able to assist you with this.

Fair enough - my python package is still experimental. I think I should be able to get the python packaging done myself, but this may take some time. I’m busy with other stuff right now. If I get the python package together so that you can pip-install it I’ll let you know.

I’m also working on my Windows machine at the moment, so it’s going to be difficult to install an Ubuntu dependency.

The part of vigra that’s needed for vspline is just a bunch of headers, and vigra is available for all common platforms. I did assume you were working on a linux machine, hence the proposed package. If you’re looking for other ways to install vigra, look at the vigra website.

(I included the web site here, but julialang.org informs me that I must not link to that host, so you’ll have to use a search engine to find the code. Sorry for the inconvenience.)

The website also has a download section. You’ll only need the headers, not the library or the python interface (the latter is done with BOOST, and I don’t use it).

Alternatively I might redistribute the vigra headers I use via my repo (as I already do with my own vspline headers), but before doing so I’ll ask the author to confirm that this is compatible with his license. This would reduce external dependencies.

If you opt to use Vc, this would require downloading that as well - it’s a dependency for the Vc variants of the shared library. You can find it here:

You’ll need the 1.4 branch of that repo. Using Vc requires linking a small static library (libVc.a), which is generated by the repo’s build mechanism. If you just want to go ahead and get something working fast, you can leave the use of Vc until later; its only effect is a welcome, but not dramatic, speedup of the code.

I would much rather target the shared library produced by this makefile:

We could probably get the dependencies packaged up via binarybuilder.org

I think this is a viable option, but it would reduce the scope to the pre-selected set of instantiations, namely 1-3 dimensions, 1-3 channels, and 32- and 64-bit floats. These should cover 95% of the use cases, though, and would already be a very useful amount of functionality. What would be lost in this approach is the exciting aspect of having further instantiations provided at run-time via cppyy.

If you opt to omit Vc for the time being, you’d only build the targets in the makefile with the ‘novc’ infix, and vigra would be the only dependency - plus pthreads for the multithreaded version, which may need additional installation steps on windows. I recommend you use clang++ to compile my code, but I’d be interested to see how msvc or the intel compiler performs. So far, I’ve only tested this build on linux with clang++ and g++.

The declarations and definitions for the given scope (which can easily be changed by modifying a few constants in the relevant set_*.h headers) result in fully-specialized template instantiations, which should be wrappable. The relevant starting point is vspline_binary.h for the declarations, and the corresponding .cc file for the definitions. From there, it’s a cascade of repeating includes with different macro parameters, culminating in declare.h and use_apply.h, which contain the macros for the actual declarations/definitions. If you follow the include hierarchy, you can see how the final outcome is a set of fully specialized template instantiations. This structured process should make it reasonably straightforward to add annotations if they should be needed by the wrapping code, but it would be nice if we could avoid intruding into the C++ source.

Thanks for your continued interest! Getting the fixed-scope code wrapped should not be too hard, and I’m curious to see what your colleague might add to the discussion!


For those among you who want to follow my escapades of wrapping vspline via its extant python module, I’ve made a bit of progress: I’ve just pushed a commit to vspline’s repo which adds handling of data in fortran order. This might be a genuinely useful bit of trickery, hence I quote the python code here (also for inspection, in case I’ve missed something):

  # let the incoming ndarray be 'A'

  if A.flags['F_CONTIGUOUS']:

      # reversing shape and strides reinterprets the same memory in C order
      rshape = tuple(reversed(A.shape))
      rstrides = tuple(reversed(A.strides))

      A = np.ndarray(rshape, A.dtype, A.ravel('K'), 0, rstrides)

Previously, C memory order was silently assumed, but when julia data are auto-converted to Python objects via the array protocol, they come out in fortran memory order. So now you can pass julia arrays directly to the python module. Really, the term ‘memory order’ is misleading: the memory is just the same, all that’s needed is the reversal of the shape and stride tuples, which is a trivial task in the python module. The code merely produces new, transient views to the data, so the final effect is that the C++ code in vspline directly operates on the julia arrays with no intermediate copies or new arrays returned - the julia side has to provide source/target arrays for the transform-like routines. So, as you might expect, apart from a bit of call overhead the code is blazingly fast :smiley:
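The trick can be demonstrated with plain NumPy (the array contents below are purely illustrative): reversing shape and strides turns an F-contiguous array into a C-order view of the very same memory, with no copy involved.

```python
import numpy as np

# an F-order array, as julia data arrives via the array protocol
A = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(3, 4))
assert A.flags['F_CONTIGUOUS']

# reversing shape and strides reinterprets the same bytes as C-contiguous
B = np.ndarray(tuple(reversed(A.shape)), A.dtype,
               A.ravel('K'), 0, tuple(reversed(A.strides)))

assert B.flags['C_CONTIGUOUS']   # same bytes, now seen in C order
assert np.shares_memory(A, B)    # a transient view, not a copy
assert np.array_equal(B, A.T)    # logically, the transpose of A
```

Note that `A.ravel('K')` flattens in memory order, which for an F-contiguous array is itself a view, so the whole construction never touches the pixel data.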

julia arrays holding multi-channel data (like pixels) are expected to hold the number of channels as their first extent, and the stride for this axis must be one - or, to put it differently, vspline expects multi-channel data in interleaved format, with no gaps between the channels. If your data are channel-separated, you have to loop over single-channel slices instead.
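In NumPy terms (as a stand-in for the julia side), the expected layout can be checked like this: with the channel count as the first extent of an F-order array, the channel axis must be the fastest-varying one, i.e. have a stride of exactly one element.

```python
import numpy as np

# model of the julia-side convention: channels first, column-major storage
img = np.asfortranarray(np.zeros((3, 5, 7), dtype=np.float32))

# interleaved format: the channels of each pixel sit next to each other
# in memory, so the channel axis stride equals one element (4 bytes here)
assert img.strides[0] == img.itemsize
```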

Here’s a small example (note that ‘bspline’ is the b-spline module from python-vspline, not the module from PyPI):

using PyCall
bspline = pyimport("bspline")
src = ones(Float32,3,1000,1000)
trg = zeros(Float32,3,1000,1000)
bspl = bspline.bspline("float", 3, 2, [1000,1000])
bspl.prefilter(src)
bspline.index_transform(bspl,trg)
print( src == trg )

What does this do? After the obvious import, it sets up a ‘source’ array of 1000×1000 ‘pixels’ of three floats each (note the ‘3’ as the first extent), all set to 1.0, and a target array of the same shape filled with zeros. The source array is ‘sucked into’ the b-spline with the ‘prefilter’ function, and then the spline is evaluated at every pair of discrete 2D pixel coordinates, depositing the result in trg. The final print statement assures us that the evaluation indeed reproduced the knot values in ‘src’, which is even precisely true in this case, because the signal is pure DC.


How hard would it be to expose a C API for that code? Then we could just use ccall.


It depends on what you mean by ‘that code’. If you want a C API to process 2D arrays of three-float pixels, that’s not a problem, and your idea to wrap the shared library which the makefile in the python-vspline repo produces would amount to that: a collection of pre-compiled instantiations of the C++ template metacode. But if you want to process e.g. five-channel pixels in a 4D array, you’d be out of luck. The shared libs made in the python-vspline repo are simply there to reduce warm-up times: cling recognizes the incoming mangled symbols from these precompiled libraries and uses them rather than instantiating afresh from the C++ source, which saves time. So I include precompiled code for the more common operations and leave it to cling to deal with the rest. The result is the best of both worlds: common stuff can use precompiled binary, unknown stuff is JITted together at run-time. You end up with an interface with the same scope as the native C++ template metacode, minus a bit of friction. Note that use of the shared libs is optional and off by default - if you don’t set the relevant flags in configure.py and the shared libs aren’t present, everything is done by cppyy/cling, at the expense of slower warmup (see warmup.py for comparisons).

The magic of cppyy is that it can wrap C++. Not just some subset which you have to explain to the wrapper with annotations, but simply the whole C++ language with all its features. So the cppyy wrap can do stuff like the instantiation of templates at run time. You simply can’t do that in C - you have to settle on a set of template instantiations, assign them to C variables and have some sort of switch statement to pick one of that preconceived set. Stuff which you haven’t got already compiled simply can’t be done. And a shared library is simply a bunch of symbols tied to bits of binary code, just like C variables. Hence my use of cppyy, which in turn uses cling to compile C++ code as needed at run-time. If you mean that code, it can’t be reduced to C.
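The limitation described above can be sketched in Python (purely illustrative; none of the names or keys below belong to any real vspline or cppyy API): a C API amounts to a fixed lookup table of precompiled instantiations, and any combination outside the table simply cannot be served, whereas the cling route would JIT a new instantiation on demand.

```python
# a fixed table of "precompiled instantiations", keyed by the parameters
# a C API would have to enumerate up front: (dtype, dimensions, channels)
PRECOMPILED = {
    ('float32', 2, 3): lambda data: data,   # stand-in for a compiled routine
    ('float64', 2, 1): lambda data: data,
}

def transform(data, dtype, dim, channels):
    try:
        # the "switch statement" over the preconceived set
        return PRECOMPILED[(dtype, dim, channels)](data)
    except KeyError:
        # a C API is stuck here; cppyy/cling would instead instantiate
        # the template afresh from the C++ source at run time
        raise NotImplementedError(
            f"no precompiled instantiation for {(dtype, dim, channels)}")
```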


When I next looked ‘into’ the bspline object to inspect the array of coefficients, I ran into another difficulty: the bspline object holds the coefficients in an object called ‘container’, and the coefficients which correspond directly to knot points in a slice thereof, called ‘core’. Both objects are of the bspline module’s ‘array’ type and have a member ‘as_np’ which is a NumPy ndarray. So to inspect the ‘core’ object in julia, the ‘natural’ way is to use the as_np member. The problem I ran into was the fact that the julia side receives this as a copy, so assignments to it do not work. To get access to the underlying data, a bit of trickery was needed (sorry, julians, to whom this may all be ‘old hat’; it’s all precisely as stated in the documentation).

Given a bspline object bspl (from the example in my post), we need this bit:

core = PyArray(py"$bspl.core.as_np"o)

Now we can even write to ‘core’, directly manipulating the coefficients.

What’s the fuss all about? vspline tries to be as efficient as possible. The route of transporting data into the spline (via the ‘prefilter’ member function) is only second best if the data are manifest elsewhere, like in a file. If the entity providing data can directly write to an array, the most efficient variant is to first create the bspline object (which allocates memory for the coefficients) and then pass its core as an argument to the function which can provide the data, so that it deposits the data straight to the desired destination, without the need for an intermediate array. The subsequent prefiltering is then done by omitting the data source argument, so it’s simply bspl.prefilter() in most cases, operating in-place on the deposited data.
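A plain-NumPy stand-in (hypothetical names, no vspline involved) shows why ‘core’ has to be a view rather than a copy: writes through the view land directly in the shared coefficient storage, so a data provider can deposit straight to the destination.

```python
import numpy as np

# stand-in for the coefficient storage: 'container' holds the coefficients
# plus margins, 'core' is a view on the central part - loosely modelling
# the bspline object's container/core pair described above
container = np.zeros((8, 8), dtype=np.float32)
core = container[1:-1, 1:-1]     # a view, not a copy

# a data provider deposits straight into 'core' ...
core[...] = 1.0

# ... and the writes land in the shared storage, no intermediate array
assert container[1:-1, 1:-1].sum() == 36.0
assert container.sum() == 36.0   # margins untouched
```

This is precisely what the copy returned by as_np could not do, and what the PyArray construction restores.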

With this little bit of trickery, the wrap should be usable as intended. One might consider a bit of additional ‘julianization’, e.g. by creating a julia ‘bspline’ object which already holds PyArray members to access the coefficient array. This would be nice-to-have but not strictly necessary. My conclusion, so far, is that the method I propose works well for the module at hand and provides an instantly useful wrap of vspline’s bspline module with minimal code and complete coverage of the module’s functionality (which would still need proper testing to substantiate this claim).

The downside is, obviously, the need to get hold of a bit of C++ source code (vigra, optionally Vc), and to install NumPy, cppyy (currently < 2.0) and the python-vspline ‘bspline’ module. This may be an obstacle to users who aren’t familiar with handling such content, but there might be ways to pave the way for them. Compared to the task of writing an (incomplete) C API for the code, inserting annotations into the code to make it acceptable for CxxWrap.jl, or waiting for Cxx.jl to become operational again, I feel that the proposed route is better, especially if ways can be found to automate dependency acquisition and a (small) julia layer can be added to help access the NumPy data.

There is some movement from Cling in that they are now trying to upstream their code into LLVM to create Clang-REPL. This increases the likelihood that we could integrate this into Julia since we might be able to use recent versions of LLVM (e.g. the same one integrated with Julia).


I’m quite convinced that incremental compilation of C++ will turn out the preferred solution for wrapping C++ code. My use of cppyy to wrap vspline for python has convinced me that this is so, and I am all for julia adopting a similar route. While this is still ‘in the pipeline’, people who want to play with these new possibilities can go via python and cppyy - and, as @wlav has just posted, the bug which stopped julia from working with cppyy 2.0 has been ironed out; I’ve tried my little examples above with cppyy 2.1 and everything worked as expected. If you want to run my examples, please upgrade to cppyy 2.1 with

pip3 install --upgrade cppyy

I am also quite convinced that, when it comes to invoking complex, long running C++ code for number-crunching, going via an intermediate python module adds little overhead, whereas the pythonization seems to be a great help when it comes to interfacing with julia. The pythonization represents a significant effort, and rewriting it in julia just to ‘cut out’ the python layer seems wasteful to me.


That’s great. Detailed build instructions as a platform-agnostic shell script would be really helpful to move this along. It would hopefully cover downloading / cloning the code and the dependencies through installation in the Python environment. That would give us a chance of shoving it into a BinaryBuilder.org build_tarballs.jl script:
Building Packages · BinaryBuilder.jl

I’m also eagerly watching progress in ClangCompiler.jl.

That’s great. Detailed build instructions as a platform-agnostic shell script would be really helpful to move this along.

To clarify: building the shared libraries for the python module is not necessary. It’s nice-to-have to get faster start-up times, but otherwise performance is the same. The python module, as it’s presented at python-vspline, is configured not to use the shared libraries. If using them is desired, this can be achieved by building them and setting the relevant flags in config.py in the module’s root folder. This is optional.

What’s currently missing is

  • explicit permission from vigra’s author to redistribute his (C++ header) code
  • availability of the python module through PyPI

The first one is for form only; the license explicitly allows such use. I wrote to the author some days ago, but he has not replied yet. Redistributing the headers would make it unnecessary to have a vigra installation on the target machine. The second one would require that the module is packaged for and uploaded to PyPI, and right now I don’t have the time to do that.

It would hopefully cover downloading / cloning the code and the dependencies through installation in the Python environment.

I don’t know whether pip can install a set of C++ header files to a target machine. It would also be overkill, because really I only need a small subset of the vigra code, mainly to handle nD arrays and small aggregates. I only want to redistribute that subset.

That would give us a chance of shoving it into a BinaryBuilder.org build_tarballs.jl script:

We will not need binarybuilder, because there won’t be binaries: the C++ code is interpreted at run-time with cling. The C++ code has to be present, the shared libraries are optional, let’s leave them out for now, because they only complicate matters. I pointed you to the shared libraries because you were looking for binary code to be wrapped with Cxx.jl or CxxWrap.jl; if you are prepared to follow through with the cppyy-route, this won’t be necessary. So to reiterate: currently you need

  • an install of libvigraimpex-dev (the source code, not the binary) in your include path
  • the bspline folder from python-vspline (best clone the repo)
  • a recent install of cppyy in your python3 environment

With these components, my examples should run. All three points above might be achieved together automatically if I re-distribute the relevant vigra headers and the python module is made available via PyPI.

This seems to be in its early stages, but it looks promising. What I’d like to see is some kind of documentation beyond the mere proof of concept. Apart from that, directly wrapping the C++ code with julia would be the most desirable route from a julia standpoint.

Wrapping vspline’s C++ code directly with julia would abandon the pythonization layer, which is an asset and has cost several weeks of development time. So ‘julianization’ code would be needed to, for example, interface the vigra::MultiArray objects with julia arrays - as the python layer now does to interface with NumPy ndarrays. Using the python module with PyCall allows julia to interface with the NumPy arrays which the python module presents directly, and additional code is not needed.

The license is pretty clear:
http://ukoethe.github.io/vigra/LICENSE.txt

The VIGRA License
=================
(identical to the MIT X11 License)

Permission is hereby granted, free of charge, to any person    
obtaining a copy of this software and associated documentation 
files (the "Software"), to deal in the Software without        
restriction, including without limitation the rights to use,   
copy, modify, merge, publish, distribute, sublicense, and/or   
sell copies of the Software, and to permit persons to whom the 
Software is furnished to do so, subject to the following       
conditions:                                                    
                                                               
The above copyright notice and this permission notice shall be 
included in all copies or substantial portions of the          
Software.                                                      
                                                               
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND 
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND       
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT    
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,   
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING   
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR  
OTHER DEALINGS IN THE SOFTWARE.                               

Honestly, I think conda-forge is the better route for this.

Yes. I told you it’s for form only - I have been using vigra for many years now and it’s been a great source of inspiration. I feel indebted to its author(s), and asking for permission felt like the decent thing to do. If you go by mere legality, the text of the license gives me permission to redistribute it. So I wrote to U. Koethe, he did not answer (yet), and I’ll go with the text of the license instead and include the relevant vigra headers with the python package, reducing the dependencies by one. I’ll post again when I’ve done so.

Thanks for the hint! I’m no expert at packaging python modules. I’ll have a look.

I added the relevant vigra headers to the python-vspline git repo. Now all you need to do is

pip3 install numpy cppyy
git clone https://bitbucket.org/kfj/python-vspline

Then you can just link the bspline folder (which contains the python module) to somewhere in your python path. You needn’t install anything. Here on my system, I do this:

ln -s /path/to/python-vspline/bspline /home/kfj/.local/lib/python3.8/site-packages

Now running my julia examples should work. Let me know if this does the trick your end. Then maybe we can make a shell script for the installation.
