Julia's applicable context is getting narrower over time?

To tell the truth, I had to google Python metaclasses. I'd never heard of them, and the links that came up suggested they were deep Pythonista stuff, not to be used by mere mortals.

However, having watched the talk on metaprogramming linked above (i.e. take care), that might have been a sensible move!

Hey, looks like I got pinged for a reply! I do fit into the strange cadre of people who started developing a library in Julia whose core functionality has since been rewritten in C. So here is some insight into that decision:

  • I had some interest from folks in the Python community, and currently C interoperability is more universal (that is, the original motivation was not performance);
  • Until integration with BinaryBuilder, the Julia package was maintained in parallel with C development;
  • For increased performance, the C library uses SIMD and OpenMP. For most applications, Julia's vectorization and shared-memory parallelism capabilities are going to be sufficient, easy, and excellent. However, I found that the necessary vectorization paradigm is not one that can be turned on by a macro such as @avx. (For reference, it is essentially multiple sweeps of Givens rotations between contiguous or nearly contiguous entries in a vector. Each sweep is nonlocal in memory, though, as it touches all entries in the vector. So increasing vectorization also requires increasing memory locality, which means collating multiple sweeps into larger blocked sweeps.) Looking at the source code, it is templated (using C macros) to encourage the compiler to use just the right vectorization intrinsics.
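To make the access pattern concrete, here is a minimal scalar sketch of one sweep, based only on the description above (not on the FastTransforms source); `sweep!` and the rotation arrays `c` and `s` are names I made up for illustration.

```julia
# One sweep of Givens rotations over adjacent pairs: each rotation is local,
# but the sweep as a whole touches every entry of x, so stacking many sweeps
# naively thrashes memory unless the sweeps are blocked together.
function sweep!(x::Vector{Float64}, c::Vector{Float64}, s::Vector{Float64})
    @inbounds for i in 1:length(x)-1
        xi, xj = x[i], x[i+1]
        x[i]   =  c[i]*xi + s[i]*xj
        x[i+1] = -s[i]*xi + c[i]*xj
    end
    return x
end
```

Blocking would fuse several such sweeps so that a cache-sized chunk of x is rotated repeatedly before moving on.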

Is rewriting a Julia package in C the solution for everyone? I think not.

Could this be a counter-example to the general consensus in the community that “Julia’s applicable context is getting narrower over time”? No. Rewriting in C, combined with the BinaryBuilder infrastructure, did not diminish the scope of the Julia package FastTransforms.jl.

Writing even a simple function in Julia is actually a meta-experience if the arguments are not type-annotated. This provides great ease of programming, and this abstraction often makes things work that were never conceived of by the programmer, let alone the user, which is sometimes really, really awesome! In numerics, though, it is often desirable, or even simply sufficient and acceptable, to have a strongly typed solution for IEEE floating-point types, so this generality is not always necessary.
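As a tiny illustration of that meta-experience (`axpy` is just a made-up example name):

```julia
# No type annotations: Julia specializes this one definition for each
# combination of argument types it meets, including types the author
# never anticipated.
axpy(a, x, y) = a*x + y

axpy(2.0, 3.0, 1.0)                 # Float64 method → 7.0
axpy(big"2.0", big"3.0", big"1.0")  # BigFloat method, same source
axpy(1//2, 2//3, 1//6)              # exact Rational arithmetic → 1//2
```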

There is still room for improvement in Julia, though. Improving BigFloat is a long-standing issue, because safe memory handling implies that Julia controls BigFloat memory while C controls mpfr_t memory, and the two cannot be mixed. This means that sizeof(BigFloat) != sizeof(FastTransforms.mpfr_t). I did find that implementing an upper-triangular matrix * dense matrix multiplication for MPFR types was significantly faster than the generic linear algebra routines in Julia. So when folks mention that the performance of C and Julia is nearly the same because both use the LLVM compiler, that is a reasonable general statement, but it comes with exceptions.

For reference, here is the source code that is the cause for this triangular matrix * dense matrix multiplication speedup (the times and disparity are so large that I don’t think there’s a need to use BenchmarkTools.jl):

julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Xeon(R) W-2191B CPU @ 2.30GHz
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)

julia> using FastTransforms, LinearAlgebra

julia> n = 1024
1024

julia> p = plan_leg2cheb(BigFloat, n);

julia> Id = Matrix{BigFloat}(I, n, n);

julia> @time p*Id; # C library FastTransforms version ccall'ed by the Julia wrapper
  3.583803 seconds (4.19 M allocations: 488.000 MiB, 3.58% gc time)

julia> P = UpperTriangular(p*Id);

julia> @time P*Id; # Generic linear algebra in Julia
123.838721 seconds (2.15 G allocations: 112.008 GiB, 20.71% gc time)


Hope this helps,



I agree that sometimes it is necessary to use explicit SIMD instructions (via intrinsics or similar) to get the full benefit, rather than relying on compiler vectorization. But you can do the same thing in Julia using SIMD.jl, no?
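As a sketch of that suggestion, here is what a rotation update might look like with SIMD.jl; `rotate_block!` and the fixed width `N = 4` are my own illustrative choices, and the loop assumes the vector length is a multiple of N.

```julia
using SIMD  # third-party package providing explicit SIMD vector types

# Apply one Givens-style rotation (c, s) across two vectors in SIMD chunks.
function rotate_block!(x::Vector{Float64}, y::Vector{Float64}, c::Float64, s::Float64)
    N = 4  # vector width; tune for the target CPU
    cv, sv = Vec{N,Float64}(c), Vec{N,Float64}(s)
    @inbounds for i in 1:N:length(x)  # assumes length(x) % N == 0
        xi = vload(Vec{N,Float64}, x, i)
        yi = vload(Vec{N,Float64}, y, i)
        vstore(cv*xi + sv*yi, x, i)
        vstore(cv*yi - sv*xi, y, i)
    end
    return x, y
end
```

Whether LLVM emits exactly the intrinsics the C macros target is a separate question, but the explicit-vector programming style itself is available from Julia.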

You can call MPFR directly from Julia, too. I agree that better automated memory management for the higher-level BigFloat type has been a longstanding wish-list item, but the ability to use low-level MPFR functions is not technically an exclusive advantage of C.
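For instance, the same ccall pattern Base.MPFR uses works in user code too; this sketch relies on BigFloat sharing mpfr_t's layout, which Base itself assumes:

```julia
# z = x + y computed by MPFR directly; rounding mode 0 is MPFR_RNDN
# (round to nearest), and the result is written in place into z.
x, y, z = big"1.5", big"2.25", BigFloat()
ccall((:mpfr_add, :libmpfr), Int32,
      (Ref{BigFloat}, Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, y, 0)
z  # 3.75
```

The missing piece is the memory-management story around such calls, not the ability to reach MPFR at all.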

Definitely, a C library is still viewed as the most accessible “lowest-common denominator” dependency when you are coming from other languages.

Hopefully this is improving, e.g. diffeqpy has been pretty successful at making DifferentialEquations.jl accessible in Python via pyjulia. Most of the inter-language efforts thus far have focused mainly on calling other languages from Julia rather than the other way around, but this will change now that best-of-breed packages are becoming available in Julia that people want to access from other languages.


It would be a ton of work, but it would be really nice if Julia got a good native BigFloat library. MultiFloats.jl (GitHub - dzhang314/MultiFloats.jl: Fast extended-precision floating-point arithmetic for Julia) is really good up to a few hundred bits, but we could probably be very competitive with MPFR/Arb by using some program generation.


I agree that not enough people are thinking of Julia as supplying libraries that can be called from any language (C, C++, and Fortran included). However, here are some demos I put together of exactly this task, because we care about it; they are not yet well documented. Enjoy:
GitHub - ahbarnett/multilingual-julia: Minimally complete examples of Julia calling, more importantly being called by, Fortran and C.


Kind of. We can definitely do better on the Python side. This video describes where we are at right now:

The diffeqr bindings are pretty amazing, as shown in the blog posts like:

The Python bindings aren’t able to be traced with ModelingToolkit, though, which is something I need to figure out how to fix. I am trying to get @tkf interested in helping out there. Also, we don’t have automatic installation on the Python side yet either, which is a big hurdle to people adopting it as a dependency. Just a few small things are required, I think, and then Julia might be the best way to build a Python library.


But most applications (scientific and engineering, where speed counts) are replete with sparse matrices and sparse vectors.
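For what it's worth, Julia's standard library already treats that case as first-class via SparseArrays:

```julia
using SparseArrays

A = sparse([1, 2, 3], [1, 2, 3], [2.0, 3.0, 4.0])  # 3×3 diagonal in CSC storage
x = sparsevec([2], [1.0], 3)                       # sparse vector, one nonzero
A * x                                              # sparse-aware matvec
```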