How to make a C function compiled by myself available to `ccall`?

Got a bit further by installing this http://mingw-w64.org/doku.php/download and making CLion use the binaries in the installed directories!

You seem to be having trouble creating a .dll more than accessing it with Julia. Your fastest path to solving this may be a forum devoted to the C varient/compiler that you are using.

I’m not overly familiar with the gcc compiler/linker but here is some info for you:

  • Producing a .dll as against an .exe is primarily a linker issue and not a compiler one.

  • A windows .dll needs to have a DllMain() function which will be called when the system loader loads the library and must at the very least return 1 for the .dll to load successfully.

  • The linker must also have a list of function names you wish to export, so it knows which names/eps to include in the IMAGE_EXPORT_DIRECTORY structure.

  • Using the M$ “Link.exe” linker this is accomplished by means of a .def file which is just a simple text file something like:

LIBRARY my_lib_name
EXPORTS
my_func_name1
my_func_name2
my_func_name3

Your C compiler/linker system will have some sort of method to provide the same information so your library can be successfully built.

  • Once you have successfully created a library it should be relatively easy to access via:
ccall((:my_func_name,"MyDll.dll"),stdcall,ReturnType,(argtype1,argtype2),arg1,arg2)
  • If necessary move the dll into your currnet directory or julia/bin driectory.

Hope this helps :slight_smile:

1 Like

After many trial and error I figured out a way to do this reliably now. I will blog about it soon so I don’t forget myself.

2 Likes

So I can defined a function charstarstar that converts Julia’s representation of the String vector as C’s represenation by computing the start of the pointer location. Now it’s a just a question of efficiency, which I will soon. Thanks for the help and illuminating examples!

x = ["ba","cd"]
charstarstar(n) = reinterpret(Ptr{UInt8}, unsafe_load(reinterpret(Ptr{UInt},pointer(x)),n) + 8)
hello = ccall(
    (:radix_sort,"libuntitled3"),
     Void, 
     (Ptr{Ptr{UInt8}},UInt),
    charstarstar.(1:2), length(x)
)
x

This normally happens when a dll has dependencies that are not in the path that Julia knows of. Some versions of MinGW add a dependency to a mingw’s internal lib (don’t recall which) like the cygwin1.dll introduced when compiling with gcc in Cygwin.
I tried your example with MSVC and it worked fine (that is, with no errors)

charstarstar.(1:2) will create new vector. So your radix_sort will sort something unnamed and temporary.

When I show address arithmetic in Julia I didn’t mean that it has to happen there. I would rather try to prepare some example (Christmas present :wink: ) for you:

We could sort 100 millions of strings under 10s. Are we closer to R now?

This is result:

julia str_qsort.jl 
before sort: String["Z", "Y", "X", "W", "V", "U", "T", "S", "R", "Q", "P", "O", "N", "M", "L", "K", "J", "I", "H", "G", "F", "E", "D", "C", "B", "A"]
  488.482 ns (0 allocations: 0 bytes)
after sort: String["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]
before sort: length=100000000
it is ok now to have unordered strings! svec1[1]>svec1[2] "WIA">"Tzuk="
  9.186 s (0 allocations: 0 bytes)
after sort length=100000000
sorted! :)

This is str_qsort.jl (I tested it with Julia 0.6.1 and also with 0.7.0-DEV):

using BenchmarkTools

C_code = raw"""
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare(const void* l, const void* r){
    char* cl = *(char**)l+8;
    char* cr = *(char**)r+8;
    return strcmp(cl, cr);
}

void str_qsort(char **strings, size_t len) {  /* you need to pass len here */
    
    /* qsort(strings, len, sizeof(char*), (int (*)(void*, void*))strcmp); */
    qsort(strings, len, sizeof(char*), compare);
    
}   
"""

const Clib = "strqsort" # tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
str_qsort(X::Array{String}) = ccall(("str_qsort", Clib), Void, (Ptr{UInt64}, Cint), reinterpret(Ptr{UInt64}, pointer(X)), length(X))

a = [String([Char(i)]) for i in 'Z':-1:'A']
println("before sort: $a")
@btime str_qsort(a)
println("after sort: $a")

const M=100_000_000; const K=100
srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
l = length(svec1)
println("before sort: length=$l")
for i in 1:length(svec1)-1 
   if svec1[i]>svec1[i+1]
      println("it is ok now to have unordered strings! svec1[$i]>svec1[$(i+1)] \"$(svec1[i])\">\"$(svec1[i+1])\"")
      break
   end
end

@btime str_qsort(svec1)

l = length(svec1)
println("after sort length=$l")
global sorted = true
for i in 1:length(svec1)-1 
   if svec1[i]>svec1[i+1]
      println("error svec1[$i]>svec1[$(i+1)] $(svec1[i])>$(svec1[i+1])")
      sorted = false
      break
   end
end
println(sorted ? "sorted! :)" : "unsorted! :(")

yeah i realised after sorting in c but in Julia the string didn’t change. The radixsort implementation doesnt use charstarstar.

good example …
const Clib = "./strqsort" worked better for me on Linux

should the above be @btime str_qsort($svec1) instead? As svec1 gets modified after first run. Also str_qsort should be called str_qsort!. Also on my computer that sort takes 48 seconds but FastGroupBy’s radixsort! only took 19~29 seconds.

const M=100_000_000; const K=100
using FastGroupBy
srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
@time FastGroupBy.radixsort!(svec1) # 29 seconds
issorted(svec1)

My cradixsort! performs poorly atm at 39~50 seconds.

Thanks! :slight_smile: I didn’t know about $ and you are right about ! too.

I just cloned your FastGroupBy package ( Pkg.clone("https://github.com/xiaodaigh/FastGroupBy.jl.git")) …

Benchmarks are complicated things because my results are 6.242 s (29 allocations: 2.24 GiB) for radixsort! and 8.997 s (0 allocations: 0 bytes) for str_qsort!.

Seems that our HW/SW configurations are different so are our benchmarks too…

But anyway - good result for radixsort!! :smiley:

i need a benchmark suite that collect other info. i wonder what set up you have? i have a core i7 laptop with 4 cores and 32G of RAM

core i5 8GB RAM Ubuntu16.04 64 bit

it is really interesting that my results are so better than yours! (windows vs ubuntu doesn’t seems to be cause or…?)

I hope we found here some more volunteers! :stuck_out_tongue:

Thanks! :slight_smile: On my ubuntu it works without ./ and put strqsort.so locally…

I was using \path_to_julia(0.6/0.7)\julia str_qsort.jl

Could you please do benchmark on your machine? If it possible then with xiaodai’s radixsort! too? Thanks! :slight_smile:

linux flavor : Linux Mint 18.2 Sonya
processor : Intel Quad Core i7-6820HK
clock freqs : 2700 / 3600 MHz (base/max)
caches: 32K, 256K, 8192K
memory: 64G

1 Like

How is it possible that my worse HW could get better results? Did you benchmark with @time or @btime? on global namespace?

I was trying to get similar results and below are my two experiments.
@time14.436537 seconds (33 allocations: 2.238 GiB, 19.33% gc time)
@time on sorted array → 5.801548 seconds (33 allocations: 2.238 GiB, 5.04% gc time)

@btime5.601 s (29 allocations: 2.24 GiB)

Do I still use BenchmarkTool wrong?

              _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _' |  |
  | | |_| | | | (_| |  |  Version 0.6.1 (2017-10-24 22:15 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> using FastGroupBy
julia> const M = Int(1e8); const K = 100;
julia> srand(1);
julia> svec1 = rand(["i"*dec(k,7) for k in 1:M÷K], M);
julia> @time radixsort!(["a"])
  0.414596 seconds (148.40 k allocations: 8.834 MiB)

julia> @time radixsort!(["a"])
  0.000210 seconds (11 allocations: 1.001 MiB)

julia> @time radixsort!(svec1)
 14.436537 seconds (33 allocations: 2.238 GiB, 19.33% gc time)

julia> @time radixsort!(svec1)  # now svec1 is sorted
  5.801548 seconds (33 allocations: 2.238 GiB, 5.04% gc time)

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _' |  |
  | | |_| | | | (_| |  |  Version 0.6.1 (2017-10-24 22:15 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> using FastGroupBy

julia> using BenchmarkTools

julia> srand(1);

julia> const M = Int(1e8); const K = 100;

julia> svec1 = rand(["i"*dec(k,7) for k in 1:M÷K], M);

julia> @btime radixsort!($svec1) 
  5.601 s (29 allocations: 2.24 GiB)

I was measuring apples with an orange ruler and

julia> Pkg.status("BenchmarkTools")
 - BenchmarkTools                0.2.2

julia> versioninfo()
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz
  WORD_SIZE: 64
  ...
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

run this on v0.6.2:

using BenchmarkTools
using FastGroupBy

C_code = raw"""
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare(const void* l, const void* r){
    char* cl = *(char**)l+8;
    char* cr = *(char**)r+8;
    return strcmp(cl, cr);
}

void str_qsort(char **strings, size_t len) {  /* you need to pass len here */
    
    /* qsort(strings, len, sizeof(char*), (int (*)(void*, void*))strcmp); */
    qsort(strings, len, sizeof(char*), compare);
    
}   
""";

const Clib = "./strqsort" # tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code)
end

# define a Julia function that calls the C function:
str_qsort(X::Array{String}) = 
    ccall(("str_qsort", Clib), Void, (Ptr{UInt64}, Cint), reinterpret(Ptr{UInt64}, pointer(X)), length(X))

# !! M is 10 million !!
const M=10_000_000; const K=100;
srand(1);
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M);
svec2 = copy(svec1);
svec3 = copy(svec1);


fastest_qsort     = (@benchmark str_qsort($svec1)).times[1];
fastest_sort      = (@benchmark sort!($svec2)).times[1];
fastest_radixsort = (@benchmark FastGroupBy.radixsort!($svec3)).times[1];

qsort_secs = round(fastest_qsort / 1e9, 4);
sort_secs  = round(fastest_sort  / 1e9, 4);
rsort_secs = round(fastest_radixsort / 1e9, 4);

issorted(svec2) && println("sort  seconds: $sort_secs"); 
issorted(svec1) && println("qsort seconds: $qsort_secs"); 
issorted(svec3) && println("rsort seconds: $rsort_secs"); 

I see



sort  seconds: 0.8236

qsort seconds: 0.5173

rsort seconds: 0.399

restarting and running the same code with M = 100_000_000:



sort  seconds: 10.4221

qsort seconds: 6.2238

rsort seconds: 4.1681

What do you see on your machine?

1 Like

I see very similar timing and that your computer is a abit faster than mine but the order of the sorts is the same.

Right.

Assume that others see much the same when running this on distinct technological platforms. With that [presumed] additional evidence, we may hypothesize [and use Julia to fit] a loosely affine invariance that transforms timings for each of these algorithms on one platform to another. Better still, we do not need any explicit statement of these relationships – just consistent and non-contradictory evidence.

Given, after a fashion, all of the above, the algorithms’ relative timings inform.

Just wrote a simple guide for Windows users

https://www.codementor.io/zhuojiadai/making-compiling-c-functions-for-use-in-julia-on-windows-f9lwa5i43

6 Likes

Fantastic! :slight_smile:

But be aware! LD_LIBRARY_PATH could represent more paths! (on unix systems : is separator)

env_path = ENV["LD_LIBRARY_PATH"]
clibname = "addc"
const Clib = joinpath(env_path,clibname)  # WARNING!!!

BTW I am curious if we could do something better with calling LLVM inline assembly (see for example Andrew Cooke: C[omp]ute ).

For start I am thinking about radix calculation. It could be good if we could avoid function calling overhead (although I am not sure if Base.llvmcall could help).

1 Like