Got a bit further by installing this http://mingw-w64.org/doku.php/download
and making CLion use the binaries in the installed directories!
You seem to be having trouble creating a .dll more than accessing it with Julia. Your fastest path to solving this may be a forum devoted to the C varient/compiler that you are using.
I’m not overly familiar with the gcc compiler/linker but here is some info for you:
-
Producing a .dll as against an .exe is primarily a linker issue and not a compiler one.
-
A windows .dll needs to have a DllMain() function which will be called when the system loader loads the library and must at the very least return 1 for the .dll to load successfully.
-
The linker must also have a list of function names you wish to export, so it knows which names/eps to include in the IMAGE_EXPORT_DIRECTORY structure.
-
Using the M$ “Link.exe” linker this is accomplished by means of a .def file which is just a simple text file something like:
LIBRARY my_lib_name
EXPORTS
my_func_name1
my_func_name2
my_func_name3
Your C compiler/linker system will have some sort of method to provide the same information so your library can be successfully built.
- Once you have successfully created a library it should be relatively easy to access via:
ccall((:my_func_name,"MyDll.dll"),stdcall,ReturnType,(argtype1,argtype2),arg1,arg2)
- If necessary move the dll into your currnet directory or julia/bin driectory.
Hope this helps
After many trial and error I figured out a way to do this reliably now. I will blog about it soon so I don’t forget myself.
So I can defined a function charstarstar
that converts Julia’s representation of the String vector as C’s represenation by computing the start of the pointer location. Now it’s a just a question of efficiency, which I will soon. Thanks for the help and illuminating examples!
x = ["ba","cd"]
charstarstar(n) = reinterpret(Ptr{UInt8}, unsafe_load(reinterpret(Ptr{UInt},pointer(x)),n) + 8)
hello = ccall(
(:radix_sort,"libuntitled3"),
Void,
(Ptr{Ptr{UInt8}},UInt),
charstarstar.(1:2), length(x)
)
x
This normally happens when a dll has dependencies that are not in the path that Julia knows of. Some versions of MinGW add a dependency to a mingw’s internal lib (don’t recall which) like the cygwin1.dll introduced when compiling with gcc in Cygwin.
I tried your example with MSVC and it worked fine (that is, with no errors)
charstarstar.(1:2)
will create new vector. So your radix_sort will sort something unnamed and temporary.
When I show address arithmetic in Julia I didn’t mean that it has to happen there. I would rather try to prepare some example (Christmas present ) for you:
We could sort 100 millions of strings under 10s. Are we closer to R now?
This is result:
julia str_qsort.jl
before sort: String["Z", "Y", "X", "W", "V", "U", "T", "S", "R", "Q", "P", "O", "N", "M", "L", "K", "J", "I", "H", "G", "F", "E", "D", "C", "B", "A"]
488.482 ns (0 allocations: 0 bytes)
after sort: String["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]
before sort: length=100000000
it is ok now to have unordered strings! svec1[1]>svec1[2] "WIA">"Tzuk="
9.186 s (0 allocations: 0 bytes)
after sort length=100000000
sorted! :)
This is str_qsort.jl
(I tested it with Julia 0.6.1 and also with 0.7.0-DEV):
using BenchmarkTools
C_code = raw"""
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int compare(const void* l, const void* r){
char* cl = *(char**)l+8;
char* cr = *(char**)r+8;
return strcmp(cl, cr);
}
void str_qsort(char **strings, size_t len) { /* you need to pass len here */
/* qsort(strings, len, sizeof(char*), (int (*)(void*, void*))strcmp); */
qsort(strings, len, sizeof(char*), compare);
}
"""
const Clib = "strqsort" # tempname() # make a temporary file
# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):
open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
print(f, C_code)
end
# define a Julia function that calls the C function:
str_qsort(X::Array{String}) = ccall(("str_qsort", Clib), Void, (Ptr{UInt64}, Cint), reinterpret(Ptr{UInt64}, pointer(X)), length(X))
a = [String([Char(i)]) for i in 'Z':-1:'A']
println("before sort: $a")
@btime str_qsort(a)
println("after sort: $a")
const M=100_000_000; const K=100
srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
l = length(svec1)
println("before sort: length=$l")
for i in 1:length(svec1)-1
if svec1[i]>svec1[i+1]
println("it is ok now to have unordered strings! svec1[$i]>svec1[$(i+1)] \"$(svec1[i])\">\"$(svec1[i+1])\"")
break
end
end
@btime str_qsort(svec1)
l = length(svec1)
println("after sort length=$l")
global sorted = true
for i in 1:length(svec1)-1
if svec1[i]>svec1[i+1]
println("error svec1[$i]>svec1[$(i+1)] $(svec1[i])>$(svec1[i+1])")
sorted = false
break
end
end
println(sorted ? "sorted! :)" : "unsorted! :(")
yeah i realised after sorting in c but in Julia the string didn’t change. The radixsort implementation doesnt use charstarstar
.
good example …
const Clib = "./strqsort"
worked better for me on Linux
should the above be @btime str_qsort($svec1)
instead? As svec1
gets modified after first run. Also str_qsort
should be called str_qsort!
. Also on my computer that sort takes 48 seconds but FastGroupBy’s radixsort!
only took 19~29 seconds.
const M=100_000_000; const K=100
using FastGroupBy
srand(1)
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M)
@time FastGroupBy.radixsort!(svec1) # 29 seconds
issorted(svec1)
My cradixsort!
performs poorly atm at 39~50 seconds.
Thanks! I didn’t know about $
and you are right about !
too.
I just cloned your FastGroupBy package ( Pkg.clone("https://github.com/xiaodaigh/FastGroupBy.jl.git")
) …
Benchmarks are complicated things because my results are 6.242 s (29 allocations: 2.24 GiB)
for radixsort!
and 8.997 s (0 allocations: 0 bytes)
for str_qsort!
.
Seems that our HW/SW configurations are different so are our benchmarks too…
But anyway - good result for radixsort!
!
i need a benchmark suite that collect other info. i wonder what set up you have? i have a core i7 laptop with 4 cores and 32G of RAM
core i5 8GB RAM Ubuntu16.04 64 bit
it is really interesting that my results are so better than yours! (windows vs ubuntu doesn’t seems to be cause or…?)
I hope we found here some more volunteers!
Thanks! On my ubuntu it works without ./
and put strqsort.so
locally…
I was using \path_to_julia(0.6/0.7)\julia str_qsort.jl
…
Could you please do benchmark on your machine? If it possible then with xiaodai’s radixsort!
too? Thanks!
linux flavor : Linux Mint 18.2 Sonya
processor : Intel Quad Core i7-6820HK
clock freqs : 2700 / 3600 MHz (base/max)
caches: 32K, 256K, 8192K
memory: 64G
How is it possible that my worse HW could get better results? Did you benchmark with @time
or @btime
? on global namespace?
I was trying to get similar results and below are my two experiments.
@time
→ 14.436537 seconds (33 allocations: 2.238 GiB, 19.33% gc time)
@time
on sorted array → 5.801548 seconds (33 allocations: 2.238 GiB, 5.04% gc time)
@btime
→ 5.601 s (29 allocations: 2.24 GiB)
Do I still use BenchmarkTool wrong?
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _' | |
| | |_| | | | (_| | | Version 0.6.1 (2017-10-24 22:15 UTC)
_/ |\__'_|_|_|\__'_| | Official http://julialang.org/ release
|__/ | x86_64-pc-linux-gnu
julia> using FastGroupBy
julia> const M = Int(1e8); const K = 100;
julia> srand(1);
julia> svec1 = rand(["i"*dec(k,7) for k in 1:M÷K], M);
julia> @time radixsort!(["a"])
0.414596 seconds (148.40 k allocations: 8.834 MiB)
julia> @time radixsort!(["a"])
0.000210 seconds (11 allocations: 1.001 MiB)
julia> @time radixsort!(svec1)
14.436537 seconds (33 allocations: 2.238 GiB, 19.33% gc time)
julia> @time radixsort!(svec1) # now svec1 is sorted
5.801548 seconds (33 allocations: 2.238 GiB, 5.04% gc time)
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _' | |
| | |_| | | | (_| | | Version 0.6.1 (2017-10-24 22:15 UTC)
_/ |\__'_|_|_|\__'_| | Official http://julialang.org/ release
|__/ | x86_64-pc-linux-gnu
julia> using FastGroupBy
julia> using BenchmarkTools
julia> srand(1);
julia> const M = Int(1e8); const K = 100;
julia> svec1 = rand(["i"*dec(k,7) for k in 1:M÷K], M);
julia> @btime radixsort!($svec1)
5.601 s (29 allocations: 2.24 GiB)
I was measuring apples with an orange ruler and
julia> Pkg.status("BenchmarkTools")
- BenchmarkTools 0.2.2
julia> versioninfo()
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz
WORD_SIZE: 64
...
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
run this on v0.6.2:
using BenchmarkTools
using FastGroupBy
C_code = raw"""
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int compare(const void* l, const void* r){
char* cl = *(char**)l+8;
char* cr = *(char**)r+8;
return strcmp(cl, cr);
}
void str_qsort(char **strings, size_t len) { /* you need to pass len here */
/* qsort(strings, len, sizeof(char*), (int (*)(void*, void*))strcmp); */
qsort(strings, len, sizeof(char*), compare);
}
""";
const Clib = "./strqsort" # tempname() # make a temporary file
# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):
open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
print(f, C_code)
end
# define a Julia function that calls the C function:
str_qsort(X::Array{String}) =
ccall(("str_qsort", Clib), Void, (Ptr{UInt64}, Cint), reinterpret(Ptr{UInt64}, pointer(X)), length(X))
# !! M is 10 million !!
const M=10_000_000; const K=100;
srand(1);
svec1 = rand([string(rand(Char.(32:126), rand(1:8))...) for k in 1:M÷K], M);
svec2 = copy(svec1);
svec3 = copy(svec1);
fastest_qsort = (@benchmark str_qsort($svec1)).times[1];
fastest_sort = (@benchmark sort!($svec2)).times[1];
fastest_radixsort = (@benchmark FastGroupBy.radixsort!($svec3)).times[1];
qsort_secs = round(fastest_qsort / 1e9, 4);
sort_secs = round(fastest_sort / 1e9, 4);
rsort_secs = round(fastest_radixsort / 1e9, 4);
issorted(svec2) && println("sort seconds: $sort_secs");
issorted(svec1) && println("qsort seconds: $qsort_secs");
issorted(svec3) && println("rsort seconds: $rsort_secs");
I see
sort seconds: 0.8236
qsort seconds: 0.5173
rsort seconds: 0.399
restarting and running the same code with M = 100_000_000
:
sort seconds: 10.4221
qsort seconds: 6.2238
rsort seconds: 4.1681
What do you see on your machine?
I see very similar timing and that your computer is a abit faster than mine but the order of the sorts is the same.
Right.
Assume that others see much the same when running this on distinct technological platforms. With that [presumed] additional evidence, we may hypothesize [and use Julia to fit] a loosely affine invariance that transforms timings for each of these algorithms on one platform to another. Better still, we do not need any explicit statement of these relationships – just consistent and non-contradictory evidence.
Given, after a fashion, all of the above, the algorithms’ relative timings inform.
Just wrote a simple guide for Windows users
Fantastic!
But be aware! LD_LIBRARY_PATH
could represent more paths! (on unix systems :
is separator)
env_path = ENV["LD_LIBRARY_PATH"]
clibname = "addc"
const Clib = joinpath(env_path,clibname) # WARNING!!!
BTW I am curious if we could do something better with calling LLVM inline assembly (see for example Andrew Cooke: C[omp]ute ).
For start I am thinking about radix calculation. It could be good if we could avoid function calling overhead (although I am not sure if Base.llvmcall
could help).