I’m trying to call the libsais_int function from libsais, with signature: int32_t libsais_int(int32_t * T, int32_t * SA, int32_t n, int32_t k, int32_t fs);
I wrote the following (and also tried using pointer()), but I cannot seem to get it to work:
const LIBSAIS = "libsais.so.2"
function create_suffix_array(in_vector::Vector{Int32}, free_space::Int)
    out_vector = zeros(Int32, length(in_vector) + free_space)
    n = length(in_vector)
    k = length(Set(in_vector))
    res = ccall((:libsais_int,LIBSAIS), Cint, (Ref{Vector{Int32}}, Ref{Vector{Int32}}, Cint, Cint, Cint), in_vector, out_vector, n, k, free_space)
    return out_vector
end
println(create_suffix_array(Int32[2,1,1,1,10], 10000))
function create_suffix_array(in_vector::Vector{Int32}, free_space::Int)
    out_vector = zeros(Int32, length(in_vector) + free_space)
    GC.@preserve in_vector out_vector begin
        n = length(in_vector)
        k = length(Set(in_vector))
        res = ccall((:libsais_int,LIBSAIS), Cint, (Ref{Vector{Int32}}, Ref{Vector{Int32}}, Cint, Cint, Cint), Ref(in_vector), Ref(out_vector), n, k, free_space)
    end
    return out_vector
end
That also segfaults. Or should I use GC.@preserve in a different way?
That seems to work. I do get a segfault with other input data, but maybe that relates to the library (although the docs say it should be possible):
const LIBSAIS = "/home/rickb/tools/suffix/libsais/libsais.so.2"
function create_suffix_array(in_vector::Vector{Int32}, free_space::Int)
    out_vector = zeros(Int32, length(in_vector) + free_space)
    n = length(in_vector)
    k = length(Set(in_vector))
    @ccall LIBSAIS.libsais_int(in_vector::Ptr{Int32}, out_vector::Ptr{Int32}, n::Int32, k::Int32, free_space::Int32)::Int32
    return out_vector
end
println(create_suffix_array(Int32[2,1,1,1,1458521], 10)) # works
println(create_suffix_array(Int32[2,1,1,1,1458521], 1000)) # segfaults
* @param fs Extra space available at the end of SA array (can be 0, but 4k or better 6k is recommended for optimal performance).
The k parameter is the alphabet size, which includes letters that are missing from the input. So essentially it is determined by the maximum value in the input array, not by the number of distinct values.
Specifically, if the input is [1,5,2,2], k should be at least 5+1 = 6 and not 3, which is length(Set([1,5,2,2])). The +1 on the maximum is because 0 is an alphabet value too.
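To make the difference concrete, here is a minimal sketch. The helper name alphabet_size is made up for illustration, and the check for negative values is my assumption about what the library expects, based on 0 being the smallest alphabet value:

```julia
# Hypothetical helper: smallest k that covers every symbol in v,
# assuming the symbols are non-negative integers starting at 0.
function alphabet_size(v::Vector{Int32})
    isempty(v) && return 0
    # Assumption: negative symbols are not valid input.
    any(<(0), v) && throw(ArgumentError("symbols must be non-negative"))
    # Widen to Int before adding 1 so the sum cannot overflow Int32.
    return Int(maximum(v)) + 1
end

v = Int32[1, 5, 2, 2]
distinct_count = length(Set(v))  # counts distinct symbols only
k = alphabet_size(v)             # counts every value from 0 up to maximum(v)
```

For the input from the question, Int32[2,1,1,1,1458521], this gives a k of 1458522, whereas length(Set(...)) gives only 3.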
As the maximum value in the example is rather big, it might be productive to first reduce the maximum alphabet value with a lookup table as follows:
v = Int32[2,1,1,1,1458521]
d = Dict(Iterators.map(reverse,pairs(sort(unique(v)))))
newv = Int32.(get.(Ref(d), v, 0))
Now, maximum(newv)+1 should be minimal for the input vector, and newv is ready to be shipped to libsais_int. Since the ordering of the inputs is preserved, the output suffix array is also identical.
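If you want to convince yourself of the "identical suffix array" claim without involving the library, something like the following sketch can be used. Both compress_alphabet and naive_suffix_array are names I made up here. The remapping is a variation on the lookup table above that uses 0-based ranks, and the naive suffix sort is only meant for checking small inputs, not as a replacement for libsais:

```julia
# Hypothetical helper: replace each symbol by its 0-based rank among the
# distinct symbols, and return the remapped vector with its alphabet size.
function compress_alphabet(v::Vector{Int32})
    symbols = sort(unique(v))
    # searchsortedfirst returns the 1-based position of x in the sorted symbols.
    newv = Int32[searchsortedfirst(symbols, x) - 1 for x in v]
    return newv, length(symbols)
end

# Naive reference: sort the 0-based suffix start positions by comparing the
# suffixes lexicographically. Very slow, for small test inputs only.
function naive_suffix_array(v::Vector{Int32})
    n = length(v)
    return sort(collect(0:n-1), by = i -> v[i+1:end])
end

v = Int32[2, 1, 1, 1, 1458521]
newv, k = compress_alphabet(v)

# The remapping keeps the relative order of symbols, so both arrays agree.
same = naive_suffix_array(v) == naive_suffix_array(newv)
```

With 0-based ranks, k is simply the number of distinct symbols, so length(Set(in_vector)) becomes the right value once the input has been remapped this way.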