Hi! How can one read files in the libsvm format in Julia? The LIBSVM.jl package seems to implement only the algorithms. My current solution would be to read the file in Python, save it in another format and then load it into Julia, but that's a bit of an overkill for such a basic task.
Following is a naive implementation I've made to read the libsvm format.
It's not high performance but proved suitable for my needs: parsing the Yahoo Learning to Rank Challenge Set1 train data takes about 2 minutes (~475,000 observations, 700 columns).
Note that it returns a dense matrix, not a sparse one:
function read_libsvm(raw::Vector{UInt8}; has_query=false)
    io = IOBuffer(raw)
    lines = readlines(io)
    nobs = length(lines)
    nfeats = 0 # number of features (largest feature index seen)
    y = zeros(Float64, nobs)
    if has_query
        offset = 2 # offset for feature idx: y + query entries
        q = zeros(Int, nobs)
    else
        offset = 1 # offset for feature idx: y
    end
    vals = [Float64[] for _ in 1:nobs]
    feats = [Int[] for _ in 1:nobs]
    for i in eachindex(lines)
        line = lines[i]
        line_split = split(line) # split on any whitespace, dropping empty fields
        y[i] = parse(Float64, line_split[1]) # labels may be non-integer (e.g. regression targets)
        if has_query
            q[i] = parse(Int, split(line_split[2], ":")[2]) # second field is "qid:<id>"
        end
        n = length(line_split) - offset
        lfeats = zeros(Int, n)
        lvals = zeros(Float64, n)
        for jdx in 1:n
            ls = split(line_split[jdx+offset], ":")
            lvals[jdx] = parse(Float64, ls[2])
            lfeats[jdx] = parse(Int, ls[1])
            nfeats = max(nfeats, lfeats[jdx])
        end
        vals[i] = lvals
        feats[i] = lfeats
    end
    x = zeros(Float64, nobs, nfeats)
    # bounds checks are left on: a 0-based feature index would otherwise write out of bounds
    for i in 1:nobs
        for jdx in 1:length(feats[i])
            j = feats[i][jdx]
            val = vals[i][jdx]
            x[i, j] = val
        end
    end
    if has_query
        return (x=x, y=y, q=q)
    else
        return (x=x, y=y)
    end
end
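Since the function above returns a dense matrix, here is a minimal sketch of how a sparse variant could look, using the SparseArrays standard library. The name `read_libsvm_sparse` is hypothetical (it is not part of LIBSVM.jl), and the sketch assumes 1-based feature indices and simply skips any `qid:` field rather than returning it:

```julia
using SparseArrays

# Hypothetical helper (not part of LIBSVM.jl): parse libsvm-formatted text
# into a sparse feature matrix and a label vector.
# Assumes feature indices are 1-based; `qid:` fields are ignored.
function read_libsvm_sparse(io::IO)
    rows = Int[]      # row index of each stored value
    cols = Int[]      # feature (column) index of each stored value
    vals = Float64[]  # stored values
    y = Float64[]     # one label per non-empty line
    for line in eachline(io)
        fields = split(line)          # split on whitespace, dropping empty fields
        isempty(fields) && continue   # skip blank lines
        push!(y, parse(Float64, fields[1]))
        for field in @view fields[2:end]
            key, val = split(field, ':')
            key == "qid" && continue  # query ids are not returned in this sketch
            push!(rows, length(y))
            push!(cols, parse(Int, key))
            push!(vals, parse(Float64, val))
        end
    end
    nfeats = isempty(cols) ? 0 : maximum(cols)
    return (x = sparse(rows, cols, vals, length(y), nfeats), y = y)
end

# Small in-memory example: two observations, three features
data = read_libsvm_sparse(IOBuffer("1 1:0.5 3:2.0\n-1 2:1.5\n"))
```

To read from disk, something like `open(read_libsvm_sparse, "train.txt")` should work, where the file name is only a placeholder. Collecting the row, column and value triplets first and calling `sparse` once at the end avoids inserting entries one by one into an existing sparse matrix, which tends to be slow.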
Thanks! So does it mean that there is indeed no dedicated package for this?
Correct, at least I’m not aware of any dedicated package.
I think the above function could be polished a little and added to LIBSVM.jl, or even adapted into a dedicated lightweight package within the JuliaIO organization on GitHub.