using DataFrames
using Clustering  # provides KmeansResult
function Silouhette(
input_data,
model::KmeansResult
)::Float64
if Tables.istable(input_data)
RunSilouhette(Tables.matrix(input_data), model)
else
RunSilouhette(input_data, model)
end
end
The result of @code_warntype for this function and the following input is:
One way could be to use Matrix{Float64}(input_data) rather than Tables.matrix(input_data):
function Silouhette(
input_data,
model::KmeansResult
)::Float64
if Tables.istable(input_data)
RunSilouhette(Matrix{Float64}(input_data), model)
else
RunSilouhette(input_data, model)
end
end
The RunSilouhette function calculates the silhouette coefficient of the clustering, and it should get the data as an object of subtype AbstractMatrix. So dispatching won’t help unless I change the content of RunSilouhette. Since that function is fairly long, duplicating it for the two kinds of input data wouldn’t be a good idea, as it would take up a lot of space in the script. Please correct me if you think I’m wrong. Thank you!
One way could be to use Matrix{Float64}(input_data) rather than Tables.matrix(input_data):
The docs say
help?> Tables.matrix
Tables.matrix(table; transpose::Bool=false)
Materialize any table source input as a new Matrix or in the case of a MatrixTable return the originally
wrapped matrix. If the table column element types are not homogenous, they will be promoted to a common type
in the materialized Matrix. Note that column names are ignored in the conversion. By default, input table
columns will be materialized as corresponding matrix columns; passing transpose=true will transpose the
input with input columns as matrix rows or in the case of a MatrixTable apply permutedims to the originally
wrapped matrix.
I think the instability comes from the fact that Tables.matrix does automatic type promotion and so its return type is not immediately inferable from its arguments.
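To illustrate the promotion point, here is a minimal sketch. The three data frames and their names (df_int, df_float, df_mixed) are made up for this example. All three have the same type, DataFrame, because a DataFrame does not record its column types in its own type, yet the matrix that Tables.matrix builds from each one takes its element type from the column contents. The compiler only sees the argument type, so it cannot know which matrix type will come back.

```julia
using DataFrames  # the thread's own code uses Tables.matrix after `using DataFrames`

# Hypothetical example tables; only the column element types differ.
df_int   = DataFrame(a = [1, 2],     b = [3, 4])
df_float = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0])
df_mixed = DataFrame(a = [1, 2],     b = [3.5, 4.5])

# Same input type every time, but the result type depends on the data.
for df in (df_int, df_float, df_mixed)
    println(typeof(df), " -> ", typeof(Tables.matrix(df)))
end
```

Running this should show that the output type varies while the input type stays the same, which is the situation @code_warntype flags.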
The RunSilouhette function calculates the silhouette coefficient of the clustering, and it should get the data as an object of subtype AbstractMatrix. So dispatching won’t help unless I change the content of RunSilouhette.
The idea was to contain the type instability behind a function barrier, which would allow you to make Silouhette type stable:
using DataFrames
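# Stand-in for the real computation; this method only ever sees a matrix.
function RunSilouhette(input::AbstractMatrix, model)::Float64
return sum(input)
end
# Fallback for table inputs: convert once, then dispatch to the method above.
function RunSilouhette(input, model)
println("yes, we are being called")
return RunSilouhette(Tables.matrix(input), model)
end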
function Silouhette(
input_data,
model
)
return RunSilouhette(input_data, model)
end
model = :some_model
df = DataFrame(:a => randn(5), :b => randn(5))
And this gives
julia> @code_warntype Silouhette(df, model)
MethodInstance for Silouhette(::DataFrame, ::Symbol)
from Silouhette(input_data, model) in Main at /home/.../mwe.jl:14
Arguments
#self#::Core.Const(Silouhette)
input_data::DataFrame
model::Symbol
Body::Float64
1 ─ %1 = Main.RunSilouhette(input_data, model)::Float64
└── return %1
julia> Silouhette(df, model)
yes, we are being called
-0.8957902425154177
So this does not magically remove the type instability; it just moves it behind a function barrier, so that you can write the Silouhette code in a type-stable manner.
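If you want to check where the instability ends up without reading the full @code_warntype output, one option is to ask the compiler for the inferred return types directly. The sketch below is only an illustration: kernel is a hypothetical stand-in for RunSilouhette, and Base.return_types is an introspection helper rather than part of the documented stable API, so treat what it prints as a diagnostic.

```julia
using DataFrames

# Hypothetical stand-in for the real computation; the ::Float64 annotation
# fixes the return type of the matrix method.
function kernel(m::AbstractMatrix)::Float64
    return sum(m)
end

# Barrier method: the conversion with the unknown result type happens here.
kernel(tbl) = kernel(Tables.matrix(tbl))

df = DataFrame(a = randn(5), b = randn(5))

# Print what inference concludes for a DataFrame argument.
for f in (Tables.matrix, kernel)
    rt = first(Base.return_types(f, (DataFrame,)))
    println(f, " => ", rt, " (concrete: ", isconcretetype(rt), ")")
end
```

In the @code_warntype output above, the wrapper was inferred as Float64 even though the conversion inside it is not inferable. This check is a quick way to see whether the same holds on your Julia and package versions.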
What do you mean by no-op? I didn’t understand.
No operation: the @code_warntype output above shows that Silouhette does no ‘real’ work and only calls RunSilouhette. It would be better to add some more code to Silouhette; otherwise one might ask why it needs to be type stable at all.