Hi guys, I am new to Julia’s Discourse, but fairly familiar with Julia. I maintain a package called Lathe.jl, an object-oriented machine-learning package in Julia with syntax similar to SkLearn. As an example of my usual syntax, here is the ordinal encoder:
function OrdinalEncoder(array)
    uni = Set(array)
    lookup = Dict()
    [push!(lookup, (value => i)) for (i, value) in enumerate(uni)]
    predict(arr) = [row = lookup[row] for row in arr]
    ()->(predict;lookup)
end
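For readers unfamiliar with the pattern, here is a minimal usage sketch of the closure above (the sample data is made up for illustration). The returned anonymous function captures `predict` and `lookup`, and Julia exposes captured variables as fields of the closure object, which is what makes the `encoder.predict(...)` syntax work:

```julia
function OrdinalEncoder(array)
    uni = Set(array)
    lookup = Dict()
    [push!(lookup, (value => i)) for (i, value) in enumerate(uni)]
    predict(arr) = [row = lookup[row] for row in arr]
    ()->(predict;lookup)
end

# Hypothetical sample data, not from the original post
encoder = OrdinalEncoder(["low", "high", "low", "mid"])

codes = encoder.predict(["low", "mid"])   # call the captured inner function
table = encoder.lookup                    # the captured Dict (here a Dict{Any, Any})

# The concrete type is compiler-generated, but it is always a subtype of Function
is_function = typeof(encoder) <: Function
```

The integer assigned to each category depends on the iteration order of the `Set`, so the exact codes are not predictable from the input order.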
The issue I am running into is with these unsafe boxed types. I get sklearn-like syntax, it is faster than dispatch by around half, and I get an initialization function. The only problem is that these types are compiler-generated and very abstract in definition: they have names like var"#1515#1314", which are not the most traceable types. My intention is to combine this with modern Julian functional programming and create a supertype hierarchy:
abstract type h end
function hello(test)
    () -> (test) <: h
end
That actually ran, but when I check the resulting type with T <: h, I get false. Of course, this means it also doesn’t work with dispatch. The only real workaround would be to put this type into a struct and call everything as children of both types. Obviously, I don’t want to do that…
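A likely explanation for the `false` result, sketched under the assumption that `hello` is called with a type as its argument: `() -> (test) <: h` parses as an anonymous function whose body is the expression `(test) <: h`, so it never declares a supertype. The type of the closure itself is a compiler-generated subtype of `Function`:

```julia
abstract type h end

function hello(test)
    # The arrow body is the comparison `test <: h`, not a supertype declaration
    () -> (test) <: h
end

f = hello(Int)               # Int is an arbitrary choice for illustration

result      = f()                     # evaluates `Int <: h`
is_h        = typeof(f) <: h          # the closure type is not a subtype of h
is_function = typeof(f) <: Function   # but it is a subtype of Function
parent      = supertype(typeof(f))    # direct supertype of the closure type
```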
Thanks for the help; obviously this is a unique issue. I was thinking there might be a way to wrap it directly as children of that struct, probably by importing <: and adding a method to it. Let me know what you guys think!
I’m sorry, but it is not entirely clear to me what your goals are here. Maybe you could try to reword what your expected outcome and behavior is and we’ll certainly find a way.
Anonymous functions obviously cannot be subtypes of a newly defined abstract type, but is this really necessary?
If you just need or want the object.method() syntax to work, you can easily define new methods for getproperty which is implicitly called by struct.field.
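As a rough sketch of the `getproperty` idea (the `Encoder` type and `predict` function here are hypothetical names, not part of Lathe.jl), one can route `enc.predict` to an ordinary function while leaving real fields accessible:

```julia
# Hypothetical type for illustration
struct Encoder
    lookup::Dict{String,Int}
end

# Ordinary function that dispatches on the encoder type
predict(enc::Encoder, arr) = [enc.lookup[x] for x in arr]

# `enc.predict` returns a closure over `enc`; any other name falls back to the real field
function Base.getproperty(enc::Encoder, name::Symbol)
    if name === :predict
        return arr -> predict(enc, arr)
    else
        return getfield(enc, name)
    end
end

enc = Encoder(Dict("a" => 1, "b" => 2))
encoded = enc.predict(["b", "a"])   # [2, 1]
```

Both call styles remain available with this approach: `predict(enc, arr)` for dispatch and `enc.predict(arr)` for the object-style syntax.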
You may be able to wrap your returns into appropriate types while maintaining the apparent behavior that you now have, but it’d be nice to have a clearer example to judge that.
IIRC <: is one of the rare built-in functions that you cannot create new methods for.
Could you please give more concrete examples of what it is you aim to achieve and also what you mean by “faster than dispatch”?
If you want “instance methods” in Julia, you can easily do that by having a struct with function fields, e.g.:
struct OrdinalEncoder{V,D,P}
    predict::P
    lookup::D
    function OrdinalEncoder(array)
        lookup = Dict(v => i for (i,v) in array |> unique |> enumerate)
        predict(arr) = map(x->lookup[x], arr)
        V, D, P = keytype(lookup), typeof(lookup), typeof(predict)
        return new{V,D,P}(predict, lookup)
    end
end
(keep in mind that it is not what function fields are usually used for).
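Since the original question was about a supertype hierarchy, here is a sketch of how a struct with function fields could also subtype an abstract type, so that both `<:` checks and dispatch work. The names `AbstractEncoder`, `OrdinalEnc`, and `describe` are made up for this example:

```julia
abstract type AbstractEncoder end   # hypothetical supertype

struct OrdinalEnc{D<:AbstractDict,P} <: AbstractEncoder
    predict::P
    lookup::D
    function OrdinalEnc(array)
        # Codes follow the order of first appearance in `array`
        lookup = Dict(v => i for (i, v) in enumerate(unique(array)))
        predict(arr) = map(x -> lookup[x], arr)
        return new{typeof(lookup),typeof(predict)}(predict, lookup)
    end
end

# A method that dispatches on the abstract supertype
describe(e::AbstractEncoder) = "encoder with $(length(e.lookup)) categories"

enc = OrdinalEnc(["low", "high", "low"])
encoded    = enc.predict(["high", "low"])      # [2, 1]
is_encoder = typeof(enc) <: AbstractEncoder    # true
summary    = describe(enc)
```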
Actually, your approach should be optimal. I am referring to calling these methods as children of the object instead of dispatching on types. For example, with the ordinal encoder:
encoder = OrdinalEncoder(y)
encoded_data = encoder.predict(y)
Compared to
struct encoder
Etc…
end
predict(encoder)
The code you presented could be helpful for compilation as well as for what I need, but I will have to test it. I will reply back!
If you want your stuff to be legible, you should use real structs.
Furthermore, it’s not boxing that’s causing performance problems, but things like the needless creation of a Set, the unnecessary allocation of an array, and the creation of a Dict{Any, Any}, which could be made concrete.
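To illustrate the last point with a small sketch (the sample array is invented), a bare `Dict()` has abstract key and value types, whereas giving it type parameters derived from the input keeps it concrete:

```julia
array = ["a", "b", "a"]   # hypothetical input

untyped = Dict()                       # keys and values are both Any
typed   = Dict{eltype(array), Int}()   # keys are String, values are Int

untyped_type = typeof(untyped)         # Dict{Any, Any}
typed_keys   = keytype(typed)          # String
typed_vals   = valtype(typed)          # Int
```

With concrete parameters, the compiler can infer the result type of `lookup[x]`, which is generally what matters for performance in code that indexes into the dictionary.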
I’d write your OrdinalEncoder like this (if I were dead set on using this hack for OOP):
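function OrdinalEncoder(array)
    # Concretely typed Dict instead of Dict{Any, Any}
    lookup = Dict{eltype(array), Int}()
    for (i, value) ∈ enumerate(array)
        # Record only the first occurrence of each value (check the keys, not the values)
        if !haskey(lookup, value)
            lookup[value] = i
        end
    end
    predict(arr) = [lookup[row] for row in arr]
    ()->(predict;lookup)
end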