Best way to convert/collect/promote into a concrete struct?

I have the struct below that I am trying to make more flexible:

struct PolyFit
    x::Vector{Float64}
    y::Vector{Float64}
    knots::Vector{Int64}
    segmentlengths::Vector{Float64}
    polys::Vector{Polynomial{Float64, :x}}
    RMSE::Float64
end

The performance tips indicate that one way to allow this would be:

struct PolyFit{T1<:AbstractVector,T2<:AbstractVector,T3<:AbstractVector,T4<:AbstractVector,T5<:AbstractVector}
    x::T1
    y::T2
    knots::T3
    segmentlengths::T4
    polys::T5
    RMSE::Float64
end

but that makes for a pretty ugly composite type signature.

Is there a clean way to promote the element and container types at the same time without too much overhead, so I can use something like the first version? I can’t guarantee that any of the element types or container types will match, but they should promote fine. Vectors might be Ranges and Floats might be Ints. Would the overhead of converting the types be worth it for the faster type performance?

I was going to try:

PolyFit(x::AbstractVector, y::AbstractVector, knots::AbstractVector, segmentlengths::AbstractVector, polys::AbstractVector, RMSE) = PolyFit(collect.((x, y, knots, segmentlengths, polys))..., RMSE)

but that seems like a bad idea for performance and it doesn’t help with the element types.

Given the field names and what they are for, x, y, and the others should probably all be Vectors, but with a more general element type (for example, different kinds of numbers) that is the same for every field. If that is the case, the signature would be

struct PolyFit{T<:Number}
    x::Vector{T}
    y::Vector{T}
    knots::Vector{T}
    segmentlengths::Vector{T}
    polys::Vector{T}
    RMSE::T
end

Your code allows x, y, and the others to be completely different, unrelated types of data, which would make the results of operations on them unpredictable. Optionally, you can use AbstractVector{T} in the fields for more generality; however, abstractly typed fields carry a performance cost.
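
To make the cost of abstract field types a bit more concrete, here is a minimal sketch. The struct names are hypothetical and exist only for this illustration. It checks whether the compiler knows a concrete type for the field, which is what the performance tips care about.

```julia
# Hypothetical structs, for illustration only.
struct AbstractField
    x::AbstractVector{Float64}      # abstract field type: the concrete container is unknown
end

struct ParametricField{V<:AbstractVector{Float64}}
    x::V                            # concrete as soon as V is fixed
end

# The field of AbstractField is not concretely typed ...
abstract_is_concrete = isconcretetype(fieldtype(AbstractField, :x))

# ... while the parametric version is, once the parameter is known.
parametric_is_concrete = isconcretetype(fieldtype(ParametricField{Vector{Float64}}, :x))

println((abstract_is_concrete, parametric_is_concrete))
```

The parametric form keeps the generality (a range works as well as a Vector) without giving up a concrete field type.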

EDIT: If you are asking strictly about making a general constructor, I would just promote the arguments to Vectors with the most general element type, which you can get from promote_type.
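
A rough sketch of what that could look like, assuming a cut-down stand-in for PolyFit (the name PromotedFit is hypothetical, and the knots and polys fields are left out to keep it package-free):

```julia
# Hypothetical, reduced stand-in for PolyFit.
struct PromotedFit{T<:Number}
    x::Vector{T}
    y::Vector{T}
    RMSE::T
end

# Outer constructor: accept any AbstractVectors and any number,
# compute one common element type, and convert everything to it.
function PromotedFit(x::AbstractVector, y::AbstractVector, rmse::Number)
    T = promote_type(eltype(x), eltype(y), typeof(rmse))
    return PromotedFit{T}(convert(Vector{T}, x), convert(Vector{T}, y), convert(T, rmse))
end

fit = PromotedFit(0:3, Float32[1, 2, 3, 4], 0.5)
println(typeof(fit))
```

Arguments that already have matching types are handled by the default constructor, so this method only runs when some promotion is actually needed.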

I guess I am wondering where the best place is to put the promotion and the best way to write it.

For example:

x=0:10
struct MyStruct
    x::Vector{Float64}
end
function f(x)
    return MyStruct(x)
end
  1. I could convert inside each instance of the constructor, but I am afraid of the overhead that would cause if x is already the correct type.
function f(x)
    return MyStruct(collect(Float64, x))
end
  2. I could add function methods which only convert if needed, but I have several of these functions and writing out all these methods is a lot of extra code.
f(x::Vector{Float64}) = MyStruct(x)
f(x::AbstractRange{Float64}) = f(collect(x))
f(x::Vector{<:Integer}) = f(convert(Vector{Float64}, x))
f(x::AbstractRange{<:Integer}) = f(collect(Float64, x))
  3. I can force the user to convert their own data before they call the function, but that seems overly restrictive.
f(collect(Float64, x))

Multiple dispatch prompts you to be general, but I don’t know how to do that gracefully and quickly when I need to package results into containers.
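
For what it is worth, here is a small sketch of how the options above behave with the MyStruct from this post. It relies on two things in Base: the default constructor calls convert on each argument, and convert returns the same array when the type already matches. So no copy is made when x is already a Vector{Float64}.

```julia
struct MyStruct
    x::Vector{Float64}
end

v = [1.0, 2.0, 3.0]
s1 = MyStruct(v)         # already the right type
same_array = s1.x === v  # the very same array is stored, no copy

s2 = MyStruct(0:10)      # integer range: converted by the default constructor
println((same_array, typeof(s2.x)))
```

With that, a single generic f(x) = MyStruct(x) already covers all four methods written out in option 2.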

Maybe something like this:

struct MyStruct{T<:Number}
    a::Vector{T}
    b::Vector{Polynomial{T}}
    c::T
end
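function MyStruct(a, b, c)
    # eltype(b) is a Polynomial type, so take its eltype to get the coefficient type
    T = promote_type(eltype(a), eltype(eltype(b)))
    # calling MyStruct{T} explicitly converts the arguments to the field types
    return MyStruct{T}(T.(a), Polynomial{T}.(b), T(c))
end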

I don’t know where the Polynomial comes from, so I can’t really test this. But it’s similar to

struct MyStruct2{S<:Number, T<:AbstractVector{S}}
    a::T
    b::T
    c::S
end
function MyStruct2(a::AbstractVector{R}, b::AbstractVector{S}, c::T) where {R,S,T}
    Q = promote_type(R, S, T)
    return MyStruct2(Q.(a), Q.(b), Q(c))
end

which is more permissive in the vector type:

julia> MyStruct2(1:3, 4:7, 11)
MyStruct2{Int64, UnitRange{Int64}}(1:3, 4:7, 11)

julia> MyStruct2(1:3, rand(Float32,2), 7//6)
MyStruct2{Float32, Vector{Float32}}(Float32[1.0, 2.0, 3.0], Float32[0.37313795, 0.015768051], 1.1666666f0)

Edit: If you only want to convert when necessary, you can make a converter that does nothing in appropriate cases:

_cv(::Type{S}, x::Vector{S}) where {S} = x  # do nothing if appropriate
_cv(::Type{S}, x) where {S} = convert(Vector{S}, x)
function MyStruct(a, b, c)
    Q = promote_type(eltype(a), eltype(eltype(b)))  # coefficient type of the polynomials, not the Polynomial type
    return MyStruct(_cv(Q, a), _cv(Polynomial{Q}, b), Q(c))
end

(Note: still untested, but this approach worked on MyStruct2.)

Thanks. It is going to take me a bit to work through that. The Polynomial type is from the Polynomials package.

To make this even harder on myself, I am trying to use the StaticArrays package, but my struct keeps erroring with the unhelpful “invalid type signature”. I am trying to follow the Julia documentation:

Each where introduces a single type variable, so these expressions are nested for types with multiple parameters, for example Array{T,N} where N where T.

struct PolyFit{N1, N2, T} where N1 where N2 where T
    x::Vector{T}
    y::Vector{T}
    knots::SVector{N1, Int}
    segmentlengths::SVector{N2, T}
    polys::SizedVector{N2, Polynomial{T, :x}}
    RMSE::T
    PolyFit(x, y, knots, segmentlengths, polys, RMSE) = N1 - 1 != N2 ? error("vector size mismatch in PolyFit") : new{N1, N2, T}(x, y, knots, segmentlengths, polys, RMSE)
end
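
For comparison, a sketch of the declaration syntax that does parse: the type parameters go in braces after the struct name with no where clause, and where is attached to the inner constructor method instead. To keep the sketch free of packages, NTuple stands in for SVector and the name KnotFit is hypothetical; whether the same layout suits the real PolyFit is an assumption.

```julia
# NTuple is used in place of SVector so that no packages are needed (illustration only).
struct KnotFit{N1, N2, T}
    knots::NTuple{N1, Int}
    segmentlengths::NTuple{N2, T}

    # `where` belongs to the constructor method; this is where N1, N2 and T get bound.
    function KnotFit(knots::NTuple{N1, Int}, segmentlengths::NTuple{N2, T}) where {N1, N2, T}
        N1 - 1 == N2 || error("vector size mismatch in KnotFit")
        return new{N1, N2, T}(knots, segmentlengths)
    end
end

kf = KnotFit((1, 5, 9), (4.0, 4.0))
println(typeof(kf))
```

The sizes are read off the argument types, so the size check uses values the compiler already knows.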

Am I allowed to construct an SArray with its size being passed as a function parameter? I saw a warning that the size must be known at compile time, but I don’t know exactly when that is.

function f(n)
    x = g(n) # vector with length n
    sa = SVector{n}(x) # make x static for better performance
    return sa # this would actually be collected into the struct before being returned
end
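
As far as I understand it, "known at compile time" means the size is part of a type the compiler can see, rather than a value that only exists while the function runs. A package-free sketch of the difference, using tuples in place of static arrays and hypothetical function names:

```julia
# Length given as a runtime integer: the result type depends on the value of n,
# so the compiler cannot know the tuple length from the argument types alone.
to_tuple_runtime(x::AbstractVector, n::Integer) = ntuple(i -> x[i], n)

# Length given as a type parameter through Val: N is visible to the compiler,
# so the result can be inferred as an N-element tuple.
to_tuple_static(x::AbstractVector, ::Val{N}) where {N} = ntuple(i -> x[i], Val(N))

x = [1.0, 2.0, 3.0]
t_runtime = to_tuple_runtime(x, 3)
t_static = to_tuple_static(x, Val(3))
println(t_static)
```

Both calls return the same tuple; the difference is only in what the compiler can infer. If n comes from user input at runtime, wrapping it in Val just moves the dynamic dispatch to the call site, so the static version mostly pays off when the size is fixed or there are only a few distinct sizes.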