# Functions with static table-like inputs and outputs

I often have to write functions that transform one array of structures (sorted by one of the fields) into another array of structures. To fit the argument types, I kept having to convert one array of structures into another, so I wanted to minimize this boilerplate code.

In a previous version I passed every column as a separate argument, but then there was no guarantee that the columns had the same length. So here’s a template of such a function that I eventually came up with.

``````
using StructArrays

"""
    find_intervals(inp; target_syms, break_syms)

Find intervals of consecutive target symbols in a time-sorted vector of `(time, sym)` rows.
"""
function find_intervals(
    inp::AbstractVector{@NamedTuple{time::Int, sym::Symbol}};
    target_syms::Vector{Symbol} = [:A, :B],
    break_syms::Vector{Symbol} = [:C, :D],
)
    # or StructVector{@NamedTuple{...}}[]
    out = Vector{@NamedTuple{tbeg::Int, tend::Int, count::Int, type::Symbol}}()

    count2sym = count -> count < 2 ? :short : :long
    is_series = false
    tbeg = 1
    tend = 1
    count = 0
    for x in inp
        if x.sym in target_syms
            if !is_series
                # a new series starts at this row
                tbeg = x.time
                is_series = true
                count = 0
            end
            count += 1
        elseif x.sym in break_syms
            if is_series
                # tend still holds the time of the previous row
                push!(out, (; tbeg, tend, count, type = count2sym(count)))
                is_series = false
            end
        else
            # other symbols don't break series and are not counted
        end
        tend = x.time
    end
    # close a series that is still open at the end of the input
    if is_series
        push!(out, (; tbeg, tend, count, type = count2sym(count)))
        is_series = false
    end

    return out
end

times = [10,20,30,40,50,60,70,80]
syms = [:A,:B,:C,:D,:A,:B,:C,:D]

# Problem 1: I have data either in columns or rows table that should both work:
cols = (time = times, sym = syms)
rows = [(time = t, sym = s) for (t, s) in zip(times, syms)]
# rows - can pass directly:
out = find_intervals(rows, target_syms=[:A, :C], break_syms=[:B])
# cols - wrap into a struct vector:
sv = StructVector(cols)
out = find_intervals(sv, target_syms=[:A, :C], break_syms=[:B])

# Problem 2: Column names and number do not fit with function signature:
cols_ = (T = times, T2 = 2 .* times, S = syms)
rows_ = [(T = t, T2 = 2t, S = s) for (t, s) in zip(times, syms)]
# rows - should copy into another rows with renamed fields (-)
rows = map(rows_) do r
    selected = NamedTuple{(:T, :S)}(r)
    renamed = NamedTuple{(:time, :sym)}(values(selected))
end
out = find_intervals(rows, target_syms=[:A, :C], break_syms=[:B])
# cols - should select and rename columns, wrap into another struct vector (+no copy)
cols = StructVector((time = cols_.T, sym = cols_.S))
out = find_intervals(cols, target_syms=[:A, :C], break_syms=[:B])
``````

What bothers me is that the field renaming feels redundant here. When I use a NamedTuple in the signature, it fixes both the names and the order of the fields. This seems redundant, because such renaming normally happens implicitly when positional arguments are bound to a function’s parameters. Is there a way to declare local names for the column arguments inside the function and check only their types?

Indeed, StructArrays are great for tables, among other use cases, and much better than juggling separate arrays.

Why do you want to put the column names into the function signature at all? This not only constrains the field order, but also prevents adding new columns later or using structs other than named tuples.

For mismatched names, there are two solutions:

• rename them similar to what you do, but simpler:
``````
(time=r.T, sym=r.S)
# or
NamedTuple{(:time, :sym)}(values(r[(:T, :S)]))
``````
• pass small accessor functions defining each element:
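``````
using Accessors

function f(ftime, fsym, tbl)
    ...
    for r in tbl
        ...
        # setting through ftime works with Accessors optics such as @optic(_.T)
        push!(result, @set ftime(r) = newtime)
        ...
    end
end

# one accessor per column the function needs
f(x -> x.time, x -> x.sym, tbl)
f(x -> x.T, x -> x.S, tbl)
f(@optic(_.T), @optic(_.S), tbl)
``````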

The latter is more flexible: you can pass something like `x -> x.T - 2000` as the time accessor if some transformation is needed.
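To make the accessor idea concrete, here is a minimal runnable sketch that uses only Base. The name `find_intervals_acc` and the reduced output type (no `type` field) are made up for this illustration; the loop follows the same bookkeeping as the function in the first post.

```julia
# Hypothetical variant of find_intervals: the caller supplies accessor
# functions, so the function itself does not fix any field names.
function find_intervals_acc(ftime, fsym, tbl;
                            target_syms = [:A, :B], break_syms = [:C, :D])
    out = @NamedTuple{tbeg::Int, tend::Int, count::Int}[]
    is_series = false
    tbeg = tend = count = 0
    for r in tbl
        s = fsym(r)
        if s in target_syms
            if !is_series
                tbeg, count, is_series = ftime(r), 0, true
            end
            count += 1
        elseif s in break_syms && is_series
            push!(out, (; tbeg, tend, count))
            is_series = false
        end
        tend = ftime(r)   # time of the most recently visited row
    end
    is_series && push!(out, (; tbeg, tend, count))
    return out
end

rows  = [(time = t, sym = s) for (t, s) in zip(10:10:40, [:A, :A, :C, :A])]
rows_ = [(T = r.time, T2 = 2 * r.time, S = r.sym) for r in rows]

a = find_intervals_acc(r -> r.time, r -> r.sym, rows)
b = find_intervals_acc(r -> r.T, r -> r.S, rows_)        # no renaming needed
c = find_intervals_acc(r -> r.T2 ÷ 2, r -> r.S, rows_)   # transformed accessor
```

All three calls produce the same intervals, and the extra `T2` column does not get in the way.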

Minor: note that you can use `similar(inp, neweltype)` for generality.
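As a small illustration of the `similar` hint, the sketch below allocates the output with `similar(inp, OutT, 0)` instead of hard-coding `Vector`. The names `first_and_last` and `OutT` are hypothetical. For a plain `Vector` input this simply gives an empty `Vector{OutT}`; what other array types return depends on how they implement `similar`.

```julia
# Hypothetical function, used only to demonstrate output allocation.
function first_and_last(inp::AbstractVector)
    OutT = @NamedTuple{tbeg::Int, tend::Int}
    # Ask the input container for an empty vector with element type OutT.
    out = similar(inp, OutT, 0)
    isempty(inp) || push!(out, (tbeg = first(inp).time, tend = last(inp).time))
    return out
end

rows = [(time = 10, sym = :A), (time = 20, sym = :B)]
res = first_and_last(rows)
```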

I don’t want to, but the names end up in the signature anyway because of the named tuples.

I mean, why do you constrain the signature to named tuples at all? Just use `inp::AbstractVector`: this will work for any array of structs with the corresponding field names, and for many table types without any changes.
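To illustrate the point, a function that only annotates `inp::AbstractVector` accepts any element type that has the fields it actually reads. A minimal sketch, where `Event` and `count_targets` are hypothetical names:

```julia
# Hypothetical struct with an extra field, to show that extra columns are fine.
struct Event
    time::Int
    sym::Symbol
    note::String
end

# Only the fields that are actually read (here `sym`) need to exist.
count_targets(inp::AbstractVector; target_syms = [:A, :B]) =
    count(x -> x.sym in target_syms, inp)

nt_rows     = [(time = 10, sym = :A), (time = 20, sym = :C)]
struct_rows = [Event(10, :A, "first"), Event(20, :C, "second")]
swapped     = [(sym = :A, time = 10), (sym = :C, time = 20)]  # different field order

n1 = count_targets(nt_rows)
n2 = count_targets(struct_rows)
n3 = count_targets(swapped)
```

Named tuples, a custom struct, and a different field order all go through the same method without any conversion.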

How do I know, then, whether I am calling it with an incompatible argument? Which fields must be present in that AbstractArray, what types must they have, and so on? I don’t want to mismatch fields the day I have forgotten the details of this function, and then get runtime errors.

Well, then it can be done the way you do it, along with the issues you encountered. That’s why this approach of overly restrictive signatures is not common in Julia.

The question to ask is whether you need these types for dispatch, i.e. for selecting between methods of a function.
If yes, then putting them into the signature is fine and is the right solution.
If not, you can just put a check at the beginning of the function, something like

``````
function find_intervals(inp)
    @assert (:time, :sym) ⊆ fieldnames(eltype(inp))
end
``````

You get the error reported when the function is run anyway, so there’s little difference in user experience.
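If the column types matter as well, the same up-front check can be extended. This is only a sketch: `check_columns` and `find_intervals_checked` are hypothetical names, and it assumes that `eltype(inp)` is a concrete `NamedTuple` or struct type.

```julia
# Hypothetical helper: verify that eltype(inp) has the required fields
# with compatible types, and fail early with a descriptive error.
function check_columns(inp::AbstractVector, required::Pair...)
    T = eltype(inp)
    for (name, S) in required
        hasfield(T, name) ||
            throw(ArgumentError("missing field `$name` in element type $T"))
        fieldtype(T, name) <: S ||
            throw(ArgumentError("field `$name` is $(fieldtype(T, name)), expected <: $S"))
    end
    return nothing
end

function find_intervals_checked(inp::AbstractVector)
    check_columns(inp, :time => Integer, :sym => Symbol)
    # ... the actual computation would follow here
    return length(inp)   # placeholder result for this sketch
end

good = [(time = 10, sym = :A)]
bad  = [(T = 10, S = :A)]

n_ok = find_intervals_checked(good)   # passes the check
err = try
    find_intervals_checked(bad)       # wrong field names
catch e
    e
end
```

The error still appears at run time, but it names the missing or mistyped column rather than failing somewhere inside the loop.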
