Performant table representation

Gesee · June 14, 2025, 2:18pm

I am trying to represent a table in Julia. Each column has a name, and I need to be able to add new columns on the fly.
Getting a column or a row should be fast, since these are frequent operations in my work.
Is there a package or tool with these characteristics?

abraemer · June 14, 2025, 2:37pm

Have you checked whether the standard DataFrames.jl is good enough for your purposes? I think it performs quite well. I never did any benchmarks per se, but for data analysis of O(100k) rows there was not much delay.
There is quite a large ecosystem around it, so if it works for you, chances are you will find packages for all sorts of operations.

Gesee · June 14, 2025, 4:04pm

Yeah, I checked, and it’s a bit too slow, since I am working on a real-time project.

aplavin · June 14, 2025, 4:46pm

In Julia, you don’t generally need fancy specialized data structures to have performant tables. Some basic table representations are:

vector of namedtuples, [(a=1, b="x"), ...] (row-based)
namedtuple of vectors, (a=[1,2,3,...], b=["x", "y", ...]) (column-based)
StructArrays.jl (column-based storage with row-based interface)

Probably some of these will work for you.
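A minimal sketch of the first two layouts using only Base Julia; the column names `a`, `b` and the sample values are illustrative. In the column-based version, reading a column returns the stored vector without copying, and row `i` is rebuilt with `map` over the NamedTuple, which keeps the row type known to the compiler.

```julia
# Row-based: a vector of NamedTuples (illustrative columns a, b)
rows = [(a = 1, b = "x"), (a = 2, b = "y")]
push!(rows, (a = 3, b = "z"))           # append a row
row2 = rows[2]                          # row access: (a = 2, b = "y")
col_a_rows = [r.a for r in rows]        # column access allocates a new vector

# Column-based: a NamedTuple of vectors
cols = (a = [1, 2], b = ["x", "y"])
push!(cols.a, 3); push!(cols.b, "z")    # append a row by pushing to every column
col_a = cols.a                          # column access returns the stored vector, no copy
row_at(t, i) = map(col -> col[i], t)    # rebuild row i as a NamedTuple
row3 = row_at(cols, 3)                  # (a = 3, b = "z")
```

With the column-based layout, every column must be pushed to on each row append, or the columns fall out of sync; StructArrays.jl takes care of this bookkeeping behind a row-based interface.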

ufechner7 · June 14, 2025, 5:03pm

Can you provide an example (MWE)? It is not clear to me:

how large will your table be? Which data types do you need?
what do you want to use it for? Mainly reading, or writing, or querying?
what are your performance requirements? How fast must each operation be? Are memory allocations allowed at all?

We cannot really help you without more details and preferably a test case and test data.

Gesee · June 14, 2025, 5:27pm

Actually, in the extreme case the table may reach 5B elements.
I need to resize the table frequently (adding rows or columns) and read data by row or by column.
Since it’s a real-time project and these operations are quite frequent, anything taking more than 100 ns per operation may not work for me.
Also, the table grows but never shrinks.
I will read and write frequently, but do no querying.

Gesee · June 14, 2025, 5:29pm

I am already exploring StructArrays.jl. The problem is that in my case I need to be able to grow the array, and since the size of a tuple or named tuple can’t be changed, it’s quite complicated.

ufechner7 · June 14, 2025, 5:35pm

Do you mean 5 billion elements, i.e. 5e9 elements, or 5e12 elements? (You might know that “billion” is defined differently in the US and in Europe.)
And what would be the size of one element?

Again, without a concrete test case, please don’t expect many useful answers.
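To make the scale question concrete, here is a back-of-the-envelope calculation assuming 8-byte elements (e.g. `Float64` or `Int64`). The real element size is exactly what is being asked, so treat these numbers as illustrative only.

```julia
# Assumption: every element is 8 bytes (Float64 or Int64)
bytes_per_elem = sizeof(Float64)   # 8

n_short = 5 * 10^9    # "5 billion" on the short scale (US): 5e9
n_long  = 5 * 10^12   # "5 billion" on the long scale: 5e12

gigabytes(n) = n * bytes_per_elem / 1e9

println("5e9 elements:  ", gigabytes(n_short), " GB")   # 40.0 GB
println("5e12 elements: ", gigabytes(n_long), " GB")    # 40000.0 GB
```

Under this assumption even the smaller case is about 40 GB, so whether the table fits in RAM matters as much as the choice of container.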

lmiq · June 14, 2025, 5:37pm

Maybe a dictionary? Adding “columns” would be fast; adding rows would require pushing to each vector, but that’s quite fast too, since the resizing mechanism automatically keeps memory movement to a minimum.

(That said, I’m not sure whether DataFrames is already close to the limit of what can be done performance-wise.)
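A minimal sketch of the dictionary idea, assuming all columns share one element type (`Float64` here) so the dictionary stays concretely typed. The names `Table`, `add_column!`, `push_row!` and `get_row` are hypothetical helpers, not a package API. `sizehint!` (Base) reserves capacity up front, which may help avoid reallocation during time-critical appends.

```julia
# Hypothetical dictionary-of-columns table; all columns assumed Float64
const Table = Dict{Symbol, Vector{Float64}}

function add_column!(t::Table, name::Symbol, col::Vector{Float64})
    # New columns must match the current number of rows
    isempty(t) || length(col) == length(first(values(t))) ||
        throw(DimensionMismatch("column length does not match table"))
    t[name] = col
    return t
end

# Append one row, given as a NamedTuple (or Dict) keyed by column name
function push_row!(t::Table, row)
    for (name, col) in t
        push!(col, row[name])   # amortized O(1): Vector capacity grows geometrically
    end
    return t
end

# Reconstructing a row visits every column and allocates a new Dict
get_row(t::Table, i::Int) = Dict(name => col[i] for (name, col) in t)

t = Table()
add_column!(t, :a, [1.0, 2.0])
add_column!(t, :b, [3.0, 4.0])
foreach(col -> sizehint!(col, 1_000), values(t))   # reserve capacity up front
push_row!(t, (a = 5.0, b = 6.0))
add_column!(t, :c, [7.0, 8.0, 9.0])
r = get_row(t, 3)    # Dict(:a => 5.0, :b => 6.0, :c => 9.0)
```

Columns of different element types would need something like `Dict{Symbol, AbstractVector}`, which gives up type stability on every access; the NamedTuple-based layouts keep each column’s type concrete.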

aplavin · June 14, 2025, 5:47pm

With (named)tuples, it’s “free” to just create a new one! Adding/removing a StructArray column doesn’t copy any data, even though it creates a new NamedTuple and StructArray.

julia> using StructArrays, Accessors

julia> tbl = StructArray(a=[1,2,3], b=[4,5,6])
3-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype @NamedTuple{a::Int64, b::Int64}:
 (a = 1, b = 4)
 (a = 2, b = 5)
 (a = 3, b = 6)

julia> @insert tbl.c = rand(3)
3-element StructArray(::Vector{Int64}, ::Vector{Int64}, ::Vector{Float64}) with eltype @NamedTuple{a::Int64, b::Int64, c::Float64}:
 (a = 1, b = 4, c = 0.6328852033561965)
 (a = 2, b = 5, c = 0.6543847896731916)
 (a = 3, b = 6, c = 0.49184492764291854)
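The same point can be checked with plain NamedTuples in Base Julia, without StructArrays: `merge` builds a new NamedTuple with an extra column, but the existing column vectors are shared rather than copied. The column name `c` simply mirrors the example above.

```julia
tbl = (a = [1, 2, 3], b = [4, 5, 6])

# "Add" a column by building a new NamedTuple; only the small tuple is new
tbl2 = merge(tbl, (c = rand(3),))

# The original columns are the very same arrays, not copies
same_a = tbl2.a === tbl.a    # true
same_b = tbl2.b === tbl.b    # true

# Rows can still be appended in place by pushing to every column
push!(tbl2.a, 4); push!(tbl2.b, 7); push!(tbl2.c, 0.5)
```

One caveat that follows from how Julia types work: each added column changes the NamedTuple’s type, so adding columns inside a performance-critical loop can make the code type-unstable. Adding columns occasionally and then passing the table to a function (a function barrier) keeps the hot path specialized.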

Gesee · June 14, 2025, 6:17pm

I think I will use this. I wasn’t sure whether having to go through the whole dictionary to reconstruct a row would be good for performance.
