Stack via flatmap

jar1 · June 26, 2025, 3:29am

using CSV, DataManipulation, StructArrays, Tables

julia> @p begin
           """
           Year,CountryName,Population!!Estimate,Population!!MarginOfError,GDP!!Estimate,GDP!!MarginOfError
           2025,China,1400000000,100000000,18000000000000,1000000000000
           2025,India,1400000000,100000000,3000000000000,1000000000000
           """
           StructArray(columntable(CSV.File(IOBuffer(__))))
           flatmap() do r
               [
                   (;r.Year, r.CountryName, k => getproperty(r, k))
                   for k in propertynames(r)[3:end]
               ]
           end
       end
8-element Vector{NamedTuple{names, Tuple{Int64, String7, Int64}} where names}:
 (Year = 2025, CountryName = "China", Population!!Estimate = 1400000000)
 (Year = 2025, CountryName = "China", Population!!MarginOfError = 100000000)
 (Year = 2025, CountryName = "China", GDP!!Estimate = 18000000000000)
 (Year = 2025, CountryName = "China", GDP!!MarginOfError = 1000000000000)
 (Year = 2025, CountryName = "India", Population!!Estimate = 1400000000)
 (Year = 2025, CountryName = "India", Population!!MarginOfError = 100000000)
 (Year = 2025, CountryName = "India", GDP!!Estimate = 3000000000000)
 (Year = 2025, CountryName = "India", GDP!!MarginOfError = 1000000000000)

This is DataFrames.stack implemented with flatmap. It works fine but I don’t like the code style.

It uses column names for two and indexes for the rest. I’d rather (1) use the two names and refer to the rest by omission, or (2) say 1:2 and 3:end.
It requires getproperty - ugly.
It requires a list comprehension - verbose.

Is there a nicer way to do this?

cc @aplavin

aplavin · June 26, 2025, 3:27pm

Hmm, are you sure that’s the result you want? In your example, the last field name is different for every row, it will likely be inconvenient to work with that downstream.

For example, this one would select all columns with !! for flattening, put the original column names as col and values as val:

flatmap(pairs(_[sr".*!!.*"]), (;_.Year, _.CountryName, col=_2.first, val=_2.second))

Btw, for more convenient processing, you may want to immediately combine estimate + error into one Julian object.

And one minor thing: you can just pass StructArray to CSV.read directly.

jar1 · June 26, 2025, 4:55pm

You’re right of course, I meant

flatmap() do r
    [(;r.Year, r.CountryName, :col => p, :val => getproperty(r, p))
        for p in propertynames(r)[3:end]]
end
┌───────┬─────────────┬───────────────────────────┬────────────────┐
│  Year │ CountryName │                       col │            val │
│ Int64 │     String7 │                    Symbol │          Int64 │
├───────┼─────────────┼───────────────────────────┼────────────────┤
│  2025 │       China │      Population!!Estimate │     1400000000 │
│  2025 │       China │ Population!!MarginOfError │      100000000 │
│  2025 │       China │             GDP!!Estimate │ 18000000000000 │
│  2025 │       China │        GDP!!MarginOfError │  1000000000000 │
│  2025 │       India │      Population!!Estimate │     1400000000 │
│  2025 │       India │ Population!!MarginOfError │      100000000 │
│  2025 │       India │             GDP!!Estimate │  3000000000000 │
│  2025 │       India │        GDP!!MarginOfError │  1000000000000 │
└───────┴─────────────┴───────────────────────────┴────────────────┘

or better

flatmap() do r
    [(;r.Year, r.CountryName, :col => k, :val => v)
        for (k,v) in collect(pairs(r))[3:end]]
end

which solves #2 of my three gripes.

_[sr".*!!.*"]) solves #1. And the 3-arg flatmap method solves #3.

Superb. Thank you!

jar1 · June 26, 2025, 5:06pm

Do you have in mind Measurements.jl measurement(_, _) or something else?

aplavin · June 30, 2025, 10:10am

Any of the “numbers-with-uncertainties” packages really, there are a few and they have different usecases and tradeoffs. Basically a spectrum, I mostly use #1 or #3:

MonteCarloMeasurements.jl for the most general error propagation with Monte Carlo samples
Measurements.jl for linear error propagation with correlations (can be either faster or slower than the former)
Uncertain.jl for linear error propagation without correlations, much less overhead than the first two

Topic		Replies	Views
Flatten to a vector General Usage question , array , data_structures	26	16704	February 19, 2023
Unstack via group_vg;map General Usage	1	31	June 30, 2025
JuliaDB flatten multiple columns General Usage question	7	720	May 31, 2019
Mapping Vector{MyType} to a DataFrame General Usage	13	2825	June 3, 2019
Flattening YFinance.jl JSON result into a DataFrame New to Julia question	22	1299	April 11, 2023

Stack via flatmap

Related topics