Given this dataframe holding counts of a product (actually HPC compute partition) per week:
julia> describe(df_agg)
3×7 DataFrame
Row │ variable mean min median max nmissing eltype
│ Symbol Union… Any Union… Any Int64 DataType
─────┼────────────────────────────────────────────────────────────────────
1 │ product cbuild thin_course 0 String15
2 │ week 26.826 1 28.0 48 0 Int32
3 │ count 3715.95 1 53.0 102000 0 Int64
julia> df_agg
684×3 DataFrame
Row │ product week count
│ String15 Int32 Int64
─────┼───────────────────────────
1 │ cbuild 1 26
2 │ cbuild 2 2
3 │ cbuild 4 7
4 │ cbuild 5 31
5 │ cbuild 6 68
6 │ cbuild 7 14
7 │ cbuild 10 5
8 │ cbuild 11 9
9 │ cbuild 12 7
10 │ cbuild 13 12
11 │ cbuild 14 18
12 │ cbuild 15 7
13 │ cbuild 16 1
14 │ cbuild 17 9
15 │ cbuild 18 45
16 │ cbuild 19 2
17 │ cbuild 20 1
18 │ cbuild 21 5
19 │ cbuild 22 6
20 │ cbuild 23 2
21 │ cbuild 25 54
22 │ cbuild 26 11
⋮ │ ⋮ ⋮ ⋮
664 │ thin_course 19 1
665 │ thin_course 20 198
666 │ thin_course 25 3
667 │ thin_course 27 3
668 │ thin_course 30 1
669 │ thin_course 31 3
670 │ thin_course 32 13
671 │ thin_course 35 127
672 │ thin_course 36 947
673 │ thin_course 37 766
674 │ thin_course 38 389
675 │ thin_course 39 413
676 │ thin_course 40 488
677 │ thin_course 41 488
678 │ thin_course 42 371
679 │ thin_course 43 464
680 │ thin_course 44 480
681 │ thin_course 45 448
682 │ thin_course 46 588
683 │ thin_course 47 1402
684 │ thin_course 48 1755
641 rows omitted
I’m surprised to see unstack()
producing columns that are almost sorted by not quite:
julia> df = unstack(df_agg, :product, :week, :count)
22×49 DataFrame
Row │ product 1 2 4 5 6 7 10 11 12 13 14 15 16 17 18 19 20 21 22 23 25 ⋯
│ String15 Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? Int64? ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ cbuild 26 2 7 31 68 14 5 9 7 12 18 7 1 9 45 2 1 5 6 2 5 ⋯
2 │ course missing missing missing missing missing missing 25 12 missing 17 3 1 missing missing missing missing missing missing missing missing missin
3 │ fat 1368 3710 11012 2884 2591 1878 2380 4090 2710 2206 3380 2142 1564 1144 1358 2428 3207 1584 1647 1808 109
4 │ fat_genoa missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missin
5 │ fat_rome missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missin ⋯
6 │ genoa missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missin
7 │ gpu 3813 10054 4721 4137 7734 5209 5642 9605 8313 10997 4648 6707 7428 10615 4735 8705 10304 8384 7590 8189 1020
8 │ gpu_course missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing 521 4 missing 4 1
9 │ gpu_mig missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missin ⋯
10 │ gpu_sw 51 25 10 missing 20 30 missing 8 25 23 19 8 10 missing 7 132 19 36 32 51 5
11 │ gpu_test 3 14 missing missing 70 42 103 22 32 75 93 100 61 285 46 65 111 59 55 101
12 │ gpu_vis 4 8 29 5 24 18 12 3 4 5 2 12 19 19 12 13 11 27 43 17 3
13 │ himem_4tb 14 163 15 22 87 14 15 14 13 29 13 21 29 25 9 48 19 25 56 83 4 ⋯
14 │ himem_8tb 14 39 1 14 16 14 17 12 12 26 12 12 37 12 2 30 28 16 35 29 2
15 │ rome missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missin
16 │ staging 46 274 570 198 191 498 203 199 290 65 88 147 581 224 276 892 960 576 928 519 22
17 │ staging_course missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing 15 51 7 1 7 ⋯
18 │ sw 45 23 9 6 19 27 3 23 23 20 16 14 15 3 9 63 18 42 41 49 2
19 │ sw_genoa missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missin
20 │ test missing 4 37 17 25 4 24 44 40 23 86 134 131 31 2 15 52 23 missing 3 2
21 │ thin 24239 25004 22318 18389 30568 35804 34150 24772 22519 17566 28003 35909 23188 16849 13813 24210 28992 19076 24878 46839 1824 ⋯
22 │ thin_course missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing 1 198 missing missing missing
28 columns omitted
julia> names(df)
49-element Vector{String}:
"product"
"1"
"2"
"4"
"5"
"6"
"7"
"10"
"11"
"12"
"13"
"14"
"15"
"16"
"17"
"18"
"19"
"20"
"21"
"22"
"23"
"25"
"26"
"27"
⋮
"30"
"31"
"32"
"33"
"34"
"35"
"36"
"37"
"38"
"39"
"40"
"41"
"42"
"43"
"44"
"45"
"46"
"47"
"48"
"8"
"9"
"3"
"24"
In fact, I only noticed the column order (week) was off after my plots of the data showed weird time series, as the printed dataframe above looks fine for the columns visible on my screen.
Is there no guarantee on the column order produced by unstack()
? I don’t see a relevant keyword mentioned in the docs, nor any issue in the DataFrames.jl repo, so maybe I was just making an incorrect assumption about the output order.