Basic pivoting/widening of a table

affans · January 23, 2021, 12:02am

Consider the following MWE:

using CSV, DataFrames, PrettyTables

df_str = """
mdate	type	yesno	workerid
1/1/2020	pcr	no	1
1/1/2020	molecular	yes	1
1/1/2020	pcr	yes	2
1/1/2020	molecular	yes	2
1/1/2020	antigen	yes	2
1/1/2020	molecular	no	3
1/2/2020	pcr	no	1
1/2/2020	molecular	yes	1
1/2/2020	antigen	yes	1
1/2/2020	molecular	yes	2
1/2/2020	antigen	yes	3
1/2/2020	molecular	no	3
1/2/2020	pcr	yes	3
"""
df = CSV.File(IOBuffer(df_str)) |> DataFrame!

I would like to widen this table, i.e.

# pivot 
widedf = unstack(df, :type, :yesno)
pretty_table(widedf, crop=:none)
┌──────────┬──────────┬────────────────────────┬────────────────────────┬────────────────────────┐
│    mdate │ workerid │                antigen │              molecular │                    pcr │
│   String │    Int64 │ Union{Missing, String} │ Union{Missing, String} │ Union{Missing, String} │
├──────────┼──────────┼────────────────────────┼────────────────────────┼────────────────────────┤
│ 1/1/2020 │        1 │                missing │                    yes │                     no │
│ 1/1/2020 │        2 │                    yes │                    yes │                    yes │
│ 1/1/2020 │        3 │                missing │                     no │                missing │
│ 1/2/2020 │        1 │                    yes │                    yes │                     no │
│ 1/2/2020 │        2 │                missing │                    yes │                missing │
│ 1/2/2020 │        3 │                    yes │                     no │                    yes │
└──────────┴──────────┴────────────────────────┴────────────────────────┴────────────────────────┘

but I would like to fix a few things here:

I would like to change the column types to purely String.
I would like to replace the missing data with no.

pdeffebach · January 23, 2021, 12:10am

A better read command is the following:

julia> df = CSV.read(IOBuffer(df_str), DataFrame; delim = " ", ignorerepeated = true)

You can’t do the replacement with unstack alone. Either perform the replacements yourself afterwards, or use coalesce after the unstack command.
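As a minimal sketch of the coalesce route, assuming a cut-down stand-in for the table in the question (the `df` below and the `testcols` name are illustrative, not from the original post): broadcasting `coalesce` over each widened column replaces `missing` with "no" and, as a side effect, narrows the element type from Union{Missing, String} to String. Recent DataFrames.jl releases also appear to accept a `fill` keyword on `unstack`, but that depends on the installed version, so it is not used here.

```julia
using DataFrames

# Small stand-in for the table in the question (same columns, fewer rows).
df = DataFrame(
    mdate    = ["1/1/2020", "1/1/2020", "1/1/2020"],
    type     = ["pcr", "molecular", "molecular"],
    yesno    = ["no", "yes", "no"],
    workerid = [1, 1, 3],
)

# Widen: one column per test type; absent combinations become `missing`.
widedf = unstack(df, :type, :yesno)

# Every column except the row keys came from `type`.
testcols = setdiff(names(widedf), ["mdate", "workerid"])

# Replace `missing` with "no". Broadcasting `coalesce` allocates a new
# vector whose element type no longer includes Missing.
for col in testcols
    widedf[!, col] = coalesce.(widedf[!, col], "no")
end
```

Assigning with `widedf[!, col] = ...` replaces the whole column, which is what lets the column type change; an in-place `.=` assignment would keep the original Union{Missing, String} element type.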

EDIT: The best way to get an MWE is the following:

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> CSV.write(io, df);

julia> df_str = String(take!(io))
"mdate,type,yesno,workerid\n1/1/2020,pcr,no,1\n1/1/2020,molecular,yes,1\n1/1/2020,pcr,yes,2\n1/1/2020,molecular,yes,2\n1/1/2020,antigen,yes,2\n1/1/2020,molecular,no,3\n1/2/2020,pcr,no,1\n1/2/2020,molecular,yes,1\n1/2/2020,antigen,yes,1\n1/2/2020,molecular,yes,2\n1/2/2020,antigen,yes,3\n1/2/2020,molecular,no,3\n1/2/2020,pcr,yes,3\n"

julia> println(df_str)
mdate,type,yesno,workerid
1/1/2020,pcr,no,1
1/1/2020,molecular,yes,1
1/1/2020,pcr,yes,2
1/1/2020,molecular,yes,2
1/1/2020,antigen,yes,2
1/1/2020,molecular,no,3
1/2/2020,pcr,no,1
1/2/2020,molecular,yes,1
1/2/2020,antigen,yes,1
1/2/2020,molecular,yes,2
1/2/2020,antigen,yes,3
1/2/2020,molecular,no,3
1/2/2020,pcr,yes,3

Then you can copy and paste that and do the CSV.read trick. It would be great if someone made this little process a package.
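Until such a package exists, the steps can be wrapped in a small function. This is only a sketch: `mwe_string` is a hypothetical name, not part of CSV.jl or DataFrames.jl, and the example table is a shortened stand-in for the one in the question.

```julia
using CSV, DataFrames

# Hypothetical helper: serialize a DataFrame to CSV text that can be
# pasted into a post.
function mwe_string(df::AbstractDataFrame)
    io = IOBuffer()
    CSV.write(io, df)
    return String(take!(io))
end

df = DataFrame(
    mdate    = ["1/1/2020", "1/2/2020"],
    type     = ["pcr", "antigen"],
    yesno    = ["no", "yes"],
    workerid = [1, 3],
)

df_str = mwe_string(df)
println(df_str)

# Reading the pasted text back should reproduce the original table.
df2 = CSV.read(IOBuffer(df_str), DataFrame)
```

Because the output is comma-delimited, the default `CSV.read` settings are enough on the way back in and no `delim` or `ignorerepeated` arguments are needed.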
