Plot multiple similar CSVs with AoG

Nathan_Boyer · February 8, 2023, 2:32pm

I am loading multiple CSV files and want to plot them on top of each other, but I can’t figure out an easy way to change the line color and style to identify them in a legend.

df1 = DataFrame("X Data" => 1:3, "Unused Data" => 4:6, "Y Data" => 7:9) # CSV.read("file1.csv", DataFrame)
df2 = DataFrame("X Data" => 0:3, "Unused Data" => 4:7, "Y Data" => fill(8,4)) # CSV.read("file2.csv", DataFrame)
draw(
    (data(df1) + data(df2)) *
    mapping("X Data", "Y Data") *
    visual(Lines)
)

I imagine there should be a way to make line 4 something like

(data(df1) * mapping(linestyle="Sample 1", color="Sample 1") + data(df2) * mapping(linestyle="Sample 2", color="Sample 2"))

but I can’t get the syntax right.

I have six CSVs. I’d like two different line styles and three different colors for grouping them.

j_verzani · February 8, 2023, 2:39pm

The color attribute is not data dependent. As such it should be passed to the visual command, not the mapping one.

Nathan_Boyer · February 8, 2023, 3:02pm

I can manually change the visuals, but then I get no legend.

fig = draw(
    (data(df1) * visual(Lines, color=:red, linestyle=:solid) + data(df2) * visual(Lines, color=:blue, linestyle=:dash)) *
    mapping("X Data", "Y Data")
)

jules · February 8, 2023, 3:33pm

Use vcat to make a single DataFrame out of all of them. For data frames it supports a source keyword, I think, which creates a new column in the process that records which data frame each row came from. You can then use this column as the color variable in AoG.
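
A minimal sketch of this suggestion, using the two example data frames from the first post. It assumes the keyword meant here is the `source` argument of `vcat` in DataFrames.jl; the column name "Sample" and the labels "Sample 1" and "Sample 2" are made up for illustration.

```julia
using DataFrames, AlgebraOfGraphics, CairoMakie

df1 = DataFrame("X Data" => 1:3, "Unused Data" => 4:6, "Y Data" => 7:9)
df2 = DataFrame("X Data" => 0:3, "Unused Data" => 4:7, "Y Data" => fill(8, 4))

# Stack the data frames. `source` appends a column recording which input
# each row came from. The name "Sample" and its labels are illustrative.
df = vcat(df1, df2; source = "Sample" => ["Sample 1", "Sample 2"])

# Map the new categorical column to color and line style, so each input
# is drawn as its own line and appears in the legend.
fig = draw(
    data(df) *
    mapping("X Data", "Y Data", color = "Sample", linestyle = "Sample") *
    visual(Lines)
)
```

Because the identifying column is mapped rather than set in `visual`, AoG treats it as a grouping variable and can build the legend from it.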

nilshg · February 8, 2023, 3:45pm

I think you are referring to the source kwarg in CSV.File/CSV.read:

  •  source: [only applicable for vector of inputs to CSV.File] a Symbol, String, or Pair of Symbol or String to Vector. As a single Symbol or String, provides the column name that will be added to the parsed columns, the values of the column will be the input "name" (usually file name) of the input from whence the value was parsed. As a Pair, the 2nd part of the pair should be a Vector of values matching the length of the # of inputs, where each value will be used instead of the input name for that inputs values in the auto-added column.
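
A hedged sketch of how this keyword might be used. It writes two small temporary files so that it is self-contained. The file names, the column name "Sample", and the labels are assumptions for illustration and do not come from the thread.

```julia
using CSV, DataFrames

# Write two small CSV files with the same columns (stand-ins for real files).
dir = mktempdir()
paths = [joinpath(dir, "file1.csv"), joinpath(dir, "file2.csv")]
CSV.write(paths[1], DataFrame("X Data" => 1:3, "Y Data" => 7:9))
CSV.write(paths[2], DataFrame("X Data" => 0:3, "Y Data" => fill(8, 4)))

# Passing a vector of inputs reads them into a single table. The Pair form
# of `source` adds a "Sample" column filled with the given label per file.
df = CSV.read(paths, DataFrame; source = "Sample" => ["Sample 1", "Sample 2"])
```

The resulting "Sample" column can then be used in `mapping` in the same way as any other categorical column.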

jules · February 8, 2023, 3:48pm

This one:

github.com

JuliaData/DataFrames.jl/blob/4446a3dd227e39b5e2029c4e092e26ee53328cb7/src/abstractdataframe/abstractdataframe.jl#L1629


      
* `:orderequal`: require all data frames to have the same column names and in the same order.
* `:intersect`: only the columns present in *all* provided data frames are kept. If the intersection is empty, an empty data frame is returned.
* `:union`: columns present in *at least one* of the provided data frames are kept. Columns not present in some data frames are filled with `missing` where necessary.
* A vector of `Symbol`s or strings: only listed columns are kept. Columns not present in some data frames are filled with `missing` where necessary.

The `source` keyword argument, if not `nothing` (the default), specifies the additional column to be added in the last position in the resulting data frame that will identify the source data frame. It can be a `Symbol` or an `AbstractString`, in which case the identifier will be the number of the passed source data frame, or a `Pair` consisting of a `Symbol` or an `AbstractString` and of a vector specifying the data frame identifiers (which do not have to be unique). The name of the source column is not allowed to be present in any source data frame.

The order of columns is determined by the order they appear in the included data frames, searching through the header of the first data frame, then the second,

nilshg · February 8, 2023, 4:11pm

Ah right, my recommendation was based on the title. If the data is in CSVs, it can be read into a single DataFrame with a source column by CSV.jl. But if the separate DataFrames are needed for other purposes, then it might indeed make sense to just vcat them for this.

Nathan_Boyer · February 8, 2023, 8:55pm

Thanks. I actually made use of both suggestions to get my code quite compact.
The source argument to CSV.read creates one grouping and the source argument to vcat creates the other grouping.

using DataFrames, CSV, CairoMakie, AlgebraOfGraphics

# File Names
original_files = [
    "1432H R models 2018/FirstThreadAxialStress.xls",
    "1432H R models 2018/OldRepairAxialStress.xls",
    "1432H R models 2018/NewRepairAxialStress.xls",
]
new_files = [
    "1432H R models 2023/FirstThreadAxialStress.xls",
    "1432H R models 2023/OldRepairAxialStress.xls",
    "1432H R models 2023/NewRepairAxialStress.xls",
]

# Load and Categorize Data
df = vcat(
    CSV.read.(
        [original_files, new_files],
        DataFrame,
        source = "Path Location" => [
            "First Thread",
            "Old Repair Diameter",
            "New Repair Diameter",
        ]
    )...,
    source = "Model" => ["Original Repair", "New Repair"]
)

# Plot Data
fig = draw(
    data(df)
    * mapping(
        "S (in)" => "Distance from Inner Wall (in)",
        "Normal Stress (psi)" => (x -> x./1000) => "Axial Stress (ksi)",
        linestyle="Model",
        color="Path Location",
    )
    * visual(Lines),
    axis=(; title="Post-Repair Stresses"),
)
save("Post-Repair Stresses.png", fig)
display(fig)

Nathan_Boyer · February 8, 2023, 10:15pm

If I use GLMakie, is there an easy way to save a figure after zooming in with the mouse? save("file.png", fig) just saves the original figure.

jules · February 9, 2023, 11:43am

Hm, it probably resets to the set limits when saving. Try limits!(ax, ax.finallimits[])

Nathan_Boyer · February 9, 2023, 2:24pm

That worked but is hard to remember.

# zoom figure where you want it
limits!(fig.figure.current_axis.x, fig.figure.current_axis.x.finallimits[])
save("filename.png", fig)

or

# zoom figure where you want it
limits!(current_axis(), current_axis().finallimits[])
save("filename.png", fig)

Why is there an x on the end?

julia> fig.figure.current_axis
Base.RefValue{Any}(Axis (6 plots))

julia> fieldnames(typeof(fig.figure.current_axis))
(:x,)

julia> fig.figure.current_axis.x
Axis with 6 plots:
 ┣━ Lines{Tuple{Vector{Point{2, Float32}}}}
 ┣━ Lines{Tuple{Vector{Point{2, Float32}}}}
 ┣━ Lines{Tuple{Vector{Point{2, Float32}}}}
 ┣━ Lines{Tuple{Vector{Point{2, Float32}}}}
 ┣━ Lines{Tuple{Vector{Point{2, Float32}}}}
 ┗━ Lines{Tuple{Vector{Point{2, Float32}}}}

jules · February 9, 2023, 3:13pm

Yeah it’s just a workaround after all. I wonder if it’s a good behavior to reset the axes like this but I assume it comes from the change when we didn’t run autolimits for every new plot anymore.

Nathan_Boyer · February 9, 2023, 3:23pm

I found a GitHub Issue for the limits! workaround.

Can you reveal any more information about the x? I would think that current_axis() == fig.figure.current_axis not fig.figure.current_axis.x.
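
A note on the `.x`, offered as a sketch and not as a statement about Makie internals: the REPL output above shows that `fig.figure.current_axis` is a `Base.RefValue{Any}`, a small mutable container whose only field is named `x`. The usual way to read such a container is `r[]`, so `fig.figure.current_axis[]` should give the same object as `.x`, and `current_axis()` presumably returns that dereferenced axis. The example below uses a placeholder string in place of the axis.

```julia
# Stand-in for `fig.figure.current_axis`: a Ref{Any} wrapping some value.
r = Ref{Any}("axis placeholder")

# The container type has a single field, named :x.
ref_fields = fieldnames(typeof(r))

# Field access and dereferencing with [] return the same object.
same_object = r.x === r[]
```

Dereferencing with `[]` is generally preferable to `.x`, since the field name is an implementation detail of `RefValue`.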
