Plotting difference from mean

johnh · April 3, 2021, 12:29pm

If I have a CSV file with name and value pairs such as
node01, 127097.3
node02, 127263.5
node03, 127132.3
…

How would I make a plot of each value's difference from the mean of the whole column?
These are stream triad values from a benchmark, if anyone is interested.

I know that can be done in Excel, but I would like to use a Pluto notebook or Queryverse.
I just know I have asked a dumb question and the answer is obvious.

(ps those are not real values - probably there are NDAs up the wazoo here)

rafael.guerra · April 3, 2021, 12:53pm

Do you have multiple values for each node, or are the nodes all different?

PS: what are NDAs?

johnh · April 3, 2021, 12:57pm

Each node has a single value. This is the output of a streams memory benchmark, the Triad test.
There are copy, scale and add tests but I will play with them independently.

NDA = Non Disclosure Agreement
These values are not particularly secret, but I work for a large company and don't want a rap over the knuckles.

rafael.guerra · April 3, 2021, 1:03pm

Not sure if this is what you need:

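using CSV, DataFrames, Plots, Statistics   # Statistics provides mean

df = DataFrame(CSV.File("difference_from_mean.csv"))
# x: first column (node names), y: second column minus its mean
plot(df[:,1], df[:,2] .- mean(df[:,2]), ylabel="Difference from mean", legend=false)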

[Image: difference_from_mean]

johnh · April 3, 2021, 1:12pm

Thank you. That's what I want. Would be nice to have histogram bars for each node.
I can work on that though - thanks.

nilshg · April 3, 2021, 1:21pm

What’s the structure of your data - do you have multiple observations per node?

If so you could have a look at StatsPlots.jl which has a bunch of grouped visualisations built in. It might be convenient to just demean the data ahead of any plotting, i.e. do

df[!, :data_demeaned] = df.data .- mean(df.data)

and then throw df.data_demeaned into the appropriate StatsPlots recipe.

johnh · April 3, 2021, 1:23pm

Thanks both. It is just a single value per node - the idea is to look for outliers, which indicate that a node has something wrong with it.

Poor data. Being demeaned in code.
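
For the outlier hunt described above, one possible sketch (not something posted in this thread) is to flag any node whose deviation from the mean exceeds some multiple of the standard deviation. The values below are made up for illustration, and both the threshold `k` and the variable names are assumptions:

```julia
using Statistics

# Made-up values for illustration only, not real benchmark results
nodes = ["node01", "node02", "node03", "node04", "node05"]
triad = [127097.3, 127263.5, 127132.3, 127180.0, 119500.0]

# Deviation of each node from the mean of the whole column
deviation = triad .- mean(triad)

# Flag nodes more than k standard deviations from the mean.
# k = 1.5 is an arbitrary choice for this sketch; tune it for real data.
k = 1.5
outliers = nodes[abs.(deviation) .> k * std(triad)]

println(outliers)
```

With only a handful of nodes, a single bad value drags the mean and standard deviation towards itself, so a threshold like this needs care; with many nodes it behaves more sensibly.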

nilshg · April 3, 2021, 2:52pm

Something like this then:

julia> using DataFrames, Statistics, Plots

julia> df = DataFrame(data = randn(50));

julia> bar(1:50, df.data .- mean(df.data), label = "Demeaned Data", linewidth = 0.01, xlabel = "Node", ylabel = "Deviation from mean");

julia> hline!([quantile(df.data .- mean(df.data), 0.1), quantile(df.data .- mean(df.data), 0.9)], label = "10th/90th percentile", linewidth = 2)
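
A hedged variant of the same idea, assuming you want the node names on the x-axis rather than 1:50, and a list of the nodes that fall outside the percentile band. The values and variable names here are made up for illustration; with a real file they would come from the two CSV columns:

```julia
using Statistics, Plots

# Made-up values for illustration only, not real benchmark results
nodes = ["node01", "node02", "node03", "node04", "node05"]
triad = [127097.3, 127263.5, 127132.3, 127180.0, 119500.0]

# Deviation of each node from the column mean
deviation = triad .- mean(triad)

# 10th and 90th percentiles of the deviations
lo, hi = quantile(deviation, [0.1, 0.9])

# Names of the nodes outside the percentile band
flagged = nodes[(deviation .< lo) .| (deviation .> hi)]

# One bar per node, labelled by node name
bar(nodes, deviation, label = "Demeaned Data", xlabel = "Node",
    ylabel = "Deviation from mean", xrotation = 45)
hline!([lo, hi], label = "10th/90th percentile", linewidth = 2)
```

Note that a percentile band will mark roughly the top and bottom 10% of nodes by construction, so it works as a visual guide rather than a test of whether anything is actually wrong.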

johnh · April 3, 2021, 4:27pm

That is perfect! Thank you so much.
