Plotting difference from mean

If I have a CSV file with name and value pairs such as
node01, 127097.3
node02, 127263.5
node03, 127132.3

How would I make a plot of each value's difference from the mean of the whole column?
These are STREAM Triad values from a benchmark, if anyone is interested.

I know that can be done in Excel, but I would like to use a Pluto notebook or Queryverse.
I just know I have asked a dumb question and the answer is obvious.

(ps those are not real values - probably there are NDAs up the wazoo here)

Do you have multiple values for each node, or are the nodes all different?

PS: what are NDAs?

Each node has a single value. This is the output of the STREAM memory benchmark, the Triad test.
There are copy, scale and add tests but I will play with them independently.

NDA = Non Disclosure Agreement
These values are not particularly secret, but I work for a large company and don't want a rap over the knuckles.

Not sure if this is what you need:

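using CSV, DataFrames, Plots, Statistics  # Statistics provides mean

df = DataFrame(CSV.File("difference_from_mean.csv"))
# Column 1 holds the node names, column 2 the benchmark values
plot(df[:, 1], df[:, 2] .- mean(df[:, 2]), ylabel="Difference from mean", legend=false)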

Thank you. That's what I want. It would be nice to have histogram bars for each node.
I can work on that though. Thanks.

What’s the structure of your data - do you have multiple observations per node?

If so you could have a look at StatsPlots.jl which has a bunch of grouped visualisations built in. It might be convenient to just demean the data ahead of any plotting, i.e. do

df[!, :data_demeaned] = df.data .- mean(df.data)

and then throw df.data_demeaned into the appropriate StatsPlots recipe

Thanks both. It is just a single value per node - the idea is to look for outliers, which indicate that node has something wrong with it.
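
For the outlier hunt itself, a minimal sketch of flagging nodes numerically rather than by eye might look like the following. The node names, the values (including the deliberately low `node05`) and the 1.5 standard deviation cutoff are all made up for illustration; the right threshold depends on how much spread is normal for the cluster.

```julia
using Statistics

# Hypothetical stand-in data; real values would come from the CSV file.
nodes = ["node01", "node02", "node03", "node04", "node05"]
triad = [127097.3, 127263.5, 127132.3, 127150.0, 121000.0]

# Each node's difference from the mean of the whole column.
deviation = triad .- mean(triad)

# Assumed rule: flag nodes more than 1.5 sample standard deviations
# away from the mean. The factor 1.5 is an arbitrary choice.
threshold = 1.5 * std(triad)
outliers = nodes[abs.(deviation) .> threshold]

println(outliers)
```

With only a handful of nodes a single bad value inflates the standard deviation considerably, so a median-based rule may be worth trying on small clusters.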

Poor data. Being demeaned in code.

Something like this then:

julia> using DataFrames, Statistics, Plots

julia> df = DataFrame(data = randn(50));

julia> bar(1:50, df.data .- mean(df.data), label = "Demeaned Data", linewidth = 0.01, xlabel = "Node", ylabel = "Deviation from mean");

julia> hline!([quantile(df.data .- mean(df.data), 0.1), quantile(df.data .- mean(df.data), 0.9)], label = "10th/90th percentile", linewidth = 2)
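
To tie this back to the CSV in the original question, here is a rough sketch that labels the bars with node names. It assumes the file has no header row, so the column names `node` and `triad` are supplied by hand and are my own choice. The three sample rows from the question are written to a temporary file only to keep the example self-contained.

```julia
using CSV, DataFrames, Statistics, Plots

# Write the three sample rows from the question to a temporary file;
# in practice, point `path` at the real CSV instead.
path = joinpath(mktempdir(), "difference_from_mean.csv")
write(path, "node01, 127097.3\nnode02, 127263.5\nnode03, 127132.3\n")

# Assumption: no header row, so column names are given explicitly.
df = CSV.read(path, DataFrame; header = [:node, :triad])

# Each node's difference from the mean of the whole column.
df.deviation = df.triad .- mean(df.triad)

# One bar per node, labelled with the node name.
bar(df.node, df.deviation; xlabel = "Node", ylabel = "Deviation from mean", legend = false)
```

If the real file does have a header row, drop the `header` keyword and use the file's own column names.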

That is perfect! Thank you so much.