If I have a CSV file with name and value pairs such as
How would I plot each value's difference from the mean of the whole column?
These are STREAM Triad values from a memory bandwidth benchmark, if anyone is interested.
I know this can be done in Excel, but I would like to use a Pluto notebook or Queryverse.
I just know I have asked a dumb question and the answer is obvious.
(PS: those are not real values; there are probably NDAs up the wazoo here.)
Do you have multiple values for each node, or are the nodes all different?
PS: what are NDAs?
Each node has a single value. This is the output of the STREAM memory benchmark, specifically the Triad test.
There are also Copy, Scale and Add tests, but I will look at those separately.
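For anyone unfamiliar with STREAM: the Triad test times the kernel a[i] = b[i] + q * c[i] over large arrays and reports the resulting memory bandwidth. A minimal Julia sketch of just that kernel is below; the names `triad!` and `triad_bytes` are hypothetical, and the sketch does no timing, so it only illustrates what is being measured and is no substitute for the real benchmark.

```julia
# Sketch of the STREAM Triad kernel: a[i] = b[i] + q * c[i].
# Illustration only: the real benchmark (C/Fortran) repeats the kernel,
# times each run and reports the best sustained bandwidth.
function triad!(a::AbstractVector{Float64}, b::AbstractVector{Float64},
                c::AbstractVector{Float64}, q::Float64)
    # eachindex(a, b, c) throws a DimensionMismatch if the arrays differ in shape
    @inbounds for i in eachindex(a, b, c)
        a[i] = b[i] + q * c[i]
    end
    return a
end

# STREAM counts Triad traffic as 3 arrays per element (read b and c, write a),
# i.e. 24 bytes per element for Float64.
triad_bytes(n::Integer) = 3 * sizeof(Float64) * n
```

For example, `triad!(zeros(4), fill(1.0, 4), fill(2.0, 4), 3.0)` fills the first array with 7.0, and dividing `triad_bytes(n)` by the measured time would give a bandwidth figure.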
NDA = Non Disclosure Agreement
These values are not particularly secret, but I work for a large company and don't want a rap over the knuckles.
Not sure if this is what you need:
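using CSV, DataFrames, Statistics, Plots   # mean() comes from the Statistics stdlib
df = DataFrame(CSV.File("difference_from_mean.csv"))
# column 1 holds the node names, column 2 the values; plot each value minus the column mean
plot(df[:, 1], df[:, 2] .- mean(df[:, 2]), ylabel = "Difference from mean", legend = false)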
Thank you, that's what I want. It would be nice to have a bar for each node rather than a line.
I can work on that myself, though. Thanks!
What’s the structure of your data - do you have multiple observations per node?
If so, you could have a look at StatsPlots.jl, which has a bunch of grouped visualisations built in. It might be convenient to demean the data ahead of any plotting, e.g. (with Statistics loaded for mean):
df[!, :data_demeaned] = df.data .- mean(df.data)
and then pass df.data_demeaned to the appropriate StatsPlots recipe.
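The demeaning step itself is just broadcast subtraction. A minimal sketch using only the Statistics standard library shows the arithmetic without CSV.jl or DataFrames.jl; the node names and values here are made up for illustration and are not real Triad results.

```julia
using Statistics

# Hypothetical node names and values (made up, not real benchmark output)
nodes  = ["node01", "node02", "node03", "node04"]
values = [100.0, 110.0, 90.0, 100.0]

# Broadcast subtraction (.-) removes the column mean from every element
demeaned = values .- mean(values)

# By construction the demeaned values average to zero (up to rounding)
@assert isapprox(mean(demeaned), 0.0; atol = 1e-12)

for (n, d) in zip(nodes, demeaned)
    println(n, ": ", d)
end
```

Here the mean is 100.0, so the demeaned values are 0.0, 10.0, -10.0 and 0.0.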
Thanks both. It is just a single value per node. The idea is to look for outliers, which indicate that a node has something wrong with it.
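One simple way to turn "look for outliers" into a rule is to flag nodes whose deviation from the mean exceeds some multiple k of the standard deviation. This is a common rule of thumb rather than something proposed in the thread; the helper name `flag_outliers` and the default k = 2 are assumptions for illustration.

```julia
using Statistics

# Hypothetical helper: return the names of nodes whose value lies more than
# k sample standard deviations from the column mean.
function flag_outliers(names::AbstractVector, values::AbstractVector{<:Real}; k::Real = 2.0)
    dev = values .- mean(values)   # difference from the mean
    s = std(values)                # sample standard deviation (n - 1 denominator)
    return [names[i] for i in eachindex(names, values) if abs(dev[i]) > k * s]
end
```

With nine nodes at 100.0 and one at 50.0, only the last node is flagged. Because the mean and standard deviation are themselves pulled toward any outlier, this rule can miss a bad node when there are only a few nodes; a median-based rule is more robust in that case.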
Poor data. Being demeaned in code.
Something like this then:
julia> using DataFrames, Statistics, Plots
julia> df = DataFrame(data = randn(50));
julia> bar(1:50, df.data .- mean(df.data), label = "Demeaned Data", linewidth = 0.01, xlabel = "Node", ylabel = "Deviation from mean");
julia> hline!([quantile(df.data .- mean(df.data), 0.1), quantile(df.data .- mean(df.data), 0.9)], label = "10th/90th percentile", linewidth = 2)
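The two horizontal lines can also be turned into a list of nodes to inspect. A sketch with a hypothetical helper `outside_quantiles`, using the same 10th/90th percentile thresholds as the plot:

```julia
using Statistics

# Hypothetical helper mirroring the hline! thresholds: return the indices of
# nodes whose demeaned value falls below the `lo` quantile or above the `hi` quantile.
function outside_quantiles(x::AbstractVector{<:Real}; lo::Real = 0.1, hi::Real = 0.9)
    d = x .- mean(x)                    # demeaned data, as in the bar plot
    qlo, qhi = quantile(d, [lo, hi])    # same thresholds as the horizontal lines
    return findall(v -> v < qlo || v > qhi, d)
end
```

For example, `outside_quantiles(collect(1.0:10.0))` returns `[1, 10]`. Since roughly 20% of nodes always fall outside a 10th/90th percentile band, these lines mark the tails of the distribution rather than proving a node is faulty.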
That is perfect! Thank you so much.