Treat the [“Out1”, “Out2”, “Out3”] columns as the comparison benchmark/standard.
For example, at index position (row) 2: Out1 = 11, Out2 = 11, Out3 = 11 – the
corresponding [“M1”, “M2”, “M3”] values at index position 2: M1 = 15, M2 = 40,
M3 = 25.
How might I estimate which ID value and accompanying [“M1”, “M2”, “M3”] values have the
least and the most variance/similarity/distance (I am not sure whether these terms can be used interchangeably) relative to the [“Out1”, “Out2”, “Out3”] columns?
You can select the columns of the data frame and call pairwise to produce an nrow x nrow distance matrix. The main advantage here is that the package will choose an appropriate distance depending on the scientific type of each column, so it works with categorical data, compositional data, etc.
If you want to compute distances between columns (assuming they all have the same scientific type), you can convert the data frame with Matrix(df) and call pairwise from Distances.jl directly.
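As a rough sketch of the Distances.jl route, assuming the package is installed: only row 2 of the data below comes from the question, the other rows are invented for illustration, and the names M, Out, D and d are placeholders. It shows pairwise over the columns and, as one possible reading of the original question, a per-row Euclidean distance between the M values and the Out values.

```julia
using Distances  # assumes Distances.jl is installed

# Hypothetical data: only row 2 (M = 15, 40, 25; Out = 11, 11, 11) comes from
# the question; the other rows are made up for illustration.
M   = [12.0 10.0 11.0;
       15.0 40.0 25.0;
       30.0  5.0 20.0]
Out = [10.0 10.0 10.0;
       11.0 11.0 11.0;
       12.0 12.0 12.0]

# Distances between the columns of M (3 x 3 matrix).
D = pairwise(Euclidean(), M, dims=2)

# Distance between each row of M and the matching row of Out.
# colwise compares corresponding columns, so both matrices are transposed first.
d = colwise(Euclidean(), permutedims(M), permutedims(Out))

closest_row  = argmin(d)  # row whose M values are nearest to its Out values
furthest_row = argmax(d)  # row whose M values are furthest from its Out values
```

With a data frame, something like Matrix(df[:, ["M1", "M2", "M3"]]) would presumably produce the M matrix used here. Euclidean distance is only one choice; any other metric from Distances.jl could be swapped in.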
The numbers in the green box represent the reference
row (vector).
The green arrows point to the values within each column that are
closest to the reference value in that column.
The light red arrows point to the values within each column that are
the furthest from the reference value in that column.
The data here is homogeneous. Is there a way to display the
distances from the reference row at each row-column position
for each row above the reference row?
In this case, is there a way to visualize/depict a table that shows
the column member (SING, SWE, TAI) that has either the closest
or furthest distance from the reference value (40)? I would like to
do this for each row value in the USA column.
@YummyPampers2 you have many different ways to achieve what you want. You just need to choose an algorithm and implement it yourself. Is there any problem with writing for loops and computing distances? Can you explain why you can’t use Distances.jl or TableDistances.jl to achieve your ultimate goal?
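To make the for-loop suggestion concrete, here is a minimal sketch in plain Julia with no packages. The matrix X is invented, and treating the last row as the reference row is an assumption about the layout in the screenshot. It fills a table of absolute differences from the reference row and then picks, per column, the row that is closest and the row that is furthest.

```julia
# Hypothetical values; the last row plays the role of the reference row.
X = [10 27 30;
     14 20 41;
     19 24 35;
     15 25 40]
ref = size(X, 1)

# D[i, j] = |X[i, j] - X[ref, j]| for every row i above the reference row.
D = zeros(Int, ref - 1, size(X, 2))
for j in axes(X, 2), i in 1:ref-1
    D[i, j] = abs(X[i, j] - X[ref, j])
end

# For each column, the row index with the smallest and largest difference.
closest  = [argmin(D[:, j]) for j in axes(D, 2)]
furthest = [argmax(D[:, j]) for j in axes(D, 2)]
```

The loop could also be written as a single broadcast, abs.(X[1:ref-1, :] .- X[ref:ref, :]), which gives the same table.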
You need to define the dist object. The README is just giving an example; try dist = Euclidean(), for instance. But I don’t think you need distances after all if all you need is to compute differences between scalars.
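If scalar differences are all that is needed, a sketch along these lines might do, again in plain Julia. The numbers are invented apart from the reference value 40 in the USA column, and usa, others, labels and result are placeholder names. For each USA value it reports which of SING, SWE, TAI is closest and which is furthest by absolute difference.

```julia
# Hypothetical values; USA is the reference column.
usa    = [40, 55, 30]
labels = ["SING", "SWE", "TAI"]
others = [38 47 60;   # columns follow the order in labels
          50 58 20;
          33 10 31]

# Absolute difference of every entry from the USA value in the same row.
diffs = abs.(others .- usa)

# One record per row: the reference value and the closest/furthest column.
result = [(USA      = usa[i],
           closest  = labels[argmin(diffs[i, :])],
           furthest = labels[argmax(diffs[i, :])]) for i in eachindex(usa)]
```

Ties are resolved in favour of the first column, because argmin and argmax return the first matching index.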
Okay – I will experiment with the other
methods, but yes, the general goal is
to calculate differences and compare
those differences singularly. Meaning,
for the entire range, I would like to show:
The limitation with setdiff, dist, etc.
is that they only deal with pairs. I
was looking for a way to find the
differences for more than two variables
and display the result in a table
(preferably a dataframe).
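One possible way to get differences for several variables at once into a data frame is sketched below. It assumes DataFrames.jl is installed; the values and the diff_ column names are invented for illustration. Each non-reference column gets a companion column holding its absolute difference from USA.

```julia
using DataFrames  # assumes DataFrames.jl is installed

# Hypothetical values; USA is treated as the reference column.
df = DataFrame(USA  = [40, 55, 30],
               SING = [38, 50, 33],
               SWE  = [47, 58, 10],
               TAI  = [60, 20, 31])

# Add one difference column per non-reference variable.
for c in ["SING", "SWE", "TAI"]
    df[!, "diff_" * c] = abs.(df[!, c] .- df.USA)
end
```

The resulting df keeps the original columns and adds diff_SING, diff_SWE and diff_TAI, so the differences for all variables can be compared side by side in one table.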