I want to find the min & max of a column of a matrix. I use the minimum function, but I get the problem below. Does anyone know what is happening?
thanks
It means you have a NaN in your vector. See for example
minimum([1,NaN,2])
NaN stands for Not a Number.
The problem is not in the call to the minimum function, but in the code that computed the S_Elastic3 vector.
You can do something like
julia> minimum(x for x ∈ [1, NaN, 2] if !isnan(x))
1.0
if you can't change the fact that NaN values are occurring in your calculation but you're still interested in the non-NaN minimum.
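For a matrix column the same idea applies after slicing. This is only a minimal sketch with made-up data (the matrix A and its values are my assumption, not your data); it also shows findall(isnan, ...) as one way to see where the NaN entries sit:

```julia
# Hypothetical data: the second column contains a NaN.
A = [1.0 5.0;
     2.0 NaN;
     3.0 4.0]

col = A[:, 2]                    # slice the second column (this makes a copy)

nan_rows = findall(isnan, col)   # row indices of the NaN entries
propagated = minimum(col)        # NaN, because NaN propagates through minimum

clean = filter(!isnan, col)      # keep only the non-NaN entries
lo, hi = extrema(clean)          # min and max in one pass
```

Here lo and hi come out as 4.0 and 5.0. Note that extrema throws an error if every entry in the column is NaN, since nothing is left after filtering.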
find the min & max of a column of a matrix
TL;DR: you should rather do using DataFrames and use describe:
Missing values are filtered in the calculation of all statistics, however the column :nmissing will report the number of missing values of that variable.
I'm relying on whatever you use to import the data producing missing rather than NaN, because only the former is filtered. NaN can also arise in calculations, so it isn't strictly a reliable marker for missing data, although I believe other languages, e.g. R, use it for that.
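A rough sketch of what that looks like, assuming the DataFrames package is installed. The table and its column names a and b are made up for illustration:

```julia
using DataFrames   # third-party package: import Pkg; Pkg.add("DataFrames")

# Hypothetical table: column a holds a missing value, column b holds a NaN.
df = DataFrame(a = [1.0, missing, 3.0], b = [1.0, NaN, 3.0])

# Ask only for the statistics of interest.
stats = describe(df, :min, :max, :nmissing)

# Column a: the missing entry is skipped and counted in nmissing.
a_min, a_max, a_nmissing = stats.min[1], stats.max[1], stats.nmissing[1]

# Column b: NaN is not treated as missing, so it is not skipped.
b_min = stats.min[2]
```

For column a this gives min 1.0, max 3.0 and one missing value, while for column b the NaN leaks into the result, which is why it matters that the import step produces missing and not NaN.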
https://dataframes.juliadata.org/stable/man/comparisons/
Note that pandas skips NaN values in its analytic functions by default. By contrast, Julia functions do not skip NaN's. If necessary, you can filter out the NaN's before processing, for example, mean(Iterators.filter(!isnan, x)).
Pandas uses NaN for representing both missing data and the floating point "not a number" value. Julia defines a special value missing for representing missing data.
Depending on the cause of NaNs, e.g. if an artifact of importing, you can filter or substitute them somehow (also something like interpolating may apply): Replacing *missing* and *NaN* values in dataframe - #2 by nilshg
Simply replacing NaN (or missing) with 0 isn't good advice (replacing with missing is likely better), but I noticed this blog post and it might be helpful: https://www.roelpeters.be/replacing-nan-missing-in-julia-dataframes/
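To illustrate why 0 is a poor substitute, here is a small sketch using only Base Julia. The vector v is made-up data, and the comprehension is just one possible way to do the substitution:

```julia
v = [2.5, NaN, 1.0, NaN, 4.0]    # hypothetical column with two NaN entries

# Substituting 0 invents a value that never occurred in the data.
min_with_zero = minimum(x -> isnan(x) ? 0.0 : x, v)

# Marking NaN as missing keeps the information that a value is absent.
w = [isnan(x) ? missing : x for x in v]

min_valid = minimum(skipmissing(w))   # minimum over the real values only
n_absent = count(ismissing, w)        # how many entries were NaN
```

With 0 substituted the minimum becomes 0.0, while the missing-based version reports 1.0 and still lets you count the two absent entries. The element type of w is Union{Missing, Float64}.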
Older text: You can do that in one go, at least this way:
julia> A = [NaN 3; 4 missing]
2×2 Matrix{Union{Missing, Float64}}:
NaN 3.0
4.0 missing
julia> extrema(x for x ∈ skipmissing(A) if !isnan(x))
(3.0, 4.0)
About the column of a "matrix": it seems clear you're referring to a table, and would want to be using DataFrames (or Pandas.jl).
I intentionally showed that you could find the extrema (or just e.g. the minimum) of a full matrix (across columns), not just of one (or more) columns. You would want to slice one column (or row) at a time, as you know how to do. But I also looked a bit into doing that automatically for each column.
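One possible way to do it per column, as a sketch: eachcol iterates over the columns without copying them. The helper name valid_extrema and the matrix are made up for this example:

```julia
# Hypothetical data: a 3×2 matrix with one NaN and one missing.
A = [NaN 3.0; 4.0 missing; 1.0 7.0]

# Hypothetical helper: extrema of a column, ignoring both missing and NaN.
valid_extrema(col) = extrema(x for x in skipmissing(col) if !isnan(x))

# One (min, max) tuple per column.
per_column = [valid_extrema(col) for col in eachcol(A)]
```

This gives (1.0, 4.0) for the first column and (3.0, 7.0) for the second. A column with no valid entries at all would make extrema throw, so that case needs separate handling.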
I see you have the problem of Vector{Any} because of "Sc_Young_Modulus". If you see Any (an abstract type, the top one; you can't rely on an Abstract prefix to spot abstract types, since Any, Number, Real and others lack it) like that, it's likely going to kill performance. That's one reason to want to use DataFrames or some other way to skip header rows. You want to see concrete types, e.g. a Vector of Float64 for your whole column; DataFrames also allows different types for each column without performance problems. Julia is unusual in having this missing concept, which is similar to NaN, but more general since it works for all datatypes.
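If the data is already loaded with the header mixed in, something like the following sketch separates the two. The numbers are made up; only the header name comes from your output:

```julia
# Hypothetical column as it might look with the header string mixed in.
raw = Any["Sc_Young_Modulus", 210.0, 70.0, 110.0]

header = raw[1]                 # the column name
vals = Float64.(raw[2:end])     # convert the rest to a concrete Vector{Float64}

raw_is_concrete = isconcretetype(eltype(raw))     # false: eltype is Any
vals_is_concrete = isconcretetype(eltype(vals))   # true: eltype is Float64
```

The conversion fails with an error if any remaining entry is not a number, which is a useful check in itself.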
See "Handle Missing Data", e.g. dropmissing!, in the cheat sheet below.
I'm no expert on the package, so I'm not sure if it has functions as good as those in Pandas [EDIT: it seems as good]:
https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html
While there is the Pandas.jl wrapper, which would work with all Julia data that's compatible with Python, I doubt it supports missing (because Python can't support it; I also think the concept was introduced in Julia after that package). I'm not sure where you got NaN from, possibly an extra line when importing some data? You likely want to use CSV.jl to import. I'm not sure whether it imports with missing, or possibly with both missing and NaN.
Even though I showed how to avoid both, just checking for missing (if you can rely on that being the only thing to skip, or can avoid even that) is going to be much faster, allocate less, and make for simpler code:
julia> @time extrema(skipmissing(B))
0.013557 seconds (11.08 k allocations: 612.520 KiB, 99.58% compilation time)
If you used readdlm and want to make minimal changes, then I would look into the non-default options: header=true, comments=true, comment_char='#'
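A minimal sketch of the header=true variant, using the DelimitedFiles standard library. The file contents are made up, and an IOBuffer stands in for your file path:

```julia
using DelimitedFiles   # standard library

# Hypothetical file contents: one header row, then numeric rows.
io = IOBuffer("""
Sc_Young_Modulus,S_Elastic3
210.0,1.5
70.0,2.5
110.0,0.5
""")

# With header=true, readdlm returns the data and the header row separately,
# so the data matrix can be parsed as Float64 instead of Any.
data, header = readdlm(io, ',', Float64; header=true)

lo, hi = extrema(data[:, 2])   # min and max of the second column
```

Because the header row is split off, data is a Matrix{Float64} and the column slices have a concrete element type.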