There is a Problem with Minimum function, can any one help me?

Palli · September 22, 2022, 12:05pm

find the min & max of a column of a matrix

TL;DR: you should rather load DataFrames (using DataFrames) and call describe on your table. From its documentation:

Missing values are filtered in the calculation of all statistics, however the column :nmissing will report the number of missing values of that variable.
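A minimal sketch of what that can look like, assuming the DataFrames.jl package is installed. The table, its column names and its values are made up for illustration; they are not the data from the question.

```julia
using DataFrames

# Hypothetical table: column `a` has one missing entry, column `b` has none.
df = DataFrame(a = [1.0, missing, 3.0], b = [4.0, 5.0, 6.0])

# Request only the statistics of interest. Missing entries are skipped
# when computing min and max, and counted in the nmissing column.
stats = describe(df, :min, :max, :nmissing)
println(stats)
```

Note that this only covers missing. A NaN in a column is an ordinary floating-point value as far as describe is concerned, so it is not filtered.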

I’m relying on whatever you use to import the data producing missing rather than NaN, because only the former is filtered. NaN can also arise in calculations, so it isn’t strictly a good marker for missing data, though I believe other languages, e.g. R, use it for that.

https://dataframes.juliadata.org/stable/man/comparisons/

Note that pandas skips NaN values in its analytic functions by default. By contrast, Julia functions do not skip NaN’s. If necessary, you can filter out the NaN’s before processing, for example, mean(Iterators.filter(!isnan, x)).
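To make the quoted advice concrete, here is a small sketch using only the Statistics standard library. The vector x is made-up example data.

```julia
using Statistics

x = [1.0, NaN, 3.0]

# A single NaN poisons the plain mean.
plain = mean(x)

# Filtering lazily with Iterators.filter avoids allocating a copy of x.
filtered = mean(Iterators.filter(!isnan, x))

println(plain)
println(filtered)
```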

Pandas uses NaN for representing both missing data and the floating point “not a number” value. Julia defines a special value missing for representing missing data.

Depending on the cause of the NaNs (e.g. if they are an artifact of importing), you can filter or substitute them somehow (something like interpolating may also apply); see: Replacing *missing* and *NaN* values in dataframe - #2 by nilshg

Simply replacing NaN (or missing) with 0 isn’t good advice (replacing NaN with missing is likely better), but I noticed this blog post and it might be helpful: https://www.roelpeters.be/replacing-nan-missing-in-julia-dataframes/
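If the NaNs really do stand for absent data, one possible way to turn them into missing with plain Base Julia is sketched below. The vector x is made-up example data, and this is only one of several ways to do the substitution.

```julia
x = [2.0, NaN, 5.0, NaN, 1.0]

# Build a new vector in which every NaN becomes missing.
y = [isnan(v) ? missing : v for v in x]

# From here on, only missing has to be skipped.
lo, hi = extrema(skipmissing(y))
println((lo, hi))
println(count(ismissing, y))
```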

Older text: You can do that in one go, at least this way:

julia> A = [NaN 3; 4 missing]
2×2 Matrix{Union{Missing, Float64}}:
 NaN    3.0
   4.0   missing

julia> extrema(x for x ∈ skipmissing(A) if !isnan(x))
(3.0, 4.0)

About a column of a “matrix”: it seems clear you’re referring to a table, and would want to be using DataFrames (or Pandas.jl).

I intentionally showed that you can find extrema (or just e.g. minimum) of a full matrix (across columns), not just of one (or more) columns. You would want to slice one column (or row) at a time, as you know how to do. But I also looked a bit into doing that automatically for each column.
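One way to do it for every column at once is sketched below, assuming Julia 1.1 or later for eachcol. The helper name clean_extrema and the matrix values are hypothetical, chosen only for illustration.

```julia
# Hypothetical helper: extrema of a collection, ignoring missing and NaN.
clean_extrema(v) = extrema(x for x in skipmissing(v) if !isnan(x))

A = [NaN 3; 4 missing; 1 7]

# One (min, max) tuple per column.
per_column = [clean_extrema(col) for col in eachcol(A)]
println(per_column)
```

A column that contains nothing but missing and NaN leaves an empty collection, and extrema throws an error on that, so such columns need separate handling.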

I see you have the problem of a Vector{Any} because of “Sc_Young_Modulus”. If you see Any like that, it’s likely going to kill performance (Any is an abstract type, the top one; note that not every abstract type carries the Abstract prefix, and Any is one that doesn’t). That’s one reason to want to use DataFrames, or some other way of skipping header rows. You want to see concrete types, e.g. a Vector of Float64 for your whole column, and DataFrames also allows a different type for each column without performance problems. Julia is unusual in having this missing concept, which is similar to NaN, but more general since it works for all datatypes.
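A small sketch of the difference, assuming the column came back with its header string as the first element. The numeric values are made up; only the header name is from your output.

```julia
# Hypothetical column as it might look when the header row is read as data.
raw = Any["Sc_Young_Modulus", 1.5, 2.5, 0.5]

name = raw[1]               # the header string
col = Float64.(raw[2:end])  # convert the rest to a concrete Vector{Float64}

println(typeof(raw))
println(typeof(col))
println(minimum(col))
```

Calling minimum on raw directly fails, because the string cannot be compared with the numbers.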

See “Handle Missing Data”, e.g. dropmissing! in the cheat sheet below.

I’m no expert on the package, so ~~I’m not sure if it has similar good functions~~ [EDIT: it seems as good] such as in Pandas:

https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html

While there is the Pandas.jl wrapper, which should work with all Julia data that’s compatible with Python, I doubt it supports missing (because Python can’t support it, and I also think the concept was introduced in Julia after that package). I’m not sure where you got the NaN from, possibly an extra line when importing some data? You likely want to use CSV.jl to import. I’m not sure whether it imports gaps as missing, or possibly produces both missing and NaN.

Even though I showed how to avoid both, checking only for missing (if you can rely on that being the most you will see, or even expect not to see it at all) is going to be much faster, allocate less, and be simpler code:

julia> @time extrema(skipmissing(B))
  0.013557 seconds (11.08 k allocations: 612.520 KiB, 99.58% compilation time)
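If you want to compare the two approaches yourself, a rough sketch follows. The B here is a hypothetical stand-in (random floats with a few missing entries), not the data behind the timing above, and the function names are made up. No timings are claimed; they will vary by machine and Julia version.

```julia
# Hypothetical stand-in for B: a million floats with a few missing entries.
B = Vector{Union{Missing, Float64}}(rand(1_000_000))
B[1:10] .= missing

skip_missing_only(v) = extrema(skipmissing(v))
skip_missing_and_nan(v) = extrema(x for x in skipmissing(v) if !isnan(x))

# Warm up once so that @time below does not measure compilation.
skip_missing_only(B)
skip_missing_and_nan(B)

r1 = @time skip_missing_only(B)
r2 = @time skip_missing_and_nan(B)
```

Since this B contains no NaN, both calls return the same pair; only the cost differs.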

If you used readdlm and want to make minimal changes, then I would look into its non-default options: header=true, comments=true, comment_char='#'
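A minimal sketch of those options with the DelimitedFiles standard library. The file contents, the comma delimiter and the second column name are assumptions for illustration; your actual file will differ.

```julia
using DelimitedFiles

# Write a small hypothetical file to a temporary path.
path = tempname()
write(path, "Sc_Young_Modulus,Other\n1.5,10\n# a comment line\n2.5,20\n")

# header=true returns the data and the header row separately, so the
# column names no longer force the numeric data into a Vector{Any}.
data, header = readdlm(path, ','; header=true, comments=true, comment_char='#')

println(vec(header))
println(typeof(data))
println(extrema(data[:, 1]))
```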
