Proper structuring of an input data table

mamo · August 24, 2020, 9:10am

How can I build an input matrix (or a table) that is sparse? For example:
A is a 4x4 matrix:
A=[1 . . .; . . 2 3; . 5 . 1; 1 3 . .]
If I use “missing” instead of dots, I have a problem with reading the data and identifying the type (String? Float64?). If I leave an empty place in an input table it will still identify it as a “missing”. Is it possible to obtain a table which will be of int or float type where empty fields are just ignored?
Thank you!!

johnh · August 24, 2020, 10:35am

The ‘missing’ type is a good idea here. Just store them as missing. You then know exactly what they are: missing.
Not ‘white space’ or ‘undefined’.

https://docs.julialang.org/en/v1/manual/missing/
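
As a small illustration of the point above (a sketch, not taken from the linked manual page): a vector that mixes integers and `missing` gets a `Union{Missing, Int}` element type, and the missing entries can be skipped or replaced explicitly. The variable names are made up for this example.

```julia
# A vector with a gap; Julia infers a Union element type for it.
v = [1, missing, 3]
println(eltype(v))

# Skip the missing entries when reducing.
total = sum(skipmissing(v))

# Or replace them with a chosen default, which also narrows the element type.
filled = coalesce.(v, 0)
println(total, " ", filled)
```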

Tamas_Papp · August 24, 2020, 10:50am

Usually the other fields of what is called a “sparse matrix” are filled with 0, and you can store just the non-zero elements with indexes. See the SparseArrays standard library.
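
A minimal sketch of that approach for the 4x4 example from the question, assuming the dots are meant to be zeros. Only the seven non-zero entries are stored, as (row, column, value) triplets; the names `rows`, `cols` and `vals` are just for illustration.

```julia
using SparseArrays

# Coordinates and values of the non-zero entries of A.
rows = [1, 2, 2, 3, 3, 4, 4]
cols = [1, 3, 4, 2, 4, 1, 2]
vals = [1, 2, 3, 5, 1, 1, 3]

# Build a 4x4 sparse matrix; every unstored entry reads back as 0.
A = sparse(rows, cols, vals, 4, 4)

# Visit only the stored entries, with their indices.
for (i, j, v) in zip(findnz(A)...)
    println("A[$i, $j] = $v")
end
```

The element type stays `Int`, so there is no `missing` involved, but note that an unstored entry is indistinguishable from an explicit zero when read.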

Ignored by whom? If the user of this table knows that these should be ignored, it can be set to arbitrary values. Otherwise, use missing or similar as @johnh suggested, or sentinel values.

Henrique_Becker · August 24, 2020, 1:59pm

Seems to me that what you want is in fact a Dict with Tuple{Int, Int} as keys, so the only entries that “really exist” are the ones you made explicit. Iterating over the keys/values will skip the ones you did not define (but will not go in order, unless you collect and sort the iterator). Trying to explicitly access a position you did not define will result in an error.
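
Roughly what I mean, as a sketch using the matrix from the first post (the name `sorted_keys` is just for this example):

```julia
# Only the entries that "really exist" are stored, keyed by (row, column).
A = Dict{Tuple{Int,Int},Int}(
    (1, 1) => 1,
    (2, 3) => 2, (2, 4) => 3,
    (3, 2) => 5, (3, 4) => 1,
    (4, 1) => 1, (4, 2) => 3,
)

# Dict iteration order is not defined, so collect and sort the keys
# if you need row-by-row order.
sorted_keys = sort(collect(keys(A)))
for (i, j) in sorted_keys
    println("A[$i, $j] = ", A[(i, j)])
end

# A[(1, 2)] would throw a KeyError; check first, or supply a default.
println(haskey(A, (1, 2)))
println(get(A, (1, 2), 0))
```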

You may also use a Matrix of Union{Number, Missing} and use skipmissing, but I think skipmissing will always iterate the data as if it were a Vector (i.e., you will not be able to know exactly which field you are iterating over in cartesian coordinates).
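
A sketch of the Matrix variant, again with the example from the first post. One possible way to keep the cartesian coordinates is to ask for the positions of the non-missing entries with `findall` instead of iterating `skipmissing` directly; this is a suggestion, not the only option.

```julia
# Same example, with gaps stored as missing.
M = Union{Int,Missing}[
    1       missing missing missing;
    missing missing 2       3;
    missing 5       missing 1;
    1       3       missing missing
]

# skipmissing yields the values only, flattened in column-major order.
flat = collect(skipmissing(M))

# findall returns CartesianIndex positions of the non-missing entries.
present = findall(!ismissing, M)
for idx in present
    i, j = Tuple(idx)
    println("M[$i, $j] = ", M[idx])
end
```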

mamo · August 24, 2020, 2:53pm

Thank you both. This latter idea by Henrique looks good; I will check it. Just to be clearer:
PRT[o, f, p, j, t] == PRTint[o, f, p, j, t] * bs[f, p]
PRT and PRTint are variables; the second is an integer. bs is a matrix, filled with numbers only for specific pairs f,p. Since I introduced bs via a data frame table, it reported a load error and a mismatch between formats.
Thanks once again.
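
For reference, a rough sketch of how Henrique's Dict idea might apply to bs. Everything here is hypothetical: the rows, the values, and the plain-number stand-ins for PRTint and PRT, with the o, j, t indices dropped for brevity. The assumption is that the table is stored in long format, one row per defined (f, p) pair, so that no missing cells are ever read.

```julia
# Hypothetical long-format input: one row per (f, p) pair that has a value.
rows = [
    (f = 1, p = 2, value = 0.5),
    (f = 2, p = 1, value = 1.5),
    (f = 3, p = 3, value = 2.0),
]

# Keys are (f, p) tuples; values are plain Float64, with no missing involved.
bs = Dict((r.f, r.p) => r.value for r in rows)

# Stand-in for the integer quantity, defined only where bs is defined.
PRTint = Dict((f, p) => 2 for (f, p) in keys(bs))

# The relation from the post, applied only to the defined pairs.
PRT = Dict(k => PRTint[k] * bs[k] for k in keys(bs))
println(PRT)
```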
