I built a very large model in Julia with many variables and expressions. Although not all of them are worth inspecting, I currently write all of the results into DataFrames and save them to CSV files. I’d like to know how to quickly access a column or row of a large array, say with two or three dimensions and index sets of more than a hundred elements, and fill the values into a DataFrame.
For example, I have a JuMP expression named NeededExpr indexed by f, z, and t, whose index sets have lengths 6, 31, and 168 (or larger). If I need NeededExpr[f, z, t] for given f and z, should I use value.(NeededExpr[f, z, t]) or value.(NeededExpr)[f, z, t]?
If you’re asking which is faster for accessing a single value, it’s the former, and the dot is not needed: value(NeededExpr[f, z, t]) is enough. If you want to obtain a whole column of values, it’s the latter.
Typically, the last index represents time and is the largest, and in practice we need to store and access it many times. When I discussed with others how Julia stores and accesses data, we came to the claim that Julia stores and accesses data in column order. So if we place t in the first position, as in NeededExpr[t, f, z], and execute value.(NeededExpr)[:, f, z], will this be faster than using NeededExpr[f, z, t] and value.(NeededExpr)[f, z, :] for given f and z?
# value.(NeededExpr[f, z, t]) is equivalent to
a = NeededExpr[f, z, t]
value.(a)
# value.(NeededExpr)[f, z, t] is equivalent to
a = value.(NeededExpr)
a[f, z, t]
The second approach computes the value of all variables, and then subsets that. Computing the value of every variable only to pick a few is expensive.
we came into a claim that julia stores and accesses data in column order
Ignore this suggestion (for now). It is almost certainly not something that will make a material difference to the runtime of your function.
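For reference, the claim itself is accurate for ordinary Julia arrays: they are stored in column-major order, so the first index varies fastest in memory. A minimal sketch with a plain Array (the array A and its sizes are made up for illustration, and no JuMP is involved):

```julia
# Hypothetical 2×3×4 array whose entries equal their linear (memory) position.
A = reshape(collect(1:24), 2, 3, 4)

# Strides: how many elements to skip in memory per step along each dimension.
# The first dimension has stride 1, so it is contiguous.
println(strides(A))        # (1, 2, 6)

first_dim = A[:, 2, 3]     # slice along the first index: adjacent in memory
last_dim  = A[1, 2, :]     # slice along the last index: elements 6 apart

println(first_dim)         # [15, 16]
println(last_dim)          # [3, 9, 15, 21]
```

With index lengths like 6, 31, and 168, either slice touches only a few hundred numbers, which is why the layout is unlikely to be what dominates the runtime here.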
The real answer for what you should do depends on how you’re going to use the results.
If you want only a few individual values once, then use value.(NeededExpr[f, z, t]).
If you’re going to eventually access the value of every variable, and some indices multiple times, then store the solution once in solution = value.(NeededExpr) and query from that.
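A minimal sketch of that second case, assuming the HiGHS solver and the DataFrames package are installed. The toy model, the variable x, and the sizes F, Z, T are made up for illustration and are not the model from the question:

```julia
using JuMP, HiGHS, DataFrames

# Small hypothetical sizes; the question uses 6, 31, and 168.
F, Z, T = 2, 3, 4

model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, 0 <= x[1:F, 1:Z, 1:T] <= 1)
@expression(model, NeededExpr[f in 1:F, z in 1:Z, t in 1:T], 2 * x[f, z, t])
@objective(model, Max, sum(x))
optimize!(model)

# Query the solver once and keep the numeric result.
solution = value.(NeededExpr)

# Later lookups are plain array indexing with no further value calls.
series = solution[1, 2, :]    # all t for f = 1, z = 2

# Long-format table: one row per (f, z, t) combination.
rows = [(f = f, z = z, t = t, value = solution[f, z, t])
        for f in 1:F, z in 1:Z, t in 1:T]
df = DataFrame(vec(rows))
```

From here df can be filtered, or written to CSV, without touching the model again.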
The most common bottleneck is getting solutions out of the solver with the value call.
It’s hard to provide more specific advice without a reproducible code example of exactly what you’re trying to do and achieve.