This is my first time working with SparseArray
s in Julia. I needed a few operations and was wondering if I am as efficient as I can be. I do want to preserve the original matrices (explicitly not doing anything in-place).
First I want to remove all diagonal values. Do I need deepcopy
here or would copy
suffice?
function dropdiag(M)
M_out = deepcopy(M)
SparseArrays.fkeep!(M_out, (i,j,v) -> i != j)
return M_out
end
Second, I want to make the matrix row stochastic (sum to 1 in each row). Is sum
efficient for sparse matrices?
function rowstochastic(M)
rowsums = sum(M; dims = 2)
M_out = similar(M, Float64)
rowindex = rowvals(M)
values = nonzeros(M)
m, n = size(M)
@inbounds for j in 1:n
for i in nzrange(M, j)
M_out[rowindex[i], j] = values[i] / rowsums[rowindex[i]]
end
end
return M_out
end
Finally, I want to transform each value using the same function:
function spmap(f, M)
M_out = similar(M)
rowindex = rowvals(M)
values = nonzeros(M)
m, n = size(M)
@inbounds for j in 1:n
for i in nzrange(M, j)
M_out[rowindex[i], j] = f(values[i])
end
end
return M_out
end
Any feedback is much appreciated!
copy
suffices. (I don’t recall ever needing deepcopy
in all my years of using Julia.)
You could just multiply by a vector of 1’s to sum each row; I don’t know about sum
along particular dimensions, but matrix–vector multiplication is certainly well optimized. The code is a lot shorter, too, and works for any linear operator:
rowstochastic(M) = Diagonal(M * ones(eltype(M), size(M,2))) \ M
f.(M)
works, as long as f
preserves zeros.
3 Likes
(I just checked, and it is definitely optimized for sparse matrices, along with other mapreduce
operations — you can tell simply by benchmarking it for A = sprand(1000,1000,1e-3)
against the equivalent dense operation for Matrix(A)
. But A * ones(eltype(A), size(A,2))
is a bit faster.)
Note that this only works if all the rows have nonzero sums (otherwise it will throw an exception because the diagonal scaling matrix is singular), but that must be true if you are constructing a stochastic (i.e. Markov) matrix.
1 Like
Really? You don’t use user-defined types?
julia> struct MyT; v::Vector{Float64}; end
julia> a = MyT([1, 2, 3])
MyT([1.00000e+00, 2.00000e+00, 3.00000e+00])
julia> b = copy(a)
ERROR: MethodError: no method matching copy(::MyT)
Closest candidates are:
copy(::HF) where HF<:Union{PlotlyBase.Shape, PlotlyBase.GenericTrace, PlotlyBase.Layout, PlotlyBase.PlotlyAttribute, PlotlyBase.PlotlyFrame} at C:\Users\pkonl\Downloads\Sublime_Julia_portable\assets\.julia-1.7.2-depot\packages\PlotlyBase\NxSlF\src\api.jl:38
copy(::Observables.Observable{T}) where T at C:\Users\pkonl\Downloads\Sublime_Julia_portable\assets\.julia-1.7.2-depot\packages\Observables\Yf3xU\src\Observables.jl:31
copy(::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}}) at C:\Users\pkonl\Downloads\Sublime_Julia_portable\assets\julia-1.7.2\share\julia\base\broadcast.jl:1071
...
Stacktrace:
[1] top-level scope
@ REPL[20]:1
julia> b = deepcopy(a)
MyT([1.00000e+00, 2.00000e+00, 3.00000e+00])
```
1 Like
If I have a user-defined type with mutable state, then I define a copy
method (assuming I need that operation at all).
Ah, makes sense. A bit more work than using deepcopy
though, isn’t it? Is there a price to pay in terms of performance? I did not see this effect…
I don’t do it for performance, I do it because defining a copy
method is a cleaner UI. I tend to view any call to deepcopy
, especially in user-visible code, as a bit of a hack … it smells like exposing internals.
If a type defines copy
, I trust that it was examined by a human and does the right thing, whereas with deepcopy
your are often blindly hoping that the copy method is trivial enough for the compiler to figure out. For example, if the struct
contains a Ptr
to some state in an external C library, then deepcopy
will (by default) just blindly copy the pointer value, whereas if someone manually defines a copy
constructor then they can copy the hidden pointed-to state if need (e.g. by calling a copy
-like function in the C library).
Put another way, it is obvious for any type whether a custom copy
method was defined, because otherwise you’ll get a MethodError
. With deepcopy
, you have no way of knowing (without examining the code or the method table) whether a custom Base.deepcopy_internal
method was defined (or should have been defined) in order to copy hidden state.
5 Likes