Hello, I have only been working in the field of deep learning for a short time.
Recently I was trying to create a layer with a 3-dimensional weight array, passing a 3D array as pretrained weights. Each 2D matrix in the weights is used for a different group of data (for example, weights[:, :, 1] for group 1, weights[:, :, 2] for group 2, etc.). A simplified version of the code is shown below.
using Flux
using NNlib   # for batched_mul
using CUDA

struct TestLayer{W<:AbstractArray}
    weight::W
    function TestLayer(weight::W) where {W<:AbstractArray}
        new{W}(weight)
    end
end

Flux.@functor TestLayer

# Select the 2D weight slice(s) for the given cluster and multiply with the input batch
function (l::TestLayer)(x, cluster)
    xT = Flux._match_eltype(l, x)
    return NNlib.batched_mul(l.weight[:, :, cluster], xT)
end
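For context, here is a minimal sketch of how I intend to call this layer; the sizes, the random weight, and the cluster vector below are made up purely for illustration, not my real data.

# Hypothetical usage sketch (sizes and values are made up for illustration)
weight = randn(Float32, 4, 3, 2)      # two 4×3 weight matrices, one per group
layer = TestLayer(weight)

x = randn(Float32, 3, 1, 5)           # batch of 5 column vectors
cluster = [1, 2, 1, 2, 1]             # which weight slice each sample should use

y = layer(x, cluster)                 # batched_mul over the selected slices → 4×1×5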
The problem is that the pretrained weight is a sparse array, and I don’t want the entries that are 0 to be updated during training. But I have no idea how to do this.
One possible solution I thought of is to convert the pretrained array to a SparseArray and then send it to the GPU, as shown below, but after sending it to the GPU, the SparseArray becomes a dense CuArray.
using SparseArrayKit   # provides an N-dimensional SparseArray type

weight = Float32.([1;1;;2;2;;;3;3;;4;4])            # 2×2×2 array built with the ;; syntax
SparseWeight = SparseArrayKit.SparseArray(weight)
model = TestLayer(SparseWeight) |> gpu
model.weight
2×2×2 CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}:
[:, :, 1] =
 1.0  2.0
 1.0  2.0

[:, :, 2] =
 3.0  4.0
 3.0  4.0
Is there any solution to this problem? Or is there another way to prevent the zero entries in the weight from being updated?