If I have a tensor T ∈ ℝ^(i₁ × ⋯ × iₘ) and I want to drop some elements along a single dimension p ∈ {1, …, m} to get T̂ ∈ ℝ^(n₁ × ⋯ × nₘ), where nₖ = iₖ for all k ≠ p and 1 ≤ nₚ ≤ iₚ, how should I then include that transformation in a Flux.Chain?
In a less symbol-laden sentence: how can I achieve the equivalent of logical indexing along one of the array dimensions in a Flux model?
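Concretely, outside of any model, the operation I am after is just plain logical indexing along the third dimension (sizes and mask taken from the prototype further down):
x = rand(Float32, 256, 256, 8, 32)                          # k×l×m×n data
keep = [false, true, true, true, true, false, true, false]  # mask over dimension 3
x̂ = x[:, :, keep, :]                                        # size(x̂) == (256, 256, 5, 32)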
Apart from that I also want:
- it to be possible to train the model
- it to be possible to transfer the model to a CUDA-compatible card
- it to be possible to train the model on a CUDA-compatible card
It is entirely possible to achieve the first two without the third, by the way.
The particular selection in the relevant dimension cannot be hard coded, but it is static and known when the model is built. Put another way: while the selection in the relevant dimension may change over the process's lifetime, any such change will require constructing a new model.
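A minimal sketch of what I mean, assuming a hypothetical helper build_model (the tail of the Chain is just a stand-in taken from the prototype below): the selection is baked in at construction time, so a different selection means constructing a new Chain.
import Flux

# Hypothetical builder: the boolean mask is fixed when the Chain is created.
function build_model(logic_idx::Vector{Bool})
    slicer(x) = x[:, :, logic_idx, :]  # the operation this question is about
    return Flux.Chain(
        slicer,
        Flux.Conv((3, 3), sum(logic_idx) => 4, Flux.relu; pad = 1),
    )
end

# A different selection requires a new model instance:
model_a = build_model([false, true, true, true, true, false, true, false])
model_b = build_model([true, true, false, false, false, false, false, true])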
I have tried:
- Logical indexing ( x->x[:,:,[false, true, true, …, false],:] )
- Making a CUDA kernel (with previous as fallback for cpu mode)
- Using an incidence matrix (an identity matrix with some columns missing) and OMEinsum
- cat on contiguous blocks extracted from the array
While I cannot explain everything about what I have tried, suffice it to say that I haven't found a way to accomplish this seemingly very simple operation.
I have put together a prototype to illustrate the problem and post it below, though I'm uncertain whether there are aspects of my original code that could also be relevant.
import Pkg
Pkg.activate(".")
import CUDA
import Flux
import InteractiveUtils
InteractiveUtils.versioninfo()
println(CUDA.device!(0))
println("Driver version: ", CUDA.version())
println("System driver version: ", CUDA.system_version())
# The data is k×l×m×n and logic_idx is of length m.
# logic_idx could be chosen arbitrarily, except that the all-false case need not
# be supported
logic_idx = [false, true, true, true, true, false, true, false]
# These are two other ways of expressing the same information (apart from the
# length, which can be accessed through some global constant, let's say m)
channel_idx = [2, 3, 4, 5, 7]
range_idx = [2:5, 7:7]
const m = 8
# Here is another way to convey what should be done
transform_M = Float32[0 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 0
0 0 0 0 1
0 0 0 0 0]
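# (Not needed for the model itself; these checks just show that the three
# representations above are interconvertible, and they hold for the values
# defined here.)
import LinearAlgebra
@assert channel_idx == findall(logic_idx)
@assert channel_idx == reduce(vcat, collect.(range_idx))
@assert transform_M == Float32.(Matrix(LinearAlgebra.I, m, m)[:, channel_idx])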
# The question is: how should slicer be defined?
# Example 1
slicer(x) = x[:, :, logic_idx, :]
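# An equivalent variant of Example 1 (I assume it behaves identically) that uses
# the integer indices instead of the boolean mask:
# slicer(x) = x[:, :, channel_idx, :]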
# Another example (uncomment it and comment out the previous one to test)
# Example 2
# f_expr = Expr(:->, :x, Expr(:call, :cat,
# Any[Base.Meta.parse("x[:,:,$cr,:]") for cr ∈ range_idx]...,
# :($(Expr(:kw, :dims, :((3,)))))))
# slicer = eval( f_expr )
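# A plain-closure version of the same cat-on-ranges idea, which I take to be
# equivalent to the Expr/eval construction above:
# slicer = x -> cat((x[:, :, r, :] for r ∈ range_idx)...; dims = 3)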
# Yet another:
# Example 3
# import OMEinsum
# slicer(x) = OMEinsum.ein"ijkl, km -> ijml"(x, transform_M)
my_model = Flux.Chain(
slicer,
Flux.Conv((3,3), sum(logic_idx) => 4, Flux.relu; pad = 1),
Flux.Conv((7,7), 4 => 4, Flux.relu; stride = 4, pad = 3),
Flux.Conv((7,7), 4 => 4, Flux.relu; stride = 4, pad = 3),
Flux.Conv((7,7), 4 => 4, Flux.relu; stride = 4, pad = 3),
Flux.Conv((7,7), 4 => 4, Flux.relu; stride = 4, pad = 3),
x -> dropdims(x; dims = (1, 2)),
Flux.Dense(4, 1, Flux.relu),
)
y = exp2.(rand(Float32, (1, 32)))
a_batch = rand(Float32, (256, 256, 8, 32));
gy = Flux.gpu(y)
a_gbatch = Flux.gpu(a_batch) # Save a gpu version for later
println("a_batch: $(ndims(a_batch)) dimensional tensor of size ",
"$(size(a_batch)) and element type $(eltype(a_batch)) ($(typeof(a_batch)))")
ŷ = my_model(a_batch);
println("ŷ: $(ndims(ŷ)) dimensional tensor of size $(size(ŷ)) and element ",
"type $(eltype(ŷ)) ($(typeof(ŷ)))")
println("\nTrying to train on cpu...")
opt = Flux.AdaBelief()
state = Flux.setup(opt, my_model)
loss = Flux.Losses.mse
for _ ∈ 1:8
l::Float32 = 0.0
grads = Flux.gradient(my_model) do m
ŷ = m(a_batch)
l = loss(ŷ, y)
end
println(l)
Flux.update!(state, my_model, grads[1])
end
my_gmodel = Flux.gpu(my_model)
println("\n\e[1;38;2;208;160;0mModel could be successfully transfered to ",
"gpu!\e[0m")
println("\nTrying to train on gpu...")
opt = Flux.AdaBelief()
state = Flux.setup(opt, my_gmodel)
loss = Flux.Losses.mse
for _ ∈ 1:8
l::Float32 = 0.0
grads = Flux.gradient(my_gmodel) do m
ŷ = m(a_gbatch)
l = loss(ŷ, gy)
end
println(l)
Flux.update!(state, my_gmodel, grads[1])
end
println("\e[1;38;2;0;224;0mModel successfully trained on gpu!\e[0m")
Anyway, what is the best slicer? I imagine that there is a simple and well-known solution to the problem.