Group Convolution Implementation - Work-around mutating arrays in gradient calcs

CTM · July 12, 2020, 10:34am

I’m attempting to implement group convolutions which applies group transformations to incoming filters and then apply convolution.

To do so, I create a set of transformation indices and then create an array with each of these indices mapped to the original filters to get a group set of filters. The below mapping when calculating the gradient is where the issue comes from →

function apply_transformation_group_to_group(m_transformation_set, channels_in, filters, in_group_size, filter_size)
    no_filters,  in_group_size, transform_group_size = size(m_transformation_set)
    # This will hold the set of filters after group transformations are applied
    filter_result = Array{Any}(undef, ( filter_size..., in_group_size, channels_in,  size(filters)[1], transform_group_size))
    for filter in 1:size(filters)[1]
        for group=1:in_group_size # for each of the group transformations applied to the group filters
            for channel=1:channels_in
             # permute which transformations are applied to which group filter
            for g_filter in 1:transform_group_size
                perm = circshift(1:in_group_size, g_filter)
                m_transformation_set_for_filter_set = m_transformation_set[filter,group, g_filter][2] # get the set of transformation indicies for this filter, group, group_action
                filter_result[: ,:,  perm[group] , channel,  filter, g_filter] = filters[filter, group, :, :][m_transformation_set_for_filter_set] # apply the transformation through array referencing
        end
    end
    end
end

I’m running into this issue that Mutating Arrays is not supported in taking the gradient. I’m a Julia noob so am not sure how to reframe the approach to overcome this, all the values in the multi-dim array are just reindexed from the filters stored in the P4GroupConv.w shown below.

struct P4GroupConv{N,M,F,A,V,S, R}
    w :: A
    bias::V
    σ :: F
    stride::NTuple{N, Int}
    pad::NTuple{M, Int}
    dilation::NTuple{N,Int}
    channels_in :: Int
    transformation_set::Array{Array{Any,S}, R}
end

Is this kind of re-indexing and summing the gradient contributions from the original indexed filters possible in Juila?

mcabbott · July 12, 2020, 11:11am

Could you provide some sample data? Something like this, except that it should run…

m_transformation_set = [[nothing, rand(1:10, 7)] for f in 1:2, g in 1:3, t in 1:4]
filters = rand(6, 3, 10, 1)
apply_transformation_group_to_group(m_transformation_set, 5, filters, 3, 10)

CTM · July 12, 2020, 4:09pm

Sure,

So m_transformation_set is an array of Cartesian indexes which refers to transformation indexing like so:

4×3×3 Array{CartesianIndex{2},3}:
[:, :, 1] =
CartesianIndex(1, 3) CartesianIndex(1, 2) CartesianIndex(1, 1)
CartesianIndex(3, 3) CartesianIndex(2, 3) CartesianIndex(1, 3)
CartesianIndex(3, 1) CartesianIndex(3, 2) CartesianIndex(3, 3)
CartesianIndex(1, 1) CartesianIndex(2, 1) CartesianIndex(3, 1)

[:, :, 2] =
CartesianIndex(2, 3) CartesianIndex(2, 2) CartesianIndex(2, 1)
CartesianIndex(3, 2) CartesianIndex(2, 2) CartesianIndex(1, 2)
CartesianIndex(2, 1) CartesianIndex(2, 2) CartesianIndex(2, 3)
CartesianIndex(1, 2) CartesianIndex(2, 2) CartesianIndex(3, 2)

[:, :, 3] =
CartesianIndex(3, 3) CartesianIndex(3, 2) CartesianIndex(3, 1)
CartesianIndex(3, 1) CartesianIndex(2, 1) CartesianIndex(1, 1)
CartesianIndex(1, 1) CartesianIndex(1, 2) CartesianIndex(1, 3)
CartesianIndex(1, 3) CartesianIndex(2, 3) CartesianIndex(3, 3)

This is 4 transformations applied to a 3x3 filter passed in as shape:

number_of_filters * transformational_group_size_in * 3 * 3

an example being:

filter[1,1,:,:]
3×3 Array{Float64,2}:
-1.0 -1.0 -2.0
0.0 0.0 0.0
-1.0 1.0 0.0

transformational_group_size_in refers to the group input, so in a standard first layer this will just be 1 which will increase depending on which transformation group is applied (so p4m would result in an output of 4 groups).

mcabbott · July 12, 2020, 5:08pm

I was going to suggest this as a way to calculate gradients for such “gather” operations:

julia> using Tullio, Zygote

julia> Zygote.gradient(rand(4,5), rand(1:4,7,3)) do data, ind
         @tullio res[i] := data[ind[i,k],j]
         sum(res)
       end
([1.0 1.0 … 1.0 1.0; 5.0 5.0 … 5.0 5.0; 8.0 8.0 … 8.0 8.0; 7.0 7.0 … 7.0 7.0], nothing)

But your example is a bit more complicated, and involves CaresianIndex although it looks like you only keep one element? Here’s how far I got with simplifying, I think this matches yours except for .=:

transforms = cat([CartesianIndices((1:3, 1:3)) for _=1:3]..., dims=3)
filters = rand(3, 3, 3, 3);

function apply_tr(transforms, cs::Int, filters, gs::Int, f_size::Tuple)
    _,  gs, zs = size(transforms)
    @assert size(filters)[3:end] == f_size
    @assert size(transforms,1) == size(filters,1)

    result = similar(filters, (f_size..., gs, cs, size(filters,1), zs))
    for f in 1:size(filters,1)
        for g=1:gs, c=1:cs, z in 1:zs # nothing depends on c

        	perm = circshift(1:gs, z)
            pg = mod(g-z, 1:gs)
            @assert pg == perm[g]

            i = transforms[f, g, z][2] # always picks 2nd element of CaresianIndex
            @assert i isa Int

            rhs = filters[f, g, :, :][i] # linear indexing into a matrix
            @assert rhs isa Number

            result[: ,:,  pg, c,  f, z] .= rhs # same value for all 1,2,4 directions
        end
    end
    result
end

apply_tr(transforms, 3, filters, 3, (3,3)) # this runs!

CTM · July 12, 2020, 5:35pm

Great thank you!

Although this isn’t exactly what I’m looking for (my explanation wasn’t the best sorry about that) in general I can make the small alterations to get the output required.

Just to check, this would then allow the gradient to propagate back to the original filter values?

And if so, is the main adjustment to allow this the use of similar? or the .= rather than = ?

CTM · July 12, 2020, 6:11pm

So I’ve altered some of the code to match your suggestion

function apply_transformation_group_to_group(m_transformation_set, channels_in::Int, filters, in_group_size::Int, filter_size::Tuple)
    no_filters,  in_group_size, transform_group_size = size(m_transformation_set)
    # This will hold the set of filters after group transformations are applied
    filter_result = similar(filters, ( filter_size..., in_group_size, channels_in,  size(filters)[1], transform_group_size))
    for filter in 1:size(filters,1)
        for group=1:in_group_size, channel=1:channels_in, g_filter=1:transform_group_size # for each of the group transformations applied to the group filters
                perm = circshift(1:in_group_size, g_filter)
                i_set = m_transformation_set[filter,group, g_filter][2]  # get the set of transformation indicies for this filter, group, group_action
                rhs = filters[filter, group, :, :][i_set]
                filter_result[: ,:,  perm[group] , channel,  filter, g_filter] .= rhs # apply the transformation through array referencing
        end
    end
    return filter_result 
end

But this still has the issue stating that mutating arrays is not supported? The main diff between the code you suggest and this is that ‘i’ refers not to an int but a 3x3 set of Cartesian Indicies

So to give an example, say we have a filter from filters[f, g, :, :]
defined as →

[1 1 0; 0 2 3; 0 0 9]

The i coming from transforms[f,g,z][2] (using your notation) is a set of co-ordinates to reference the filter

i = [(1,1), (2,2), (1,3); (2,1), (2,3), (1,2); (3,3), (3,2), (3,1)]

so rhs would be someting like: [1 2 0; 0 1 1; 9 0 0]

mcabbott · July 12, 2020, 6:39pm

These two functions agree, and if you put back my @assertions you will see that i is still an integer, and rhs is still a scalar. I suspect that one step closer would be

    ci = transforms[f, g, z]::CartesianIndex
    rhs = filters[f, g, ci]::Number

But the final result is still a 6-D array, whose values are depend on only 3 of the indices, I suspect this isn’t what you intend:

length(unique(res[:,:,1,:,1,1])) == 1

(similar is just a tidier way to make an empty array really. perm seems like an array you need not allocate.)

These loops won’t be differentiable. @tullio writes similar loops, and works out the corresponding derivative. It may struggle with CartesianIndexes, I’m not sure, depending exactly what the desired operation is.

CTM · July 12, 2020, 7:54pm

Adding the assertions back in throws an error, i_set, (before the [2]) for example, is of type Array{CartesianIndex{2},2}

So is the main issue the CartesianIndexing, if I instead just reformulated things to index by integer rather than with CartesianIndex would this help?

And thanks for the heads up on @tullio, I’ll have a read

mcabbott · July 12, 2020, 10:02pm

i_set as defined in the code above is an Int. “Before the [2]” it is what I just called ci. The Array called m_transformation_set isa Array{CartesianIndex{2},3}, with a 3. If you have a 2D array here, then the inputs must be different, and all bets are off.

I think the first issue is making sure you have one formulation in which the computer does what you intend… it’s even less good at guessing than I am.

Topic		Replies	Views
Elegant way to implement a rotating convolution filter without using `rot90` etc General Usage	0	494	July 23, 2021
Mutating array in gradients? New to Julia flux , zygote	4	490	September 8, 2020
Cleaner way for applying conv2 to RGB image General Usage question	0	877	October 14, 2017
`groups` argument in Flux.Conv : a misunderstanding Machine Learning flux	4	693	December 27, 2021
Help writing simple convolution algorithm General Usage array	5	811	September 25, 2018

Group Convolution Implementation - Work-around mutating arrays in gradient calcs

Related topics