Smoothing probability distribution output of network

I’m training a neural network with Flux and outputting a probability distribution (i.e. probability bins for the value of a continuous variable). The output layer produces the values for the bins, which I then normalize (so they sum to one, e.g. via softmax). That much is working. Now I want to smooth the distribution before calculating the loss during training, which happens on the GPU (so I need to avoid scalar indexing). Are there common strategies for smoothing like that?

In particular, I want it to “fill in the 0’s” effectively. So, for a distribution like [0,1,0,3,0,5], it would smooth to something resembling [0,1,2,3,4,5] (not exactly, just trying to illustrate the “fill in the 0’s” part).
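
For concreteness, here is a rough CPU-only sketch of the kind of smoothing I have in mind (a flat 5-tap moving average with zero padding; the numbers are only illustrative):

    # Toy example: smooth with a flat 5-tap moving-average kernel.
    v = [0.0, 1.0, 0.0, 3.0, 0.0, 5.0]
    k = fill(0.2, 5)                      # flat kernel, sums to 1
    pad = length(k) ÷ 2
    vp = vcat(zeros(pad), v, zeros(pad))  # zero-pad so output length matches input
    smoothed = [sum(vp[i:i+length(k)-1] .* k) for i in eachindex(v)]
    # ≈ [0.2, 0.8, 0.8, 1.8, 1.6, 1.6]; the exact zeros are gone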

I thought maybe a convolution with a flat vector (i.e. [.2, .2, .2, .2, .2]) would do that. The convolution in Flux appears to be image-specific (the docs seem to assume multiple dimensions). When I tried the conv function from DSP.jl, it appears to be incompatible with Zygote:

    Compiling Tuple{typeof(lock), CUDA.APIUtils.var"#10#13"{CUDA.APIUtils.var"#9#12", CUDA.APIUtils.HandleCache{Tuple{CUDA.CuContext, CUDA.CUFFT.cufftType_t, Tuple{Vararg{Int64, N}} where N, Any}, Int32}, Tuple{CUDA.CuContext, CUDA.CUFFT.cufftType_t, Tuple{Int64, Int64}, Tuple{Int64, Int64}}}, ReentrantLock}: try/catch is not supported.

Any suggestions?

Would temp_softmax work? Like the one used in Transformers.jl? See Transformers.jl/example/GPT2_TextGeneration/text_generation.jl at master · chengchingwen/Transformers.jl · GitHub, on line 10.
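
For reference, that temp_softmax is roughly the following (my paraphrase of the example; the temperature used there is 1.2):

    using NNlib: softmax

    # Scale the logits by a temperature before the softmax;
    # a higher temperature gives a flatter distribution.
    temp_softmax(logits; temperature = 1.2) = softmax(logits ./ temperature)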


I would second @Tomas_Pevny, but for future reference here are some details on how you’d do the convolution approach.

It is not. If you pass a 1D kernel shape (or a 2D kernel matrix of length x features), it will work. You may have to reshape your input array from features x batch to features x 1 x batch before the conv (disguising the feature dim as the spatial/time dim), and back from features x 1 x batch to features x batch afterwards, but those reshapes should have essentially no overhead.
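
Something along these lines (a hedged sketch with an assumed flat 5-tap kernel, not the only way to set it up):

    using Flux

    # 1D Conv layer with fixed flat weights and no bias; Flux's Conv works on
    # width x channels x batch arrays, i.e. features x 1 x batch here.
    smoother = Conv(fill(0.2f0, 5, 1, 1), false; pad = 2)

    yhat = rand(Float32, 6, 32)                              # features x batch
    yhat3 = reshape(yhat, size(yhat, 1), 1, size(yhat, 2))   # features x 1 x batch
    smoothed = reshape(smoother(yhat3), size(yhat))          # back to features x batch

If the layer sits inside your model, you would want to make sure its weights are excluded from training so the kernel stays flat.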

For an approach using FFTs, AbstractFFTs.jl (GitHub - JuliaMath/AbstractFFTs.jl: A Julia framework for implementing FFTs) may work because it has AD rules defined. CUDA.jl also implements this interface for CuArrays.
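
A rough sketch of what that could look like (my own names; FFTW on the CPU, and the same calls should dispatch to CUFFT for CuArrays). Note this is circular convolution, so the two ends of the distribution wrap around into each other:

    using FFTW   # provides the AbstractFFTs interface on the CPU

    # Smooth along the feature dimension of a features x batch array by
    # multiplying spectra. The flat kernel is zero-padded to the feature
    # length and rotated so its centre sits at index 1; without the
    # rotation the output would be circularly shifted.
    function fft_smooth(yhat, taps)
        n = size(yhat, 1)
        kernel = vcat(taps, zeros(eltype(taps), n - length(taps)))
        kernel = circshift(kernel, -(length(taps) ÷ 2))
        irfft(rfft(yhat, 1) .* rfft(kernel), n, 1)
    end

    yhat = Float32[0 0; 1 1; 0 0; 3 3; 0 0; 5 5]   # features x batch
    fft_smooth(yhat, fill(0.2f0, 5))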


temp_softmax doesn’t seem to be what I’m looking for. It takes the example [0,1,0,3,0,5] and gives:

    0.012197566980823719
    0.028066307550425766
    0.012197566980823719
    0.14859678607916108
    0.012197566980823719
    0.7867442054279419

which is just as “bumpy” as the original vector. Perhaps I should have worded it “interpolate across the zeros”. Sort of like considering the zeros to be missing data rather than a lower value for that bin.

@ToucheSir thank you for the reshaping suggestion. That works:

    using NNlib
    KERNEL = fill(0.2f0, 5)          # e.g. the flat 5-tap kernel from above
    sz = size(yhat)                  # yhat is features x batch
    yhat_smoothed = reshape(NNlib.conv(reshape(yhat, sz[1], 1, sz[2]), reshape(KERNEL, 5, 1, 1); pad=2), sz...)

It's not perfect. A single pass still leaves it somewhat bumpy, but it's an improvement. There is also a problem at the edges: I had to pass the pad argument to keep the output the same size as the input, but that means the edges fall off, since conv doesn't adjust for the fewer entries at the ends.
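
One thing that might fix the edge falloff (an untested sketch, not something I've run on the GPU yet): convolve an all-ones array with the same kernel to get the effective kernel mass overlapping each position, and divide it out before renormalizing:

    using NNlib

    # Normalized smoothing: divide by the kernel mass that actually overlaps
    # each position, so the edges aren't attenuated by the zero padding.
    # Assumes the kernel lives on the same device as yhat.
    function smooth(yhat, kernel)                 # yhat is features x batch
        sz = size(yhat)
        k3 = reshape(kernel, length(kernel), 1, 1)
        p = length(kernel) ÷ 2
        num = NNlib.conv(reshape(yhat, sz[1], 1, sz[2]), k3; pad=p)
        # one.(yhat) builds an all-ones array of the same type and size
        # without mutation, so it stays CuArray- and Zygote-friendly.
        den = NNlib.conv(reshape(one.(yhat), sz[1], 1, sz[2]), k3; pad=p)
        reshape(num ./ den, sz...)
    end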
