Smoothing probability distribution output of network

taotree · December 22, 2023, 1:40am

I’m training a neural network with Flux that outputs a probability distribution (i.e. probability bins for the value of a continuous variable). The output layer produces the bin values, which I then normalize (sum to one, or softmax). That much is working. Now I want to smooth the distribution before calculating the loss during training, which happens on the GPU (so scalar indexing must be avoided). Are there common strategies for smoothing like that?

In particular, I want it to “fill in the 0’s” effectively. So, for a distribution like [0,1,0,3,0,5], it would smooth to something resembling [0,1,2,3,4,5] (not exactly, just trying to illustrate the “fill in the 0’s” part).

I thought maybe a convolution with a flat vector (i.e. [.2,.2,.2,.2,.2]) would do that. The convolution in Flux appears to be image specific (the docs seem to assume multiple dimensions). When I tried the conv function from DSP.jl, it appeared to be incompatible with Zygote:

Compiling Tuple{typeof(lock), CUDA.APIUtils.var"#10#13"{CUDA.APIUtils.var"#9#12", CUDA.APIUtils.HandleCache{Tuple{CUDA.CuContext, CUDA.CUFFT.cufftType_t, Tuple{Vararg{Int64, N}} where N, Any}, Int32}, Tuple{CUDA.CuContext, CUDA.CUFFT.cufftType_t, Tuple{Int64, Int64}, Tuple{Int64, Int64}}}, ReentrantLock}: try/catch is not supported.

Any suggestions?

Tomas_Pevny · December 22, 2023, 4:15am

Would temp_softmax work, like the one used in transformers? See line 10 of example/GPT2_TextGeneration/text_generation.jl on the master branch of chengchingwen/Transformers.jl on GitHub.

ToucheSir · December 22, 2023, 4:37am

I would second @Tomas_Pevny, but for future reference here are some details on how you’d do the convolution approach.

The convolution in Flux is not image specific. If you pass a 1D kernel shape (or a 2D kernel matrix of length x features), that would work. You may have to reshape your input array to the conv from features x batch -> features x 1 x batch (and then back from features x 1 x batch -> features x batch after the conv, disguising the feature dim as the spatial/time dim), but that operation should have basically no overhead.

For an approach using FFTs, the AbstractFFTs.jl interface (JuliaMath/AbstractFFTs.jl on GitHub, a Julia framework for implementing FFTs) may work because it has AD rules defined. CUDA.jl also implements this interface for CuArrays.
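As a rough illustration of the FFT route, here is a minimal CPU sketch. It assumes FFTW.jl as the backend for the AbstractFFTs interface, and `fft_smooth` is a hypothetical helper name, not something from any of the packages mentioned. The zero padding is built with plain `zeros`, so on CuArrays the padding would have to be created on the device instead, and whether the whole thing differentiates cleanly under Zygote is untested here.

```julia
using FFTW  # assumption: FFTW.jl provides the CPU implementation of rfft/irfft

# Hypothetical helper: smooth each column of x (bins x batch) by linear
# convolution with `kernel`, computed through real FFTs along dimension 1.
function fft_smooth(x::AbstractMatrix, kernel::AbstractVector)
    n, nbatch = size(x)
    m = length(kernel)
    len = n + m - 1                                  # length of the full linear convolution
    xp = vcat(x, zeros(eltype(x), m - 1, nbatch))    # zero-pad so circular conv equals linear conv
    kp = vcat(kernel, zeros(eltype(kernel), len - m))
    full = irfft(rfft(xp, 1) .* rfft(kp), len, 1)    # pointwise product in frequency space
    offset = (m - 1) ÷ 2                             # assumes an odd-length kernel
    return full[offset+1:offset+n, :]                # crop back to the input size
end

x = reshape([0.0, 1.0, 0.0, 3.0, 0.0, 5.0], :, 1)
smoothed = fft_smooth(x, fill(0.2, 5))
println(vec(smoothed))
```

For a kernel as short as five elements a direct convolution is likely the simpler choice; the FFT version mostly pays off for long kernels.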

taotree · December 22, 2023, 4:17pm

temp_softmax doesn’t seem to be what I’m looking for. It takes the example [0,1,0,3,0,5] and gives:

 0.012197566980823719
 0.028066307550425766
 0.012197566980823719
 0.14859678607916108
 0.012197566980823719
 0.7867442054279419

which is just as “bumpy” as the original vector. Perhaps I should have worded it “interpolate across the zeros”. Sort of like considering the zeros to be missing data rather than a lower value for that bin.
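For reference, a minimal sketch of what a temperature softmax computes, in plain Julia with no packages. The temperature of 1.2 is an assumption on my part; it is the value that reproduces the numbers above to several digits. The function name mirrors the one in the linked example, but this body is only an illustration.

```julia
# Temperature-scaled softmax over a vector.
# temperature = 1.2 is an assumed default that matches the output shown above.
function temp_softmax(logits::AbstractVector; temperature=1.2)
    z = logits ./ temperature
    e = exp.(z .- maximum(z))   # subtract the max for numerical stability
    return e ./ sum(e)
end

x = [0.0, 1.0, 0.0, 3.0, 0.0, 5.0]
p = temp_softmax(x)
println(p)

# Each output depends on its own input plus a shared normalizer, so the
# ordering of the bins is preserved and the dips at the zeros remain.
println(sortperm(p) == sortperm(x))
```

Because the operation never mixes neighbouring bins, raising the temperature flattens the whole distribution but cannot interpolate across the zeros.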

@ToucheSir thank you for the reshaping suggestion. That works:

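    using NNlib
    # KERNEL is the flat 5-element vector from above, e.g. fill(0.2f0, 5), on the same device as yhat
    sz = size(yhat)  # (bins, batch)
    yhat_smoothed = reshape(NNlib.conv(reshape(yhat, sz[1], 1, sz[2]), reshape(KERNEL, 5, 1, 1); pad=2), sz...)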

It’s not perfect. A single pass still leaves it somewhat bumpy, but it’s an improvement. There is also a problem at the edges: I had to add the pad argument to keep the array the same size, but that means the values fall off at the edges. conv doesn’t seem to adjust for the fewer entries at the ends.
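One possible way to handle both the edge fall-off and the "zeros as missing data" idea is a normalized convolution: convolve the values, convolve a validity mask with the same kernel, and divide. This technique was not proposed in the thread, so treat the following as a sketch under assumptions. `normalized_smooth` is a hypothetical name, the `y .> 0` test for "missing" is an assumption (softmax outputs are never exactly zero, so a small threshold may be needed instead), and it has only been reasoned through on the CPU.

```julia
using NNlib

# Hypothetical helper: smooth each column of y (bins x batch) with `kernel`,
# dividing by the kernel mass that landed on valid entries. `mask` holds 1 for
# valid entries and 0 for entries to ignore; the default treats all as valid.
function normalized_smooth(y::AbstractMatrix, kernel::AbstractVector, mask=one.(y))
    nbins, nbatch = size(y)
    pad = length(kernel) ÷ 2                   # assumes an odd-length kernel
    w = reshape(kernel, :, 1, 1)               # (width, in_channels, out_channels)
    conv1d(a) = reshape(NNlib.conv(reshape(a, nbins, 1, nbatch), w; pad=pad), nbins, nbatch)
    num = conv1d(y .* mask)                    # kernel-weighted sum of the valid values
    den = conv1d(mask)                         # kernel mass covering valid entries
    return num ./ max.(den, eps(eltype(y)))    # guard against 0/0 when nothing valid is in range
end

y = reshape(Float32[0, 1, 0, 3, 0, 5], :, 1)
kernel = fill(0.2f0, 5)

edge_corrected = normalized_smooth(y, kernel)            # mask of ones: corrects only the edges
filled = normalized_smooth(y, kernel, Float32.(y .> 0))  # zeros treated as missing
filled_prob = filled ./ sum(filled; dims=1)              # renormalize so each column sums to one
println(vec(filled))
```

With the all-ones mask the division only compensates for the padded positions at the ends. With the zero mask, the example column comes out as approximately [1, 2, 2, 3, 4, 4] before renormalizing, which is close to the "fill in the 0’s" behaviour described at the top.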
