Flux vs pytorch cpu performance

ToucheSir · July 20, 2020, 4:34pm

Based on @ChrisRackauckas’s comment here, fusing the matrix multiplication and bias add in Dense should allow the existing vectorized tanh broadcast to kick in. It seems like IntelVectorMath.jl also accomplishes this, but AIUI using the fused (5-argument) mul! should also remove the intermediate allocation incurred when adding the bias vector (which is what PyTorch does).
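To make that concrete, here is a rough sketch of what I mean. The function names `dense_unfused` and `dense_fused!` are made up for illustration and this is not Flux's actual Dense code; it only assumes the 5-argument `mul!` from LinearAlgebra (Julia 1.3+).

```julia
using LinearAlgebra

# Hypothetical stand-in for a Dense forward pass (not Flux's implementation).
# W * x allocates a temporary matrix before the bias add and tanh broadcast.
dense_unfused(W, b, x) = tanh.(W * x .+ b)

# Hypothetical fused variant writing into a preallocated output y.
function dense_fused!(y, W, b, x)
    y .= b                     # copy the bias into every column of y
    mul!(y, W, x, true, true)  # y = W*x + y, in place, no temporary for W*x
    y .= tanh.(y)              # in-place broadcast of the activation
    return y
end

W = randn(Float32, 4, 3)
b = randn(Float32, 4)
x = randn(Float32, 3, 5)
y = similar(x, 4, 5)

dense_fused!(y, W, b, x)
```

Note that the in-place version mutates `y`, so as written Zygote would not differentiate through it without a custom adjoint; it is only meant to show where the allocation goes away.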
