Kron vs scalar product speed difference: Python code faster?

In v0.6 the vector form of `*` is deprecated, so this will become more obvious. (I think v0.6 is a big enough change in this problem domain, and close enough to release, that all new code should start being written in its style.)

You can see what is happening by calling `expand` on the expression, which shows its lowered form.
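`expand` is the Julia 0.6 name; from Julia 0.7 on it was deprecated in favor of `Meta.lower(module, expr)`. A minimal sketch, assuming Julia 1.x (the printed lowered code differs between versions, so no output is shown here):

```julia
# Assumes Julia 1.x, where Meta.lower replaces the 0.6-era expand().
# Lowering only rewrites the expression; the variables need not be defined.
ex = :(Delta_W .+= lr .* (ehp .* x' .- ehn .* xneg'))
lowered = Meta.lower(Main, ex)
println(lowered)   # prints the lowered form of the expression
```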

In your version there is no syntax-level broadcast fusion, so every operator on the right-hand side creates a new temporary array; only the final `.+=` update is done in place with `broadcast!`:

```
julia> expand(:(Delta_W .+= lr * ( x * ehp' - xneg * ehn')'))
:((Base.broadcast!)(+,Delta_W,Delta_W,A_mul_Bc(lr,A_mul_Bc(x,ehp) - A_mul_Bc(xneg,ehn))))
```
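To make the temporaries visible, here is the unfused expression split into its individual steps. This is a sketch assuming Julia 1.x (where 0.6's `A_mul_Bc(a, b)` is written `a * b'`) and real-valued vectors; the sizes and random data are made up for illustration:

```julia
# Sketch assuming Julia 1.x and real-valued data; sizes are hypothetical.
lr = 0.1
x, xneg = rand(3), rand(3)     # length-3 vectors
ehp, ehn = rand(4), rand(4)    # length-4 vectors
Delta_W = zeros(4, 3)

# Unfused version of: Delta_W .+= lr * (x * ehp' - xneg * ehn')'
P = x * ehp'         # 3×4 outer product, new array
N = xneg * ehn'      # 3×4 outer product, new array
D = P - N            # 3×4 difference, new array
U = lr * D'          # 4×3 scaled transpose, new array
Delta_W .+= U        # only this last step writes into Delta_W in place
```

Inside a training loop these four allocations happen on every iteration, which is a likely source of the slowdown.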

In the optimized version only broadcasted (dotted) operators are used. The dots are syntactic sugar for `broadcast`, and on master there is a feature called broadcast fusion: all the nested dotted operators and function calls are merged into one inner function, which is then broadcast over the individual arrays. As a result, no temporary memory has to be allocated for the intermediate results. Take a look:

```
julia> expand(:(Delta_W .+= lr .* (ehp .* x' .- ehn .* xneg')))
:($(Expr(:thunk, CodeInfo(:(begin
        $(Expr(:thunk, CodeInfo(:(begin
        global ##3#4
        const ##3#4
        $(Expr(:composite_type, Symbol("##3#4"), :((Core.svec)()), :((Core.svec)()), :(Core.Function), :((Core.svec)()), false, 0))
        $(Expr(:method, false, :((Core.svec)((Core.svec)(##3#4,Any,Any,Any,Any,Any,Any),(Core.svec)())), CodeInfo(:(begin
        #temp#@_9 = #temp#@_4 * #temp#@_5
        #temp#@_8 = #temp#@_6 * #temp#@_7
        #temp#@_10 = #temp#@_9 - #temp#@_8
        #temp#@_11 = #temp#@_3 * #temp#@_10
        return #temp#@_2 + #temp#@_11
    end)), false))
        #3 = $(Expr(:new, Symbol("##3#4")))
        SSAValue(0) = #3
        SSAValue(1) = ctranspose(x)
        SSAValue(2) = ctranspose(xneg)
        return (Base.broadcast!)(SSAValue(0),Delta_W,Delta_W,lr,ehp,SSAValue(1),ehn,SSAValue(2))
```

EDIT: Don’t be fooled by the syntax highlighting; there are no comments here. Somehow I am unable to turn off syntax highlighting.
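Per element, the generated function `##3#4` in the lowered output computes `Delta_W + lr * (ehp * x' - ehn * xneg')`, and `broadcast!` applies it in a single pass over `Delta_W`. A hand-written sketch of the same idea, assuming Julia 1.x; `kernel` and `update_fused!` are hypothetical names, not part of Base:

```julia
# Sketch assuming Julia 1.x; kernel and update_fused! are hypothetical names.
# Per-element body of the generated function ##3#4 from the lowered code.
kernel(dw, lr, a, xt, b, xnegt) = dw + lr * (a * xt - b * xnegt)

# Explicit loop: one pass over Delta_W, no intermediate matrices.
function update_fused!(Delta_W, lr, ehp, x, ehn, xneg)
    for j in eachindex(x), i in eachindex(ehp)   # j outer, i inner: column-major order
        Delta_W[i, j] = kernel(Delta_W[i, j], lr, ehp[i], x[j], ehn[i], xneg[j])
    end
    return Delta_W
end

lr = 0.1
x, xneg = rand(3), rand(3)
ehp, ehn = rand(4), rand(4)
W1 = rand(4, 3); W2 = copy(W1); W3 = copy(W1)

update_fused!(W1, lr, ehp, x, ehn, xneg)               # hand-written loop
broadcast!(kernel, W2, W2, lr, ehp, x', ehn, xneg')     # same call shape as the lowered code
W3 .+= lr .* (ehp .* x' .- ehn .* xneg')               # dotted syntax, fused into one broadcast
```

All three updates should agree up to floating-point rounding; the dotted form is simply the most convenient way to get the single-pass loop.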

EDIT: Notice the two lines near the bottom that call `ctranspose`: this is where the new `RowVector` type, mentioned earlier, comes into play.
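From Julia 0.7 on, `RowVector` was replaced by the `Adjoint` wrapper (and `ctranspose` was renamed `adjoint`). A small check, assuming Julia 1.x with the `LinearAlgebra` standard library, that `x'` is a lazy row view rather than a copy:

```julia
using LinearAlgebra

x = rand(3)
ehp = rand(4)

xt = x'                      # lazy Adjoint wrapper around x (no copy)
println(typeof(xt))          # an Adjoint wrapping Vector{Float64}
println(size(xt))            # (1, 3): behaves like a 1×3 row
println(parent(xt) === x)    # true: shares memory with x

M = ehp .* xt                # column .* row broadcasts to a 4×3 matrix
println(size(M))             # (4, 3)
```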