I have implemented some mathematical models in Python that require basic operations such as scalar multiplication, summation, and raising to a power, applied to scalars and arrays (up to 3-D).

I find that my Julia code is substantially slower than my NumPy code.

As an example:

Python (3.5) / NumPy:

```
import numpy as np
a=np.random.rand(3,3,50)
b=np.random.rand(3,3)
%timeit for i in range(a.shape[-1]):(5*a[0,:,i]+a[1,:,i]/2.0+4.0)
1000 loops, best of 3: 194 µs per loop
```

Julia:

```
using TimeIt
a=rand(3,3,50)
b=rand(3,3)
function test()
    @timeit for i in 1:size(a, 3)
        5.0 .* a[1, :, i] + a[2, :, i] ./ 2.0 + 4.0
    end
end
test(); test();
100 loops, best of 3: 1.21 ms per loop
100 loops, best of 3: 1.12 ms per loop
```

So Julia is about 6 times slower.
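For what it's worth, I also tried a variant along these lines, passing `a` in as a function argument and taking `@view`s of the slices instead of copies (just a sketch, not carefully benchmarked; the `test_views` name and the collection of results into `out` are mine):

```julia
a = rand(3, 3, 50)

function test_views(a)
    out = Vector{Vector{Float64}}()
    for i in 1:size(a, 3)
        # views avoid allocating a copy of each slice
        x = @view a[1, :, i]
        y = @view a[2, :, i]
        # @. fuses the element-wise operations into one loop
        v = @. 5.0 * x + y / 2.0 + 4.0
        push!(out, v)
    end
    return out
end

res = test_views(a)
```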

I also often need to raise a `Float64` or an `Array{Float64}` to a power:

NumPy:

```
%timeit for i in range(a.shape[-1]):(5*a[0,:,i]+a[1,:,i]/2.0+4.0)**3.3
1000 loops, best of 3: 263 µs per loop
```

Julia:

```
function test()
    @timeit for i in 1:size(a, 3)
        (5.0 .* a[1, :, i] + a[2, :, i] ./ 2.0 + 4.0) .^ 3.3
    end
end
test(); test();
100 loops, best of 3: 2.07 ms per loop
100 loops, best of 3: 1.81 ms per loop
```

Now Julia is almost 7 times slower.

Even using a custom power function, as defined below, doesn’t help:

```
pw(a, n) = @. exp(n * log(a))
function test()
    @timeit for i in 1:size(a, 3)
        pw(5.0 .* a[1, :, i] + a[2, :, i] ./ 2.0 + 4.0, 3.3)
    end
end
test(); test();
100 loops, best of 3: 1.95 ms per loop
100 loops, best of 3: 1.82 ms per loop
```
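(Note that `pw` relies on the identity `x^n = exp(n*log(x))`, which is only valid for strictly positive `x`; the expression above is always at least 4, so that’s fine here. A quick sanity check:)

```julia
pw(a, n) = @. exp(n * log(a))  # element-wise x^n via exp/log, valid for x > 0

x = [2.0, 3.0, 4.0]
r = pw(x, 3.3)
# should agree with the built-in broadcasted power
@assert isapprox(r, x .^ 3.3)
```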

Defining a more complicated function, like the one below, does help:

```
function pw2(x::Union{Array{Int64},Array{Float64}}, n::Union{Int64,Float64})
    if floor(n) != n              # non-integer exponent: fall back to exp/log
        @. exp(n * log(x))
    else
        n = convert(Int64, n)
        if n == 0
            return (ndims(x) == 0) ? 1.0 : ones(size(x))
        else
            xp = copy(x)
            for i in 1:(abs(n) - 1)
                xp .*= x          # repeated element-wise multiplication
            end
            return (n > 0) ? xp : 1 ./ xp
        end
    end
end
function test()
    @timeit for i in 1:size(a, 3)
        pw2(5.0 .* a[1, :, i] + a[2, :, i] ./ 2.0 + 4.0, 3.3)
    end
end
test(); test();
100 loops, best of 3: 1.32 ms per loop
100 loops, best of 3: 1.18 ms per loop
```
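And to also accept an array of exponents, I would apparently need yet another method along these lines (an untested sketch of my own, applying the exponents element-wise):

```julia
# hypothetical extra method: element-wise power with an array of exponents
function pw2(x::Array{Float64}, n::Array{Float64})
    @assert size(x) == size(n)
    @. exp(n * log(x))  # no integer fast path here; exp/log element-wise
end
```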

Although this is better, it is still about 4.5 times slower than the simple NumPy implementation. I could probably write functions like this for addition, subtraction, etc., but as soon as I also want these methods to be general enough to accept `n::Array{Float64}`, I’m suddenly faced with a whole lot of extra method definitions for what seem like basic mathematical operations. Do I really have to write some sort of basic library of these operations to make use of Julia’s speed? Rewriting all the functions in my project as explicit loops is not an option, as it would make the code completely unreadable and complex. Writing them for the operators might be, but I find it strange that this wouldn’t already have been done in Julia, so I’m probably missing something here. Can anyone point me in the right direction? Thanks