1.0 annoyances and Matlab comparison


#1

I think Julia is advancing very great, not too slow and not too fast. I imagine that people can wait for some more time for the language to stabilize. Julia is a genuine scientific computing language, not just an extension for doing scientific programming on top of a general-purpose language. As such, the core developers and the community are now doing extensive revisions and testing before version 1.0 is out. I used Julia since version 0.2 and will never quit.

I’ve been promoting the use of the language at my institution for more than a year now and have delivered a number of presentations to professors and engineers. That said, I’m not in a place to judge a great language like Julia, I’m not a computer scientist, but the following are some issues that should be taken as constructive criticism. I may be wrong in many of it, or most of it will go away in a near 1.x version, but these are my own observations and feedback from colleagues. Most of the people at my institution are long-term MATLAB/Fortran users.

  • using PackageX; using PackageY; using …, there are 1734 registered Julia packages at the time, MATLAB covers all science with 86 toolboxes without importing a single toolbox. How can one tell which package to use for which task and if a better alternative package exists for the same task without extensive search in the documentation of each package. Of course, I know the difference between open source projects built by many volunteers and highly paid proprietary software products. Tools that are easy to use, fully functional, and helping the user get the job done with minimal programming knowledge, will win at the end.

  • if a scientific computing language requires the user to write using LinearAlgebra, using Random, using FFTW, using SpecialFunctions, etc., what kind of scientific work can the language do by default?

  • Many small plotting packages with different back-ends none of which is stable enough to work with without problems. Better to focus on one package and write detailed documentation and examples for it.

  • A = Array{Float64}(uninitialized, m, n), what does that “ugly” uninitialized do? Compare plain C double A[m][n]; old Fortran real(8) A(m,n), MATLAB A = zeros(m,n), …, I’m aware of the discussions about this on github, but I feel that Julia is taking care of the developer more than the user in this case.

  • a::AbstractArray + b::Number is deprecated, use broadcast(+, a, b) instead., if one says B = A + 1. I don’t think any user will be happy writing B = broadcast(+, A, 1)

  • @inbounds, @simd, @views, @fastmath, @parallel, … these kinds of syntactic “sugar” add more noise to the code, a release mode vs. a debug mode can be used to enable optimizations like in compiled languages. I know -O3 --check-bounds=no .., but it will be great if the IDE can allow these modes.

  • Some flawed benchmark results published on the Julia website. For example, matrix_statistics, the first test to look at when choosing a scientific language, is essentially testing looping. The test is especially unfair to MATLAB and Fortran, if one chooses a matrix size of 256x256 instead of that illogical 5x5 and looping for 100 times, they will see that MATLAB is actually faster than Julia, and Fortran using the same libraries as Julia for randn and dgemm is way faster (more than 3X, I measured it). recursion_quicksort is never used in MATLAB, MATLAB is bad at recursion. Also, MATLAB’s sort is faster than Julia’s sort!. I’m aware of the philosophy of the benchmarks, which is comparing languages at doing specific code patterns. Still, this is only important from a theoretic point of view, no MATLAB user will ever write such a code.

I love Julia as is, though, and appreciate all the effort that has been done. At least, no other language has been brave enough to face the problem of scientific computing, an area where computations are never fast enough, but if some of these issues and what others have mentioned in this post are solved, it will be a huge boost to the adoption of Julia.


1.0 adoption path?
1.0 adoption path?
#2

I’m not happy about LinearAlgebra. I’m not really happy about Random either but I realize that my belief that Random should be included by default is a result of my specific educational background. The others I think are fair. (I’ve used a lot of special functions in my life, but even I don’t see them as “default” functionality, no pun intended.)

Note however that you still have an awful lot of LinearAlgebra functionality out of the box, since you have all operators (e.g. remember you can also take the adjoint with A' and inner products with dots). Were that not the case I’m sure many of us would have complained very loudly.


#3

I think this is actually recommending B = A .+ 1

A MATLAB user wouldn’t write that code for that specific problem, but there are many problems where writing for loops is the most natural way to solve them, so it’s useful to have a benchmark that demonstrates that you can write fast iterative code in Julia.


#4

That’s probably architecture dependent. If you have an Intel processor, you can get Julia with MKL and should have the same BLAS performance. However, if you have an AMD processor, you cannot (AFAIK) get MATLAB with OpenBLAS.

I like the safe-by-default, but with the choice to opt in to performance.
I think these macros are primarily for package development. In other languages, users very rarely look into the source code. And were they to, these macros are a lot easier to understand than suddenly facing the C++ needed for performance in other languages.

This is more safety by default, taking care of the user. The user shouldn’t be doing A = Array{Float64}(uninitialized, m, n)!

julia> NaN * 0
NaN

Getting random NaNs in your matrix, that may then infect more of your code, is an easy mistake to make. And a frustrating one for new users to diagnose.
For general use, the slow down of creating an array initialized to zero isn’t going to be noticeable at all.
When writing package code, then you can deliberately opt in to the uninitialized version, and make it clear you are doing so. fill(0.0, m,n) is easy enough to write.
I would much rather direct new users there.

Default safety is a lot less likely to scare people away.

EDIT:
I’m also interested in the case of Fortran being faster than Julia. In a few simple test problems, ifort seems to be faster, but gfortran does not.


#5

Yes, and I was missing loops so much when implementing an algorithm on my own in MATLAB. The problem is that all benchmarks seem to focus on this, a 5x5 size matrix can be unrolled by hand if you need the best performance, while scientific code mostly contains huge matrices and n-dimensional arrays that may not fit in memory.

On the other hand, vectorized code can also be very concise and performs well in MATLAB. Compare the following example with a similar performance:

Julia Code:

function fast_counter(x,y)
    n = 0
    @inbounds for i = 1:length(x)
        n += ifelse(x[i]^2 + y[i]^2 > 2y[i]^2, true, false) 
    end
    n
end

n_samples = 10^8 
const x = rand(1:10, n_samples)
const y = rand(1:10, n_samples)

n = fast_counter(x,y)
@time n = fast_counter(x,y)

  0.109765 seconds (85 allocations: 6.451 KiB)

and three lines in MATLAB:

nsamples = 10^8;
x = randi(10,[nsamples,1]);
y = randi(10,[nsamples,1]);
tic, n = sum( x.^2+y.^2 > 2*y.^2 ); toc

  Elapsed time is 0.117222 seconds.

Any way, benchmarking is a very tough task, no one can get it right. But the big advantage of Julia here is that you can write the code in either way, plus the ability to parallelize and use multi-threading to further improve the performance, the thing that is not possible in MATLAB. So, the timing above shows the base Julia performance.


#7

As a (fingers crossed) ex-Matlab user, I can’t even begin to express how much I prefer the simple install-package-when-needed philosophy of Julia, compared to the mind-bogglingly overwhelming decisions you are presented with at install time in Matlab. So I’ll let the following screenshots speak for me instead:

I doubt I need to drive the point home even more, but I still can’t resist mentioning that the installation program doesn’t explain what the toolboxes do. You pretty much have to google them one by one.


#8

Put using LinearAlgebra, Random, FFTW, SpecialFunctions in your ~/.julia/config/startup.jl file (previously ~/.juliarc.jl) and you won’t know the difference. Package developers need to be explicit about these dependencies but users don’t unless they prefer to.

You can do A = zeros(m,n) in Julia too. Matlab does not (as far as I’m aware) provide the ability to do the equivalent of Julia’s A = Array{Float64}(uninitialized, m, n).

As @ssfrr said, the recommendation here is B = A .+ 1. The old behavior of A + 1 in this situation violated the associativity of + in the presence of the very convenient uniform scaling type (λ*I).

There are many scientific problems that involve lots of small matrices. Speed of operations on these problems matter if that’s the kind of problem you have. Julia has the same vectorized performance as Matlab for large matrices if you use the same BLAS (unsurprisingly, since only BLAS speed matters then).


#9

Designed from the ground up to be best of breed scientific/technical computing language by some greedy hackers, yes, but it also has the “bones” to be a top notch general purpose language (which is what I use it for).

The “using” this and that I don’t think is really going to be so much of an issue.
One can simply make a file that has “using …” all of the scientific packages you need, and include that.

There is actually a lot more that could be slimmed out of the language, such as BigInt, BigFloat, Regex, in a way that maybe by default they got loaded up out of stdlib, but via a command-line switch or some other mechanism you could have a very light Julia.


#10

I’ve routinely found ifort to be a factor 2-3 faster compared to gfortran. So comparisons between Julia and Fortran are only fair if the same compiler is used.


#11

The uninitialized semantics seem extremely heavy-handed to me. On the other hand, the presence of the Vector{T}(n) constructors and similar did leave the door wide open for CustomArray(n) constructors with dramatically different behavior, and considering the uniformity of the AbstractArray interface this seems somewhat undesirable, but I can’t think of any really good examples. So, I can see both sides of this. I suspect I’ll have a much more concrete opinion about this after a significant amount of time using 0.7.

Yes! Having a large number of operations on small vectors and matrices is very common in certain fields of physics. It’s not automatically best to assume that the only relevant benchmarks are on the largest possible matrices.


#12

Is it possible the deprecation message could be improved?
Perhaps something like:

Warning: `a::AbstractArray + b::Number` is deprecated, use `a .+ b` instead.

#13

I personally like the uninitialized change. It’s wordy, but if your algorithm produces garbage results, e.g. you start seeing unexpected subnormals, now the first thing you’d do is search for uninitialized in your code and you’d have a good chance of finding the bug.


#14

I found another way to poke around yesterday actually, subtypes(Any) and subtypes(Function) gives ~270 & ~6700 objects respectively, possibly a way to see unexported functions in modules with the repl.
As for regex, i’d imagine it’s pretty handy for text file loading/parsing, might expect include to be using it.

As a sidenote, Wonder if there are many language syntax changes planned for 1.0+? And how to wrap a syntax change? @version0.6 begin ... ?


#15

Re: small matrices: completely agree. Somewhat off-topic, but I’m the author of RigidBodyDynamics.jl, and a very fast rigid body dynamics library pretty much requires fast operations on small matrices. Julia (with StaticArrays.jl) is one of the few languages (and certainly the most productive that I know of) that fulfill this requirement.


#16

This is the first time I’m seeing this, RigidBodyDynamics.jl is very cool. I’m definitely going to show it off to my mechanical engineering friends who use Python and Matlab.

I myself have been looking into using StaticArrays for doing various types of lattice quantum mechanics, although that case tends to be many small vectors or matrices (the fields) stored in a much larger array (the lattice itself) so I’m still not sure that StaticArrays is the right approach (consider this a solicitation for advice). I’m hoping to make a PR to enable 0-based indexing to make it slightly more natural in Minkowski space (see this issue). I also was not aware of Rotations.jl until now. I’m interested in extending this to the entire Poincaire group, and perhaps later to the conformal group, although glancing at Rotations.jl it might make more sense to start from scratch for this, perhaps using Rotations as a guide.

Sorry to get so badly off the topic of this thread :blush: I have a feeling we are going to see “post has been moved to ‘speculation about use of StaticArrays for implementing spacetime transformation groups’” very soon :laughing:


#17

I went looking in the code, and there really isn’t that much (at least that I could find), and some of it
(such as the code to check if a UUID is valid) could be rewritten to be more efficient and not dependent on a Regex.


#18

I have a feeling we are going to see “post has been moved to ‘speculation about use of StaticArrays for implementing spacetime transformation groups’” very soon

Yeah, probably a good idea; I’d do it myself if I could.

I’m definitely going to show it off to my mechanical engineering friends who use Python and Matlab.

Cool! Maybe show them the relatively-new RigidBodySim.jl as well, which combines RigidBodyDynamics with DifferentialEquations (needs no link) and DrakeVisualizer.jl for visualization of the simulation.

Re: StaticArrays for lattice quantum mechanics: I’m stuck in the 1600s and know nearly nothing about quantum mechanics. In terms of software engineering, I iterate over a number of StaticArray-backed types stored in a Vector (of 1-100 elements) a lot, and I think it’s an excellent solution in terms of performance and usability. For much larger vectors/matrices, using BLAS might win for LinAlg.BlasFloat types. Maybe I can give more specific advice with more details about your application.


#19

No worries, the domain knowledge really ins’t necessary for most of the programming considerations. This is already valuable advice to me, your experience using Arrays of StaticArrays in development of a real package is probably a far better indicator of performance reliability than any ad-hoc benchmarking I could do of simple examples. My outer Arrays would typically be much larger but as long as the time complexity of most operations scales linearly with the number of elements I’m hoping it’ll be ok. It does somehow seem to make more computational sense to just have one huge tensor, but that would entail a significant complication in the implementation since in general many of the indices of such an array would transform under different groups. Specifically, the simplest possible case in full lattice gauge theory in a 4 dimensional spacetime would involve a rank-4 array of 4-vectors of 2\times2 complex matrices (so you can see why I am anxious about the performance characteristics of this). Fortunately the operations that would be performed on these objects would be relatively simple in the main program loop, so access is the thing I really worry about.


#20

I’ve used vectors of static arrays and it was very nice. It’s slightly jarring at the beginning to write A[i] instead of A[:, i] but it’s actually simpler in the long run (in most cases you really want to access the inner thing itself rather than its elements). One problem is that it makes it hard to switch easily between representations ; see https://github.com/JuliaArrays/StaticArrays.jl/issues/353 for a possible solution (the other way around would be nice also, ie an array type for which A[i] would be A[:, i])

(sorry to disrupt the topic. The OP has valid points : in some ways Julia looks like it’s moving away from a Matlab replacement to a general purpose language. This is a good thing generally (Matlab being a sucky general programming language is what’s driving people away from it in the first place), but it makes those of us who really like Matlab’s simplicity afraid. Maybe a good workaround is a using SimpleNumerics which, without going full Matlab compatibility, would include relevant packages and possibly provide shortcuts (eg keep zeros/ones/eye if it’s deprecated))


#21

You could also take a look at RecursiveArrayTools:

julia> using RecursiveArrayTools, StaticArrays
INFO: Precompiling module RecursiveArrayTools.

julia> m = [(@SVector randn(5)) for i in 1:100];

julia> rm = VectorOfArray(m)
100-element Array{SVector{5,Float64},1}:
 [2.57562, 0.565917, 0.132054, -0.31207, 0.988647]      
 [-0.53834, 0.638448, -0.727466, 0.192804, 0.225448]    
 [-1.1491, -1.56327, 1.20963, 2.30094, -1.35736]        
 [-1.17971, 0.31592, -0.30782, -0.483111, -1.45646]     
 [1.31586, -0.777962, -2.99292, -0.186956, -0.161788]   
 [-0.744477, -0.0927292, -0.665998, -0.548381, 0.593098]
 [-0.893014, -1.94784, -1.6863, 0.615776, 1.41121]      
 [-0.484544, 1.6242, 0.230385, 0.485394, -1.05647]      
 [-1.13256, -0.113689, 0.798387, 0.107504, 0.794548]    
 [1.84316, -0.295128, -1.30346, -0.0499972, 1.45501]    
 ⋮                                                      
 [2.25566, -1.45649, -0.125953, 0.194074, -0.220085]    
 [0.157373, 1.31999, -0.167246, 0.819113, -1.15088]     
 [2.24811, -0.566307, -0.914905, 0.814942, 0.994176]    
 [2.46972, 1.90351, 0.144164, -0.618719, 1.47789]       
 [-1.51414, 0.861183, 0.345974, 0.79902, 0.865369]      
 [0.255114, -1.43763, 0.710865, 1.25011, -0.0444176]    
 [-2.67157, 0.18663, 0.108644, -0.736388, 0.350107]     
 [-1.62857, -0.378901, 0.751386, -0.621748, 0.109279]   
 [-0.234332, -1.33627, -1.23564, -1.76831, 1.46161]     

julia> rm[2,3]
-1.5632651809902731

julia> rm[:,3]
5-element SVector{5,Float64}:
 -1.1491 
 -1.56327
  1.20963
  2.30094
 -1.35736