Questions re Using Julia

Hi everyone
Please be gentle, this is my first post. I have posted it here because it spans a number of areas.
First, I have written a program in python that loads a series of images, analyses them and identifies potential comets. It works ok although I am always refining it.
The images are loaded into numpy arrays and then I use numpy and cupy to do the analysis. It is not too slow but I am always looking for ways to speed up the program.

I saw mention that Julia runs faster than python and am interested in seeing if I could convert my program to run on Julia. However I need to know whether a program like mine would in fact run faster and if so by how much. I would also need to know whether there are libraries that can perform discrete tasks.


  1. Am I likely to see a speed increase by using Julia instead of Python for my application? if so, how much speed increase would I see?

  2. Would I be better off programming in Golang?

  3. Can Julia load image files into arrays? Are there libraries for this?

  4. Could a FITS file be loaded into an array?

  5. Are there libraries to save an array as an image?

Thanks Peter

Hello Peter. Lots of people here will help you.
Please start by looking at Julia Images

Documentation is here:

Regarding the FITS format the Julia Astro organisation is here

1 Like

Is your code available in a repository?
Are you able to say something about the ML approach you are using to recognise comets?
I guess you are still conducting your research so you may not be able to do this.

Hi and thanks

My code is strictly for personal purposes and is used to find comets as part of the Sungrazer project run by NASA where I participate as a Citizen Scientist (I prefer the term Amateur myself).

It took me a year to find my first comet, not because they are hard to find (its actually quite easy) but because the participants are very very competitive.

Around that time I had finished doing an online Python course at MIT and decided to write a program to find comets in SOHO telescope archival images. After 6 years (I wasn’t in a hurry) I started using my program and have been refining it since. Since early 2019 I have found another 27 confirmed comets which puts me at #2 in Australian comet discoverers as far as I know.

In light of the competitive nature of comet hunting I don’t want to release the program to the public as I would lose any advantage I have accrued. I am happy to provide a high level description of how it works though.

The program looks across each sequence of 4 images and looks for a bright object that is to a certain degree brighter than its immediate neighborhood, moving in a straight line and at a constant speed taking into account when the images were taken. The comet sequences of 4 images are then joined together where they are a continuation of a previous sequence. That way I can find a sequence of up to say 30 images in a row where a comet is moving through each image.

That’s basically it.

Anyone able to comment on the speed of Python using Numpy vs Julia?

cheers Peter


If all you do is use numpy, your speed advantage will be slight when doing
the same thing in Julia.
But if you ever need to branch into python code for any substantial
computing, switching to Julia will payoff in the long run.

Well, most of what I do is in Numpy so maybe Julia isn’t so great for my purposes.

is there another language that would give a significant speed advantage over Python Numpy?

cheers Peter

I think the question is do you do any for loops where you iterate over a piece of the image? If so then julia will be (much) faster.

If all you do are matrix operations then you are already close to max speed.


Yes I moved to matrix operations some time ago to get more speed.

cheers Peter

If your problem is inherently “vectorized”, I agree that you would probably not gain too much using Julia (however, good Julia code is often still a bit faster than Numpy).
On the other hand, if you had to somehow make “more complicated” calculations or more memory allocation to be able to express your problem as vector operation (which is usually worth it in Python), you may gain more by switching to Julia.
This talk I really found informative:

Btw. you can easily combine Julia and Python code using PyCall / pyJulia, e.g. you could just replace the most performance critical part of your Python code by Julia and call it directly from Python.


The point is also that even if you won’t see any significant speed gain when transforming your code from Python to Julia because you mostly do the heavy work in vectorised numpy operations, you still gain a lot flexibility in the sense that you can fine tune your code much more easily compared to the “whole-meal programming” in numpy.

Btw. I also saw a lot of code based on numpy written very inefficiently because people tend to think that it’s (basic usage) all they can get. Most of time the inefficiency comes from multiple (temporary) allocations which can easily be avoided in Julia.


In [1]: import numpy as np

In [2]: a = np.random.rand(1_000_000)

In [3]: b = np.random.rand(1_000_000)

In [4]: c = np.random.rand(1_000_000)

In [5]: %timeit a * b - c
5.03 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

whereas in Julia

julia> using BenchmarkTools

julia> a = rand(1_000_000);

julia> b = rand(1_000_000);

julia> c = rand(1_000_000);

julia> @btime $a .* $b .- $c;
  2.873 ms (2 allocations: 7.63 MiB)

The reason for this is: numpy has no idea that after a * b, you are going to do - c, so it cannot optimise and creates a temporary array.

You can avoid this by using e.g. the out= keyword, which will reuse an array instead of allocating temporary ones or an additional library like numexpr:

In [6]: import numexpr as ne

In [7]: %timeit ne.evaluate("a * b - c")
3.43 ms ± 41.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So be aware, using numpy gives you access to nice vectorised functions but you also have to take care of allocations and other things. Once you need to customise the loop behaviour, you are kind of lost and have to utilise something like numexpr or numba. In such cases however, I’d definitely consider switching the language.

  1. Probably the best way to tell is to use something like pyinstrument to see how much time your code spends outside of numpy routines. That number is basically your theoretical maximum for speedup from a simple mechanical translation from python to julia (this may not be entirely true; for example, julia may save you allocations due to broadcast-fusion for poorly-written numpy code, so it’s possible you’ll save some of the time spent inside numpy routines as well).
  2. Almost certainly not; I don’t believe golang is in general well-tuned for these kind of numerical workloads.
  3. Yes. See previous answers about the Images.jl ecosystem.
  4. See previous answers.
  5. Yes. See previous answers about the Images.jl ecosystem.

One problem I keep running into with Python is memory errors. if
the number of “stars”
(potential comets) found is very large in a sequence of images
then the size if the arrays balloons out and I get memory errors.

  Is Julia better in this area and if so, why?

cheers Peter

As we wrote above, your numpy operations may be memory inefficient due to unnecessary temporary allocations (typical numpy issue on chained operations). However, we cannot tell much about your situation without seeing the actual code. It always depends on the use case.

I summarise again what we wrote above: Julia is in general “better”, since in contrast to numpy, it does not rely on vectorised operations to be fast. In Julia you can write your own loops and low level algorithms where you have full control of the allocations and it will always be fast. In fact, even chained vector operations will most of the time be compiled to (memory) efficient code by LLVM. So yes, Julia is better compared to Python+numpy in many cases for several reasons.

1 Like

No worries


Matrix operations for image calculations like stencils are inherently pretty slow compared to some more optimized looping strategies. Basically, a sparse matrix (like a stencil) is a hammer that is okay, but not great. Usually there’s a better algorithm that can exploit the structure more directly. In Julia it’s much easier to directly write these algorithms with loops. That said, without seeing your code, it’s hard to know details.


Below is the code for the main engine of my program. This is where most of the time is taken up and where most memory problems occur.

To explain, my program looks at a sequence of say 120 images and initially identifies “stars” ie positions in each image that are brighter than their surrounds. These positions are put into lists, one for each image.

So the following subroutine takes 4 of these image lists in a row (not necessarily equally spaced in time). It makes an array out of the 1st and 4 lists, in other words it looks at every combination of stars in 1st image and stars in the 4th image and these combinations are the array. It then makes 2 other arrays where for each combination it calculates where theoretically a star should appear at the time of the 2nd image and the time of the 3rd image. It then compares those theoretically arrays with identified stars for the 2nd and 3rd images and returns true if there are “stars” at those positions in reality.

If the subroutine finds 4 stars in a straight line and moving at a constant speed (after taking into account the timing of the images) the the result is stored as a “sequence”.

I have used cupy in this which is the same as numpy but with the calculations done by the graphics card.

No doubt there are probably more efficient ways to do this but I was able to get decent speed boosts from vectorisation.

Two points to note.

  1. The code if flexible so when comparing whether there is a real star where a theorised one is to be foudn I allow +/- 1 pixel>

  2. When doing comparisons I divide the y positions value by 1000 and add to the x value so that I only need to compare one number not two.

cheers Peter

def seqfind(number,starttime,file1,file2,file3,file4,time1,time2,time3,time4,allsequence):


if l1==0 or l2==0 or l3==0 or l4 ==0:
    return allsequence

perc= (time2-time1)/(time4-time1)
perc2= (time3-time1)/(time4-time1)


first = np.zeros((l4,l1,2))
fourth = np.zeros((l1,l4,2))

firstc= cp.asarray(first)
fourthc = cp.asarray(fourth)
secondc =secondc.astype(int)
thirdc =thirdc.astype(int)




secondsplit = cp.asnumpy(secondsplitc)
secondsplit1 = cp.asnumpy(secondsplit1c)
secondsplit2 = cp.asnumpy(secondsplit2c)
secondsplit3 = cp.asnumpy(secondsplit3c)
secondsplit4 = cp.asnumpy(secondsplit4c)
secondsplit5 = cp.asnumpy(secondsplit5c)
secondsplit6 = cp.asnumpy(secondsplit6c)
secondsplit7 = cp.asnumpy(secondsplit7c)
secondsplit8 = cp.asnumpy(secondsplit8c)






thirdsplit = cp.asnumpy(thirdsplitc)
thirdsplit1 = cp.asnumpy(thirdsplit1c)
thirdsplit2 = cp.asnumpy(thirdsplit2c)
thirdsplit3 = cp.asnumpy(thirdsplit3c)
thirdsplit4 = cp.asnumpy(thirdsplit4c)
thirdsplit5 = cp.asnumpy(thirdsplit5c)
thirdsplit6 = cp.asnumpy(thirdsplit6c)
thirdsplit7 = cp.asnumpy(thirdsplit7c)
thirdsplit8 = cp.asnumpy(thirdsplit8c)






second = cp.asnumpy(secondc)
third = cp.asnumpy(thirdc)

return allsequence

At a high level, it seems inefficient to build up a huge list of every possible correspondence between stars in the 1st image and stars in the 4th image, only to go and throw away almost all of those correspondences by cross-checking them with the 2nd and 3rd images. I might try something like:

  • For each star in the 1st image:
    • For each possible correspondence in the 4th image:
      • Check if there is a match in the 2nd and 3rd images.
      • If so, add to the list of stars

This would avoid needing to build up a huge collection of mostly incorrect correspondences and should help avoid running out of memory.

In general, Julia is a great tool for doing exactly the kind of exploration I’m suggesting here. If your algorithm makes sense to describe as a loop (or a nested loop, as I’ve written above), then Julia will let you write that loop in a straightforward way without having to “vectorize” it, and the resulting code can be extremely fast.

Here’s a sketch of how I might write this. This code won’t run on its own, but hopefully it gives you a sense of how you might implement the algorithm I’m suggesting in Julia:

# An Image here represents a sample time and a collection of bright spots.
struct Image
  data::Matrix{Float64}  # The raw brightness data
  bright_spots::Vector{Tuple{Int, Int}}  # The pre-computed list of "bright" spots

function star_appears_in_all_images(start_coordinates, end_coordinates, images)
  # TODO: write this function, which takes the start and end coordinates and determines if
  # there is a matching bright spot in each image
  # This should look something like:
  for i in eachindex(images)
    expected_coordinates = # where do you expect to see a bright spot in image i ?
    if !is_sufficiently_bright(images[i], expected_coordinates)
      # if you expected a bright spot, but the image doesn't have one, then this isn't a star trail
      return false
  # If we checked every image and didn't see any that were missing a bright spot, then this
  # might be a real star trail!
  return true

# This structure will hold a single candidate "star" represented by its coordinates in the
# first and last image
struct StarTrail
  start_coordinates::Tuple{Int, Int}
  end_coordinates::Tuple{Int, Int}

function find_stars(images::Vector{Image})
  # Store a vector of StarTrails to hold the results
  stars = Vector{StarTrail}()

  for start_coordinates in first(images).bright_spots
    for end_coordinates in last(images).bright_spots
      if star_appears_in_all_images(start_coordinates, end_coordinates, images)
        push!(stars, StarTrail(start_coordinates, end_coordinates))

  return stars

This is pretty different looking from your numpy code. Using loops and structured data can make the algorithm a lot more obvious, and in Julia it can also make your code even faster. It also can make writing more generic code easier: For example, the method I’m proposing doesn’t actually need to use exactly 4 images–it should work just fine if you want to try with 3 or 4 or 5 or any other number of images.


Thanks but I do not understand this.

I initially used FOR loops to loop through every combination of 1st and 4th stars but that was too slow.

I found vectorising those combinations into an array using Numpy was faster than using for loops.

One way or another every combination of star in 1st image and star in 4th image needs to be tested.

Isn’t vectorisation (using array to process many calculations) preferable to using FOR loops in Julia as well as in Python?

Also, when you say list of stars I think you mean list of sequences of 4 stars in a row.

cheers Peter

No, definitely not. Loops in Julia are orders of magnitude faster than in Python, and they can be faster than vectorized code. There is no need to vectorize code in Julia for performance.

1 Like

That’s because Python loops are slow, not because vectorization is fast. Vectorization is really just a hack to do a big for loop in a function that is written in a faster language (C or Fortran), but if you were working in said faster language (C, Fortran, or Julia) then writing the loop is just as fast as vectorization. Then when that’s the case, the fastest thing to do is to avoid vectorization because many of the vectorization strategies are not memory or algorithmically efficient as @rdeits points out in your example.