Hi there, I’m new to Julia and I’m trying to incorporate it into my work.
I am trying to work with a FITS table that is fairly big (1.8 GB). In astropy it’s easy to read it and load it into a pandas dataframe; it takes ~15 sec on my machine. When I try the same in Julia, creating a DataFrame with FITSIO.jl takes ~160 sec, so a factor of ~10 slower. I tested it several ways and it’s consistent. Another package, EasyFITS.jl, is equally slow.
Does anybody have an insight as to why the huge difference? I understand the Julia packages are calling the C routines in cfitsio, whereas Python is not — astropy uses its own implementation — but I don’t know how they could make it so much faster.
Thanks!
Hard to say something specific or suggest a better solution without a concrete example one can run…
Sure, I thought it was generally known that it was slower.
Here’s the example I was working with:
from astropy.table import Table
df = Table.read("galSpecLine-dr8.fits", format="fits").to_pandas()
That takes ~15 sec.
using FITSIO
using DataFrames
f = FITS("galSpecLine-dr8.fits")
df = DataFrame(f[2])
close(f)
Just the line that creates the DataFrame takes 10× longer than the Python code.
Any insight as to why it’s so much slower and if there’s anything one can do to make it faster would be appreciated.
Thanks!
Seems like there’s no significant Julia overhead here: the cfitsio ccalls take almost the whole time (I used @profview to see that). An MWE is
using FITSIO
f = FITS("galSpecLine-dr8.fits")
@time read(f[2], "MJD")
which takes about 0.35 seconds for me. Given that there are 241 columns, that’s about 85 seconds already.
@barrettp may know more.
I don’t know cfitsio that well. I do know that when implementing PyFITS (now astropy.io.fits) about 25 years ago, each header was read when the file was opened so that PyFITS could build the list of HDUs in memory, while the data themselves were read lazily to improve performance. This approach made it easy to jump directly to the HDU containing the desired data. It would appear that cfitsio does not do this, and instead has to read through each HDU to find the correct one before reading the data.
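The lazy-reading idea above can be sketched generically. This is a toy illustration with a made-up record format (a 4-byte big-endian length prefix per section), not the actual PyFITS/astropy code: on open, only the headers are scanned and each payload’s offset is recorded, so any section can be reached with a single seek instead of reading everything that precedes it.

```python
import io
import struct

class LazyFile:
    """Toy container: each section is a 4-byte length header followed by
    its payload. On open we read only the headers, recording where each
    payload starts; payloads themselves are read lazily, on demand."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.index = []  # (offset, length) of each section's payload
        self.f.seek(0, io.SEEK_END)
        end = self.f.tell()
        self.f.seek(0)
        pos = 0
        while pos < end:
            (length,) = struct.unpack(">I", self.f.read(4))
            pos += 4
            self.index.append((pos, length))
            pos += length
            self.f.seek(pos)  # skip over the payload without reading it

    def read_section(self, i):
        offset, length = self.index[i]
        self.f.seek(offset)  # jump straight to the requested payload
        return self.f.read(length)

def write_sections(fileobj, payloads):
    """Write payloads in the toy length-prefixed format."""
    for p in payloads:
        fileobj.write(struct.pack(">I", len(p)))
        fileobj.write(p)

# Usage: build a small in-memory file, then read one section directly.
buf = io.BytesIO()
write_sections(buf, [b"first", b"second payload", b"third"])
lf = LazyFile(buf)
print(lf.read_section(1))  # b'second payload'
```

The point of the design is that opening the file costs one small read per section header, and fetching section *i* costs one seek plus one read, independent of how much data sits in the earlier sections.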