Hi there, I’m new to Julia and I’m trying to incorporate it into my work.
I am trying to work with a FITS table that is fairly big (1.8 GB). In astropy it’s easy to read it and load it into a pandas dataframe; it takes ~15 sec on my machine. When I try the same in Julia, creating a DataFrame with FITSIO.jl takes ~160 sec, so a factor of ~10 slower. I tested it several ways and it’s consistent. Another package, EasyFITS.jl, is equally slow.
Does anybody have an insight as to why the huge difference? I understand the Julia packages are calling the C routines in cfitsio, whereas Python is not — astropy uses its own implementation — but I don’t know how they could make it so much faster.
Thanks!
Hard to say something specific or suggest a better solution without a concrete example one can run…
Sure, I thought it was generally known that it was slower.
Here’s the example I was working with:
from astropy.table import Table
df = Table.read("galSpecLine-dr8.fits", format="fits").to_pandas()
That takes ~15 sec.
using FITSIO
using DataFrames
f = FITS("galSpecLine-dr8.fits")
df = DataFrame(f[2])
close(f)
Just the line that creates the DataFrame takes 10× longer than the Python code.
Any insight as to why it’s so much slower and if there’s anything one can do to make it faster would be appreciated.
Thanks!
Seems like there’s no significant Julia overhead here: the cfitsio ccalls take almost the whole time (I used @profview to see that). An MWE is
using FITSIO
f = FITS("galSpecLine-dr8.fits")
@time read(f[2], "MJD")
which takes about 0.35 seconds for me. Given that there are 241 columns, that’s about 85 seconds already.
@barrettp may know more.
I don’t know cfitsio that well. I do know that when implementing PyFITS (now astropy.io.fits) about 25 years ago, each header was read when the file was opened so that PyFITS could build the list of HDUs in memory, while the data themselves were read lazily to improve performance. This approach made it easy to jump directly to the HDU containing the desired data. It would appear that cfitsio does not do this, and instead has to read through each HDU to find the correct one before reading the data.
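The lazy-reading idea above can be sketched generically. This is a toy illustration with a made-up record format (a 4-byte big-endian length prefix per section), not the actual PyFITS/astropy code: on open, only the headers are scanned and each payload’s offset is recorded, so any section can be reached with a single seek instead of reading everything that precedes it.

```python
import io
import struct

class LazyFile:
    """Toy container: each section is a 4-byte length header followed by
    its payload. On open we read only the headers, recording where each
    payload starts; payloads themselves are read lazily, on demand."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.index = []  # (offset, length) of each section's payload
        self.f.seek(0, io.SEEK_END)
        end = self.f.tell()
        self.f.seek(0)
        pos = 0
        while pos < end:
            (length,) = struct.unpack(">I", self.f.read(4))
            pos += 4
            self.index.append((pos, length))
            pos += length
            self.f.seek(pos)  # skip over the payload without reading it

    def read_section(self, i):
        offset, length = self.index[i]
        self.f.seek(offset)  # jump straight to the requested payload
        return self.f.read(length)

def write_sections(fileobj, payloads):
    """Write payloads in the toy length-prefixed format."""
    for p in payloads:
        fileobj.write(struct.pack(">I", len(p)))
        fileobj.write(p)

# Usage: build a small in-memory file, then read one section directly.
buf = io.BytesIO()
write_sections(buf, [b"first", b"second payload", b"third"])
lf = LazyFile(buf)
print(lf.read_section(1))  # b'second payload'
```

The point of the design is that opening the file costs one small read per section header, and fetching section *i* costs one seek plus one read, independent of how much data sits in the earlier sections.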