Why is my Julia code so slow?



I wrote a simple handwritten digit classifier in Julia for testing purposes, but my code is very slow compared to other languages (Julia 217s, Octave 66s, Python 6s, C++ 1s). I’m very new to Julia, so I must have made some mistakes. I’d really appreciate it if someone could tell me what’s wrong with my code.

My Julia config:

git clone https://github.com/JuliaLang/julia
cd julia
git checkout v0.5.0
julia> Pkg.add("Images")
julia> Pkg.add("ImageMagick")
julia> exit()
alias julia=.....
cd .....
time julia digit_classifier.jl

the code:


Hi, welcome to Julia!

Julia works differently from other languages.
In particular, to get good performance, you need to put your code inside functions, not at “global scope” like you currently have.

This is one of the tips listed in the Performance Tips section of the Julia manual.
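As a rough illustration of that tip, here is a minimal sketch that puts the hot loop inside a function and times it. The name `sum_pixels` and the random input are made up for the example and stand in for the per-image work; they are not from the original program.

```julia
# Hypothetical stand-in for the per-image work: sum a vector of pixel values.
# Because the loop lives inside a function, the compiler can infer concrete
# types for `total` and `p` and generate specialized code.
function sum_pixels(pixels)
    total = 0.0
    for p in pixels
        total += p
    end
    return total
end

pixels = rand(4096)               # assumed input: one flattened 64x64 image
sum_pixels(pixels)                # first call also pays the compilation cost
t = @elapsed sum_pixels(pixels)   # time a second, already compiled call
println("sum_pixels took ", t, " seconds")
```

Timing the second call rather than the first keeps compilation time out of the measurement.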


Thanks for your answer. I implemented a simplified version of the problem, with and without functions, but all versions were equally slow. I’ll read the “performance tips” page more closely when I have some time.

# Version 1: everything at global scope
using Images
total_sum = 0
for test_file in readdir("data/nist64/test/")
    test_img = 1 - reshape(load("data/nist64/test/" * test_file), 4096, 1)
    total_sum += sum(test_img)
end

# Version 2: the loop wrapped in a function
using Images
function compute()
    total_sum = 0
    for test_file in readdir("data/nist64/test/")
        test_img = 1 - reshape(load("data/nist64/test/" * test_file), 4096, 1)
        total_sum += sum(test_img)
    end
    return total_sum
end

# Version 3: one function per image, combined with mapreduce
using Images
function compute_img(test_file)
    return sum(1 - reshape(load("data/nist64/test/" * test_file), 4096, 1))
end
function compute()
    return mapreduce(compute_img, +, readdir("data/nist64/test/"))
end


Did you profile? If it is spending all its time in Images.load, then that function will have to be optimized to improve things.

Note that it looks like there is no need for the reshape or the subtraction from 1. You could just do:

img = load(....)
total_sum += length(img) - sum(img)
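To see why the shortcut is equivalent, note that summing `1 - x` over every pixel gives the number of pixels minus the sum of the pixels. A small sketch, using a random matrix as an assumed stand-in for a loaded 64x64 grayscale image:

```julia
# Assumed stand-in for a loaded image: 64x64 values in [0, 1].
img = rand(64, 64)

# Original approach: build the inverted image, then sum it.
inverted_sum = sum(1 .- img)

# Shortcut: no temporary array and no reshape needed.
shortcut_sum = length(img) - sum(img)

# The two agree up to floating-point rounding.
println(isapprox(inverted_sum, shortcut_sum))
```

The shortcut avoids allocating a temporary array per image, although it would not help much if the time is dominated by loading.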


On my machine, loading a single png takes at least 0.015s. So 20000 images will cost 300s…


Images uses the FileIO package for loading PNG files. On Mac OS X, this uses QuartzImageIO, and on other systems it uses the ImageMagick library. It might be faster to call libpng directly, since the other libraries might do extra postprocessing to transform the data into a common format (which Images then needs to translate to its own format).


The problem is indeed PNG loading. JuliaOpenCV doesn’t help very much, but libpng does. Implementing (badly) my own loader using libpng and the Julia C interface was very doable, but not the most fun thing I’ve ever done…
Thanks for your help.

imagemagick: 217s
opencv: 174s
libpng: 3.8s
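For readers who have not used the Julia C interface mentioned above, here is a minimal sketch of a `ccall`. It calls `strlen` from the C standard library rather than anything from libpng, purely to show the shape of a call; the string is arbitrary.

```julia
# ccall takes: the C function name, the return type, a tuple of argument
# types, and then the actual arguments.
# strlen has the C signature: size_t strlen(const char *s);
n = ccall(:strlen, Csize_t, (Cstring,), "digit")
println(n)
```

A libpng loader follows the same pattern, with `(:function_name, "libpng")` in place of the bare symbol so that Julia knows which shared library to look in.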


Maybe you could share the code with us so that the next person doesn’t have to go through that? :wink:


Yes: https://github.com/juliendehos/my_rosetta_code/tree/master/digit_classifier


I needed something similar but converted it to pure Julia code, talking directly to libpng with ccall. This reads all color components but won’t deal very gracefully with certain errors since it doesn’t set png_jmpbuf. If someone knows a sane way to interact with libpng’s longjmp style error handling without a C wrapper, I’m all ears.

Needless to say, this code is nowhere near a general png reader but for specific types of images it can do its job.

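# These had better match png.h for your libpng.
const PNG_LIBPNG_VER_STRING = "1.2.50"
# PNG_TRANSFORM_IDENTITY is 0x0000 in png.h (no transformation applied).
const PNG_TRANSFORM_IDENTITY = 0x0000

# `transforms` was undefined in the code as posted; defaulting it to the
# identity transform is an assumption.
function pngread(filename::AbstractString; transforms = PNG_TRANSFORM_IDENTITY)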
    fp = ccall((:fopen, "libc"), Ptr{Void}, (Cstring, Cstring), filename, "rb")
    fp == C_NULL && error("Failed to open $(filename).")
    header = zeros(UInt8, 8)
    header_size = ccall((:fread, "libc"), Csize_t,
                        (Ptr{UInt8}, Csize_t, Csize_t, Ptr{Void}),
                        header, 1, 8, fp)
    header_size != 8 && error("Failed to read 8 byte header from $(filename).")

    png_status = ccall((:png_sig_cmp, "libpng"), Cint,
                       (Ptr{UInt8}, Csize_t, Csize_t), header, 0, 8)
    png_status != 0 && error("File $(filename) not identified as png file.")
    png_ptr = ccall((:png_create_read_struct, "libpng"), Ptr{Void},
                    (Cstring, Ptr{Void}, Ptr{Void}, Ptr{Void}),
                    PNG_LIBPNG_VER_STRING, C_NULL, C_NULL, C_NULL)
    png_ptr == C_NULL && error("Failed to create png read struct.")

    info_ptr = ccall((:png_create_info_struct, "libpng"), Ptr{Void},
                     (Ptr{Void},), png_ptr)
    info_ptr == C_NULL && error("Failed to create png info struct.")

    ccall((:png_init_io, "libpng"), Void, (Ptr{Void}, Ptr{Void}),
          png_ptr, fp)

    ccall((:png_set_sig_bytes, "libpng"), Void, (Ptr{Void}, Cint),
          png_ptr, 8)
    ccall((:png_read_png, "libpng"), Void,
          (Ptr{Void}, Ptr{Void}, Cint, Ptr{Void}),
          png_ptr, info_ptr, transforms, C_NULL)

    width = ccall((:png_get_image_width, "libpng"), UInt32,
                  (Ptr{Void}, Ptr{Void}),
                  png_ptr, info_ptr)
    height = ccall((:png_get_image_height, "libpng"), UInt32,
                  (Ptr{Void}, Ptr{Void}),
                  png_ptr, info_ptr)
    channels = ccall((:png_get_channels, "libpng"), UInt8,
                  (Ptr{Void}, Ptr{Void}),
                  png_ptr, info_ptr)

    rows = ccall((:png_get_rows, "libpng"), Ptr{Ptr{UInt8}},
                 (Ptr{Void}, Ptr{Void}), png_ptr, info_ptr)

    image = zeros(UInt8, channels, width, height)
    # Copy the row data owned by libpng into a Julia array before it is freed.
    for i = 1:height
        row = unsafe_load(rows, i)
        for j = 1:width
            for c = 1:channels
                image[c,j,i] = unsafe_load(row, channels * (j - 1) + c)
            end
        end
    end

    # Free the libpng structs (which also frees the row data) and close the file.
    png_ptr_ptr = Ref{Ptr{Void}}(png_ptr)
    info_ptr_ptr = Ref{Ptr{Void}}(info_ptr)
    ccall((:png_destroy_read_struct, "libpng"), Void,
          (Ref{Ptr{Void}}, Ref{Ptr{Void}}, Ptr{Ptr{Void}}),
          png_ptr_ptr, info_ptr_ptr, C_NULL)
    ccall((:fclose, "libc"), Cint, (Ptr{Void},), fp)

    return image
end


I got something sort-of-working when I looked at this several years ago, but didn’t pursue further because Tim improved ImageMagick/GraphicsMagick wrapper performance. In the linked example, _jmpbuf was a pointer to an array of the correct size based on sizeof(jmpbuf) in C:

It’s not generalizable without a way to get sizeof(jl_jmp_buf) from Julia code. We might be able to add a helper for that to base Julia, but I’m not sure if there would be any issues now interacting with task or thread state (@yuyichao?).


Task/thread interaction shouldn’t be an issue (though the API is certainly single-threaded). The invisible control flow can cause miscompilation though.