Strange performance issue when using parsed command-line parameters in a calculation

Hi. I am running into a strange performance issue when using parameters obtained from ArgParse.jl (I also reported this as "Performance issue when using parsed_args in computation", Issue #139 in carlobaldassi/ArgParse.jl on GitHub). If I read a value from the command line and use it in a calculation (at least for the calculations I tested), I get a huge performance hit. If I instead hardcode the values without parsing, there is no performance issue. I tested this on Linux and Windows with Julia 1.10 and 1.12. Below is a simplified snippet that reproduces the issue:

using ArgParse

function parse_cmd()
    s = ArgParseSettings()
    @add_arg_table s begin
        "--N"
            arg_type = Int64
            default = 2
        "--w"
            arg_type = Float64
            default = 0.1
    end
    return parse_args(s)
end

function run_parsed!()
    parsed_args = parse_cmd()

    N = parsed_args["N"]
    w = parsed_args["w"]
    println("parsed test")
    println("val $(N) and type $(typeof(N))")
    println("val $(w) and type $(typeof(w))")

    val = 0.0
    @time for step = 1 : 100000000
        val += sqrt(N * w * sin(N * w)^2)
    end
end
function run_hardcoded!()
    parsed_args = parse_cmd()

    N = 2
    w = 0.1
    println("hardcoded test")
    println("val $(N) and type $(typeof(N))")
    println("val $(w) and type $(typeof(w))")

    val = 0.0
    @time for step = 1 : 100000000
        val += sqrt(N * w * sin(N * w)^2)
    end
end
run_parsed!()
run_hardcoded!()

This is the output

Note: for some reason this issue doesn’t arise on a cluster running Julia 1.10, only on three different laptops.

I don’t think the types of N and w can be inferred. You can use a function barrier to solve the resulting performance issue (i.e. put the calculation in a function that has N and w as arguments). That should solve the problem.
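For anyone unfamiliar with the term, here is a minimal sketch of a function barrier. It does not use ArgParse itself: `fake_parsed_args`, `sum_loop` and `run_with_barrier` are hypothetical names, and the `Dict{String,Any}` is only a stand-in for what `parse_args` hands back, so treat it as an illustration rather than the exact fix for the code above.

```julia
# Stand-in for the dictionary returned by parse_args (assumption: values are
# stored with element type Any, so the compiler cannot know their types).
fake_parsed_args() = Dict{String,Any}("N" => 2, "w" => 0.1)

# The barrier: when this function is called, Julia compiles a version
# specialized on the concrete runtime types of N and w.
function sum_loop(N, w, steps)
    val = 0.0
    for _ in 1:steps
        val += sqrt(N * w * sin(N * w)^2)
    end
    return val
end

function run_with_barrier(steps)
    args = fake_parsed_args()
    N = args["N"]   # inferred as Any at this point
    w = args["w"]   # inferred as Any at this point
    # One dynamic dispatch here, then the hot loop runs in specialized code.
    return sum_loop(N, w, steps)
end

result = run_with_barrier(1_000)
println(result)
```

The type instability is still there in `run_with_barrier`, but it now costs a single dynamic call instead of one per loop iteration.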

I think you can detect this in cases like this e.g. using @code_warntype or some similar trick. But I’m not near a pc and haven’t used these packages before, so I’m just going by Julia intuition here!
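A rough sketch of what that check could look like, again without ArgParse. The functions `untyped_args`, `unstable` and `stable` are made up for illustration, and `Base.return_types` is a reflection helper rather than a documented stable API, so take this as one possible way to look at inference results.

```julia
using InteractiveUtils  # provides @code_warntype when running as a script

# Hypothetical stand-in for a parsed-arguments dictionary with Any values.
untyped_args() = Dict{String,Any}("N" => 2, "w" => 0.1)

function unstable()
    args = untyped_args()
    N = args["N"]   # type not known to the compiler
    w = args["w"]   # type not known to the compiler
    return N * w
end

function stable()
    N = 2
    w = 0.1
    return N * w
end

# Prints the inferred types; non-concrete ones such as Any are highlighted.
@code_warntype unstable()
@code_warntype stable()

# The same information in programmatic form: the inferred return types.
unstable_rt = only(Base.return_types(unstable, Tuple{}))
stable_rt = only(Base.return_types(stable, Tuple{}))
println("unstable: ", unstable_rt, ", stable: ", stable_rt)
```

In the REPL, `@code_warntype` is available without the `using` line.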

Ah thanks, that appears to be the issue, or at least part of it. I feel kind of dumb for not realizing that. If I do the following it works (actually, the type annotations alone are enough and I don’t need dosomething()).

function run_parsed!()
    parsed_args = parse_cmd()

    N::Int64 = parsed_args["N"]
    w::Float64 = parsed_args["w"]
    dosomething(N, w)
end
function dosomething(N::Int64, w::Float64)
    println("parsed test")
    println("val $(N) and type $(typeof(N))")
    println("val $(w) and type $(typeof(w))")

    val = 0.0
    @time for step = 1 : 100000000
        val += sqrt(N * w * sin(N * w)^2)
    end
end

I get a very small but nonzero performance issue (related to the function call somehow).

Not entirely the call. The compiler just eliminated the val calculation (0.000000 seconds) in the hardcoded method because it can be inferred to be dead code. Try returning val from all of these methods and observe the timings.
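To make the point concrete, here is a small sketch with two hypothetical functions, `loop_discarded` and `loop_returned`. Whether the first loop is actually removed depends on the Julia version and optimizer, so this only shows the shape of the problem, not a guaranteed timing.

```julia
# The result is never used, so the optimizer may treat the loop as dead code
# and skip the work entirely, which makes any timing of it meaningless.
function loop_discarded(steps)
    N = 2
    w = 0.1
    val = 0.0
    for _ in 1:steps
        val += sqrt(N * w * sin(N * w)^2)
    end
    return nothing
end

# Returning val makes the result observable, so the sum has to be produced.
function loop_returned(steps)
    N = 2
    w = 0.1
    val = 0.0
    for _ in 1:steps
        val += sqrt(N * w * sin(N * w)^2)
    end
    return val
end

discarded = loop_discarded(1_000)
returned = loop_returned(1_000)
println(returned)
```

Wrapping each call in `@time` (after a warm-up call to exclude compilation) gives a fairer comparison than timing a loop whose result is thrown away.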

This is what I have now. The parsed one still performs slightly worse (note I get the same slightly worse performance if I perform the calculation in a separate function then return twice).

using ArgParse

function parse_cmd()
    s = ArgParseSettings()
    @add_arg_table s begin
        "--N"
            arg_type = Int64
            default = 2
        "--w"
            arg_type = Float64
            default = 0.1
    end
    return parse_args(s)
end

function run_parsed!()
    parsed_args = parse_cmd()

    N::Int64 = parsed_args["N"]
    w::Float64 = parsed_args["w"]
    println("parsed test")
    println("val $(N) and type $(typeof(N))")
    println("val $(w) and type $(typeof(w))")

    val = 0.0
    @time for step = 1 : 100000000
        val += sqrt(N * w * sin(N * w)^2)
    end

    return val
end

function run_hardcoded!()
    parsed_args = parse_cmd()

    N = 2
    w = 0.1
    println("hardcoded test")
    println("val $(N) and type $(typeof(N))")
    println("val $(w) and type $(typeof(w))")

    val = 0.0
    @time for step = 1 : 100000000
        val += sqrt(N * w * sin(N * w)^2)
    end

    return val
end
run_parsed!()
run_hardcoded!()

Edit: If I perform the for loop in a separate function for both cases, then the performance is the same for both (~0.25s).

Oh I see now with @code_warntype. In the hardcoded case, N and w are stored as Const() while in the parsed case they aren’t. That is the difference between the two now. Thanks everyone!