Understanding @time memory allocations

qt_codes · April 5, 2022, 11:26pm

I am timing a function in my code and am confused as to why the memory allocations during compilation go up when I make the input smaller. The function is called twice, so I can look at the results before and after compilation.

@time begin
    mystructpush!(storage_array, datastruct)
end

(the function in question)

function mystructpush!(array, newstruct::DevDataID)
    push!(array, DevDataID(newstruct))
end

I have a few versions of this function for different struct types, but I’m focusing on this case now. Here is the DevDataID Struct

@with_kw mutable struct DevDataID
    dev_id::String = "no!"
    app_id::String = "no!"
    app_version::Int = 0
    index::Int = 0
    man_id::Int = 0

end

With the code with this struct definition, I get this timing result

I understand that the function has to compile the first time, and that this accounts for most of the allocations. On the second run, I believe I’m just seeing the bytes necessary to copy the struct.

Now, I was curious if I could make my code more efficient by changing the data types in DevDataID. See the new version below:

@with_kw mutable struct DevDataID
    dev_id::String = "no!"
    app_id::String = "no!"
    app_version::UInt8 = 0
    index::UInt8 = 0
    man_id::UInt16 = 0
end

I know the data in app_version, index, and man_id will fit within these ranges, so this shouldn’t cause any issues for me and reduces the struct size. Now, I run my code again, and here are the new results

Why did my allocations during the compiling almost double when the total size went down 16 bytes? Any insight here would be appreciated as this is very strange behaviour to me.

edit: I’m also curious why the allocations are so high for this process after it has compiled. The struct is 24 bytes big when it gets pushed.

lawless-m · April 6, 2022, 8:33am

You should post the actual contents of decoder.jl

qt_codes · April 6, 2022, 3:12pm

Okay, below is my code. I tried to trim it down to a minimum necessary to see the flow.

I’m reading in chunks of data from a txt file, filling my DevDataID struct one field at a time as I read, then copying it to an array for storage/later reference once I get to the end of each chunk.

A chunk in the txt file looks like this

DevDataID: 6
developer_id: 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 
application_id: 102, 10, 88, 30, 83, 1, 70, 12, 143, 47, 3, 76, 139, 109, 201, 15, 
app_version: 95
manufacturer_id: 65535
dev_data_index: 0

The code below runs and has the same behaviour as what I posted above. I have also been able to replicate this kind of behaviour (large allocations when pushing a struct even when using push! alone) in the REPL.

using Parameters

# STRUCTS!

@with_kw mutable struct DevDataID
    dev_id::String = "no!"
    app_id::String = "no!"
    app_version::UInt8 = 0
    index::UInt8 = 0
    man_id::UInt16 = 0
end

@with_kw mutable struct FieldDef
    name::String = "no!"
    units::String = "no!"
end

@with_kw mutable struct DeviceData
    device_index::UInt8 = 0
    ant_transmission_type::UInt8 = 0
    ant_network::UInt8 = 0
    source_type::UInt8 = 0
    sw_version::Int = 0
end

@with_kw mutable struct ScanFlag
    linestogo::UInt8 = 0
    case::String = ""
end

# HELPER FUNCTIONS

function extractleft(input::String, delim::Char)
    # returns the title section of a str w/ format
    # "title: data\n"
    index::Int = 1
    for character in input
        character == delim ? break : index = index + 1
    end
    return input[begin:index-1]
end

function extractright(input::String, delim_location::Int, format::String)
    # returns the data section as Str or Int of a str w/ format
    # "title: data\n"
    format == "String" && return input[delim_location+1:end]
    format == "Int" && return parse(Int64, input[delim_location+1:end])
    print("function not defined for ")
    println(format)
end

function extractrightUInt(input::String, delim_location::Int, format::String)
    # same as the above but UInt specific
    format == "8" && return parse(UInt8, input[delim_location+1:end])
    format == "16" && return parse(UInt16, input[delim_location+1:end])
    print("function not defined for ")
    println(format)
end

function str_int_to_hex_str(input::String)
    # takes strings like " 23, 65, 24, ....., 255,"
    # returns all 2 char hex representations concatenated
    return_str = ""
    scan_start::Int = 1
    index = 1
    for character in input
        if character == ','
            int_version = parse(UInt16, input[scan_start:index-1])
            str_version = string(int_version , base = 16) # can yield single char hex
            length(str_version) == 1 && (str_version = "0"*str_version)
            return_str = return_str * str_version
        elseif character == ' '
            scan_start = index + 1
        end
        index += 1
    end
    return return_str
end


function assign(devdata::DevDataID, case::String, data::String)
    if case == "developer_id"
        devdata.dev_id = str_int_to_hex_str(data)
    elseif case == "app_version"
        devdata.app_version = extractrightUInt(data, 12, "8")
    elseif case == "application_id"
        devdata.app_id = str_int_to_hex_str(data)
    elseif case == "manufacturer_id"
        devdata.man_id = extractrightUInt(data, 16, "16")
    elseif case == "dev_data_index"
        devdata.index = extractrightUInt(data, 15, "8")
    end
end

function assign(fielddata::FieldDef, case::String, data::String)
    nothing # in reality a bunch of if/elseif statements
end

function assign(devicedata::DeviceData, case::String, data::String)
    nothing # in reality a bunch of if/elseif statements
end

function mystructpush!(array, newstruct::DevDataID)
    push!(array, DevDataID(newstruct))
end

function mystructpush!(array, newstruct::FieldDef)
    push!(array, FieldDef(newstruct))
end

function mystructpush!(array, newstruct::DeviceData)
    push!(array, DeviceData(newstruct))
end

function txttostruct(file::String, struct_case::String, datastruct, storage_array)
    scanflag = ScanFlag()
    for line in readlines(file)
        case = extractleft(line, ':')
        if scanflag.linestogo > 0
            if scanflag.case == struct_case
                assign(datastruct, case, line)
            end
            if scanflag.linestogo == 1
                @time begin
                    mystructpush!(storage_array, datastruct)
                end
            end
            scanflag.linestogo += -1
        end
        if case == struct_case
            scanflag.case = struct_case
            scanflag.linestogo = extractrightUInt(line, (length(struct_case)+1), "16")
        end
    end
end

# MAIN FLOW

devdata = DevDataID()
dev_array = DevDataID[]
txttostruct("developer_data.txt", "DevDataID", devdata, dev_array)

qt_codes · April 6, 2022, 3:22pm

and the much more minimal REPL example comparing pushing the struct itself vs pushing a copy (where it appears this issue is coming from)
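A minimal sketch of that kind of comparison, not the original REPL session. `Record`, `push_same!`, `push_copy!` and `demo` are hypothetical names, and the struct is written without Parameters.jl, so the copy constructor is spelled out by hand. The measurement runs inside a function and the array is pre-sized, so that neither global-scope overhead nor array growth is counted.

```julia
# Hypothetical stand-in for DevDataID, written without Parameters.jl
mutable struct Record
    dev_id::String
    app_version::UInt8
    man_id::UInt16
end

# Explicit copy constructor, playing the role of DevDataID(newstruct)
Record(r::Record) = Record(r.dev_id, r.app_version, r.man_id)

push_same!(v, r) = push!(v, r)          # stores a reference to r itself
push_copy!(v, r) = push!(v, Record(r))  # builds a new object, then stores it

function demo()
    r = Record("ab", 0x01, 0x0002)
    v = Record[]
    sizehint!(v, 16)                    # reserve room so array growth is not measured
    push_same!(v, r)                    # run both paths once before measuring
    push_copy!(v, r)
    a_same = @allocated push_same!(v, r)
    a_copy = @allocated push_copy!(v, r)
    return v, r, a_same, a_copy
end

v, r, a_same, a_copy = demo()
println("push existing object: ", a_same, " bytes")
println("push fresh copy:      ", a_copy, " bytes")
println("v[1] === r: ", v[1] === r, ", v[2] === r: ", v[2] === r)
```

The identity checks at the end show the practical difference: pushing the struct itself stores the same object every time, so later mutation of `r` would show up in every such element, while pushing a copy stores an independent object.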

goerch · April 6, 2022, 3:41pm

Thanks for taking the time to show us more details!

To check the size of your structures you can use Base.summarysize. To monitor what Julia is doing you could use the --trace-compile option

julia --help-hidden

    julia [switches] -- [programfile] [args...]

Switches (a '*' marks the default value, if applicable):

 --compile={yes*|no|all|min}
                          Enable or disable JIT compiler, or request exhaustive or minimal compilation

 --output-o <name>        Generate an object file (including system image data)
 --output-ji <name>       Generate a system image data file (.ji)
 --strip-metadata         Remove docstrings and source location info from system image
 --strip-ir               Remove IR (intermediate representation) of compiled functions

 --output-unopt-bc <name> Generate unoptimized LLVM bitcode (.bc)
 --output-bc <name>       Generate LLVM bitcode (.bc)
 --output-asm <name>      Generate an assembly file (.s)
 --output-incremental={yes|no*}
                          Generate an incremental output file (rather than complete)
 --trace-compile={stderr,name}
                          Print precompile statements for methods compiled during execution or save to a path
 --image-codegen          Force generate code in imaging mode
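As a rough illustration of the Base.summarysize suggestion, the sketch below compares it with sizeof for two hypothetical stand-ins for the Int and UInt variants of DevDataID (`WideFields` and `NarrowFields` are made-up names, defined without Parameters.jl). sizeof reports only the object's own field storage, while Base.summarysize also counts data reachable from it, such as the string contents.

```julia
# Hypothetical stand-ins for the Int and UInt variants of DevDataID
mutable struct WideFields
    dev_id::String
    app_id::String
    app_version::Int
    index::Int
    man_id::Int
end

mutable struct NarrowFields
    dev_id::String
    app_id::String
    app_version::UInt8
    index::UInt8
    man_id::UInt16
end

w = WideFields("no!", "no!", 0, 0, 0)
n = NarrowFields("no!", "no!", 0, 0, 0)   # the default constructor converts 0 to UInt8/UInt16

# sizeof: bytes of the object's own fields (each String field is one pointer)
println("sizeof:      ", sizeof(WideFields), " vs ", sizeof(NarrowFields))

# summarysize: the object plus everything reachable from it, including string data
println("summarysize: ", Base.summarysize(w), " vs ", Base.summarysize(n))

# fieldoffset shows where each field sits, which makes alignment padding visible
for i in 1:fieldcount(NarrowFields)
    println(fieldname(NarrowFields, i), " at byte offset ", fieldoffset(NarrowFields, i))
end
```

The exact numbers depend on the platform's pointer size and alignment rules, so they are printed rather than stated here.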

qt_codes · April 6, 2022, 4:59pm

First off, thanks for your response with those tools! The --trace-compile option is pretty hard for me to decipher, so I’m including some results below.

I also used Base.summarysize on both the struct with just Ints, and the one with UInts.
size of Int struct = size of UInt struct = 120 bytes. Sooo, I failed at making the struct smaller.

Note that for the outputs below, I stopped using the extractrightUInt function and am just using extractright in its place. The only difference in the code between these versions is the internal types of the DevDataID struct, as shown in my initial post.

I also left the timing results in to make the difference clear. I know that leaving this in adds some of the precompile lines before the result gets printed.

Int version:

julia --trace-compile=stderr decoder2.jl
precompile(Tuple{Type{UInt64}, UInt64})
precompile(Tuple{typeof(Base.:(!=)), UInt64, UInt64})
precompile(Tuple{Parameters.var"#@with_kw", LineNumberNode, Module, Any})
precompile(Tuple{typeof(Core.Compiler.convert), Type{DataType}, Type{Tuple{Any, Int64}}})
precompile(Tuple{typeof(Parameters.with_kw), Expr, Module, Bool})
precompile(Tuple{typeof(Base.:(==)), Tuple{Expr, Int64}, Int64})
precompile(Tuple{typeof(Base.iterate), Base.Iterators.Enumerate{Parameters.Lines}, Tuple{Int64, Int64}})
precompile(Tuple{typeof(Base.setindex!), OrderedCollections.OrderedDict{Any, Any}, String, Symbol})
precompile(Tuple{typeof(OrderedCollections.hashindex), Symbol, Int64})
precompile(Tuple{typeof(Base.setproperty!), OrderedCollections.OrderedDict{Any, Any}, Symbol, Int64})
precompile(Tuple{typeof(Base.setindex!), OrderedCollections.OrderedDict{Any, Any}, Int64, Symbol})
precompile(Tuple{typeof(Base.isequal), Symbol, Symbol})
precompile(Tuple{typeof(Base.prepend!), Array{Any, 1}, Array{Any, 1}})
precompile(Tuple{typeof(Base.:(!=)), Array{Any, 1}, Array{Any, 1}})
precompile(Tuple{Core.var"#@__doc__", LineNumberNode, Module, Any})
precompile(Tuple{Type{Main.DevDataID}})
precompile(Tuple{typeof(Base.getindex), Type{Main.DevDataID}})
precompile(Tuple{typeof(Main.txttostruct), String, String, Main.DevDataID, Array{Main.DevDataID, 1}})
precompile(Tuple{Type{Main.DevDataID}, String, String, Int64, Int64, Int64})
precompile(Tuple{typeof(Base.push!), Array{Main.DevDataID, 1}, Main.DevDataID})
precompile(Tuple{typeof(Base.prettyprint_getunits), Int64, Int64, Int64})
precompile(Tuple{Type{Float64}, Float64})
precompile(Tuple{typeof(Base.Ryu.writefixed), Float64, Int64})
  0.012884 seconds (2.32 k allocations: 129.162 KiB, 99.67% compilation time)
precompile(Tuple{Type{Int64}, Float64})
precompile(Tuple{typeof(Base.:(==)), Float64, Int64})
precompile(Tuple{typeof(Base.print), Base.GenericIOBuffer{Array{UInt8, 1}}, Int64, String, Vararg{String}})
  0.000010 seconds (9 allocations: 720 bytes)

UInt version:

julia --trace-compile=stderr decoder2.jl
precompile(Tuple{Type{UInt64}, UInt64})
precompile(Tuple{typeof(Base.:(!=)), UInt64, UInt64})
precompile(Tuple{Parameters.var"#@with_kw", LineNumberNode, Module, Any})
precompile(Tuple{typeof(Core.Compiler.convert), Type{DataType}, Type{Tuple{Any, Int64}}})
precompile(Tuple{typeof(Parameters.with_kw), Expr, Module, Bool})
precompile(Tuple{typeof(Base.:(==)), Tuple{Expr, Int64}, Int64})
precompile(Tuple{typeof(Base.iterate), Base.Iterators.Enumerate{Parameters.Lines}, Tuple{Int64, Int64}})
precompile(Tuple{typeof(Base.setindex!), OrderedCollections.OrderedDict{Any, Any}, String, Symbol})
precompile(Tuple{typeof(OrderedCollections.hashindex), Symbol, Int64})
precompile(Tuple{typeof(Base.setproperty!), OrderedCollections.OrderedDict{Any, Any}, Symbol, Int64})
precompile(Tuple{typeof(Base.setindex!), OrderedCollections.OrderedDict{Any, Any}, Int64, Symbol})
precompile(Tuple{typeof(Base.isequal), Symbol, Symbol})
precompile(Tuple{typeof(Base.prepend!), Array{Any, 1}, Array{Any, 1}})
precompile(Tuple{typeof(Base.:(!=)), Array{Any, 1}, Array{Any, 1}})
precompile(Tuple{Core.var"#@__doc__", LineNumberNode, Module, Any})
precompile(Tuple{Type{Main.DevDataID}})
precompile(Tuple{typeof(Base.getindex), Type{Main.DevDataID}})
precompile(Tuple{typeof(Main.txttostruct), String, String, Main.DevDataID, Array{Main.DevDataID, 1}})
precompile(Tuple{Type{Main.DevDataID}, String, String, UInt8, UInt64, UInt16})
precompile(Tuple{typeof(Base.push!), Array{Main.DevDataID, 1}, Main.DevDataID})
precompile(Tuple{typeof(Base.prettyprint_getunits), Int64, Int64, Int64})
precompile(Tuple{Type{Float64}, Float64})
precompile(Tuple{typeof(Base.Ryu.writefixed), Float64, Int64})
  0.011373 seconds (4.11 k allocations: 234.971 KiB, 99.46% compilation time)
precompile(Tuple{Type{Int64}, Float64})
precompile(Tuple{typeof(Base.:(==)), Float64, Int64})
precompile(Tuple{typeof(Base.print), Base.GenericIOBuffer{Array{UInt8, 1}}, Int64, String, Vararg{String}})
  0.000013 seconds (8 allocations: 704 bytes)

My original question (mostly) remains: why does the UInt version take so many more allocations up front, even if it is the same size according to Base.summarysize?

My secondary question now is why does copying a struct take ~6x the size of the struct in memory?

goerch · April 6, 2022, 5:24pm

First things first, regarding Base.summarysize:

using Parameters

@with_kw mutable struct DevDataID1
    dev_id::String = "no!"
    app_id::String = "no!"
    app_version::Int = 0
    index::Int = 0
    man_id::Int = 0
end

@with_kw mutable struct DevDataID2
    dev_id::String = "no!"
    app_id::String = "no!"
    app_version::UInt8 = 0
    index::UInt8 = 0
    man_id::UInt16 = 0
end

@show Base.summarysize(DevDataID1())
@show Base.summarysize(DevDataID2())

yields

Base.summarysize(DevDataID1()) = 51
Base.summarysize(DevDataID2()) = 35

for me. Secondly, your traces only differ in

precompile(Tuple{Type{Main.DevDataID}, String, String, Int64, Int64, Int64})

vs.

precompile(Tuple{Type{Main.DevDataID}, String, String, UInt8, UInt64, UInt16})

so the same number of instances is compiled in both cases (I suspected otherwise). My next suspicion would be a difference in type inference between these cases, which one could try to verify with a profiler. But be aware that you would then be on the way to analyzing and optimizing the compiler (BTW: did you try different compiler versions?).

In my experience this is due to mutable structures being heap allocated. Working with immutable structures is normally cheaper. But this could change in the future with new compiler optimizations.
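A small sketch of the mutable vs. immutable difference, under simplifying assumptions: `MutableID` and `ImmutableID` are hypothetical types that keep only the integer fields (the String fields are dropped, so the immutable version is a plain bits type), and the fill functions are made up for illustration.

```julia
# Hypothetical types for illustration; not taken from decoder.jl
mutable struct MutableID
    app_version::UInt8
    index::UInt8
    man_id::UInt16
end

struct ImmutableID
    app_version::UInt8
    index::UInt8
    man_id::UInt16
end

# An isbits type can be stored inline in a Vector; a mutable struct is stored
# as a pointer to a separately allocated object.
println("isbitstype(MutableID)   = ", isbitstype(MutableID))
println("isbitstype(ImmutableID) = ", isbitstype(ImmutableID))

function fill_mutable(n)
    v = Vector{MutableID}(undef, n)
    for i in 1:n
        v[i] = MutableID(0x01, 0x02, 0x0003)    # one heap object per element
    end
    return v
end

function fill_immutable(n)
    v = Vector{ImmutableID}(undef, n)
    for i in 1:n
        v[i] = ImmutableID(0x01, 0x02, 0x0003)  # written directly into the array
    end
    return v
end

fill_mutable(1); fill_immutable(1)              # warm up so compilation is excluded
bytes_mutable = @allocated fill_mutable(1000)
bytes_immutable = @allocated fill_immutable(1000)
println("mutable:   ", bytes_mutable, " bytes")
println("immutable: ", bytes_immutable, " bytes")
```

Each heap-allocated object also carries a type tag and is rounded up to an allocator size class, which is one reason the bytes reported per pushed copy can exceed the struct's sizeof.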

goerch · April 6, 2022, 5:44pm

OK, I checked your code with @btime from BenchmarkTools

@btime txttostruct("data.txt", "DevDataID", devdata, dev_array) setup=(
    devdata = DevDataID();
    dev_array = DevDataID[]
)

yielding

37.400 μs (170 allocations: 6.37 KiB)

This looks pretty good. A run with a profiler shows 80% of the runtime is spent at

    for line in readlines(file)

and a smaller, separate ‘hotspot’ seems to be

function str_int_to_hex_str(input::String)

So I’m not sure I understand your performance problem?
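For anyone wanting to repeat this kind of measurement, here is a minimal sketch using the Profile standard library. `count_colons` is a hypothetical workload standing in for txttostruct, and the input is a generated temporary file rather than the real data.txt.

```julia
using Profile

# Hypothetical workload standing in for txttostruct: read a file line by line
function count_colons(path)
    total = 0
    for line in readlines(path)
        total += count(==(':'), line)
    end
    return total
end

# Generate a throwaway input file with 10_000 "title: data" lines
path, io = mktemp()
for i in 1:10_000
    println(io, "app_version: ", i % 256)
end
close(io)

count_colons(path)                 # warm up so compilation is not profiled
Profile.clear()
@profile for _ in 1:50
    count_colons(path)
end

# Flat listing sorted by sample count; lines with fewer than 10 samples are hidden
Profile.print(format = :flat, sortedby = :count, mincount = 10)
```

The flat listing attributes samples to source lines, which is how a line such as the readlines loop header can show up as the dominant cost.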
