Data representation in StaticCompiler

I want to represent floats, integers and strings in a DataFrame or a similar structure, and compile this with the help of StaticCompiler to produce small executable binaries.
How can I represent this data in a structure that StaticCompiler can compile?

A first step could be to store the floats, integers and strings in an array. However, many things don't work with StaticCompiler, because GC-tracked allocations are not allowed.

I tried a MallocArray or MallocMatrix with a union of Int64, Float64 and MallocString as element type, but this does not work.

Now I use a MallocArray of Int64 and encode strings as integers, so I implemented encoding and decoding functions. This requires converting a UInt8 to an Int; UInt8 is the type of a string's character (e.g. m"ABC"[2] is 0x42, of type UInt8). However, applying Int() results in an error (although Int() works fine in a separate function). Similarly, when using a MallocArray of Float64, I additionally need Int() to decode floats and strings, which does not work either.
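The encoding described above can be sketched in plain Julia to check it in the REPL before moving it into StaticCompiler code. This is a minimal sketch under assumptions: it uses an ordinary String (through codeunits) instead of a MallocString, assumes strings consist only of the letters A to Z (the offset 64 comes from the code below, since 'A' is 0x41 == 65), and num2str, float2int and int2float are hypothetical names; str2num mirrors the function of the same name below, rewritten in Horner form so that it needs no ^. It also shows two facts relevant to the question: Int(::UInt8) is a lossless widening that cannot throw, and a Float64 can be stored bit for bit in an Int64 cell with reinterpret, whereas Int(1.5) throws an InexactError.

```julia
# Hypothetical helpers in plain Julia (String instead of StaticTools' MallocString).
# Assumes strings contain only 'A':'Z', encoded as 1..26 in bijective base 26.
# Strings of 14 or more letters can overflow Int64.

# Horner form of sum((Int(str[i]) - 64) * 26^(len - i)); needs no ^ call.
function str2num(str::AbstractString)::Int64
    num = 0
    for b in codeunits(str)              # b is a UInt8
        num = num * 26 + (Int(b) - 64)   # Int(::UInt8) widens losslessly, cannot throw
    end
    return num
end

# Inverse of str2num; builds a Vector and a String, so it allocates (REPL checking only).
function num2str(num::Int64)::String
    buf = UInt8[]
    while num > 0
        num -= 1
        pushfirst!(buf, UInt8(num % 26) + 0x41)  # 0x41 == UInt8('A')
        num = div(num, 26)
    end
    return String(buf)
end

# Store a Float64 in an Int64 cell bit for bit. Int(x) would throw an
# InexactError for non-integer x such as 1.5; reinterpret never throws.
float2int(x::Float64)::Int64 = reinterpret(Int64, x)
int2float(n::Int64)::Float64 = reinterpret(Float64, n)

println(str2num("ABC"))              # 731
println(num2str(731))                # ABC
println(int2float(float2int(1.5)))   # 1.5
```

Note that a cell only holds 64 bits, so the table itself does not record whether a cell is an integer, a float or a string; the caller has to know this per column.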

Here is the code.


using StaticTools, StaticCompiler

mutable struct Df
  table::MallocArray{Int64}
  number_of_rows::Int64
  number_of_columns::Int64

  function Df(number_of_rows::Int64,number_of_columns::Int64)
    table = MallocArray{Int64}(undef,number_of_rows*number_of_columns)
    new(table,number_of_rows,number_of_columns)
  end

end

function value()
end

function set_value(df::Df,row_num::Int64,column_num::Int64,str::MallocString)
  idx = (row_num-1) * df.number_of_columns + column_num
  df.table[idx] = str2num(str)
end

function set_value(df::Df,row_num::Int64,column_num::Int64,val::Int64)
  idx = (row_num-1) * df.number_of_columns + column_num
  df.table[idx] = val
end

function set_value(df::Df,row_num::Int64,column_num::Int64,val::Float64)
  idx = (row_num-1) * df.number_of_columns + column_num
  df.table[idx] = Int(val)
end

function str2num(str::MallocString)::Int64
  len = length(str)
  num = 0
  for i in 1:len
    num += (Int(str[i])-64) * 26 ^ (len-i)
  end
  num 
end

function f()
  df = Df(2,3)  
  set_value(df,1,1,m"A")
  0
end

f()
compile_executable(f, (), "C:\\jul\\staticcompiler")

It results in

ld.lld: error: undefined symbol: julia.new_gc_frame

which apparently is a linker error, plus some warnings about the generated LLVM IR:

┌ Warning: Found pointer references to julia data
│   llvm instruction = store {}* inttoptr (i64 3221089733040 to {}*), {}** %28, align 8
│   name = Symbol("")
│   file = Symbol("")
│   line = -1
│   fromC = true
│   inlined = false
└ @ StaticCompiler C:\Users\T460\.julia\packages\StaticCompiler\RM4Pv\src\pointer_warning.jl:59
┌ Warning: LLVM function generated warnings due to raw pointers embedded in the code. This will likely cause errors or undefined behaviour.
│   func =
│    define internal fastcc void @julia_set_value_915({}* noundef nonnull align 8 dereferenceable(24) %0, i64 signext %1, i64 signext %2, [2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %3) unnamed_addr {
│    top:
│      %4 = alloca [2 x {}*], align 8
│      %gcframe = call {}** @julia.new_gc_frame(i32 1)
│      call void @julia.push_gc_frame({}** nonnull %gcframe, i32 1)
│      %5 = bitcast {}* %0 to i8*
│      %6 = getelementptr inbounds i8, i8* %5, i64 16
│      %7 = bitcast i8* %6 to i64*
│      %8 = load i64, i64* %7, align 8, !tbaa !2, !alias.scope !8, !noalias !11
│      %9 = getelementptr inbounds [2 x i64], [2 x i64]* %3, i64 0, i64 1
│      %unbox = load i64, i64* %9, align 8, !tbaa !16, !alias.scope !18, !noalias !19
│      %10 = add i64 %unbox, -1
│      %11 = call i64 @llvm.smax.i64(i64 %10, i64 0)
│      %12 = icmp slt i64 %10, 1
│      br i1 %12, label %L52, label %L23.preheader
│    
│    L23.preheader:                                    ; preds = %top
│      %13 = getelementptr inbounds [2 x i64], [2 x i64]* %3, i64 0, i64 0
│      %bitcast = load i64, i64* %13, align 8, !tbaa !16, !alias.scope !18, !noalias !19
│      %14 = inttoptr i64 %bitcast to i8*
│      br label %L23

etc.

I use StaticCompiler on Windows. I adapted StaticCompiler to Windows, which seems to work, but I cannot be sure whether some of the errors (shown above) are Windows-specific.

What do these errors mean? What is going on there? How can I get it to work?

When I run your function in the REPL, it shows memory allocations:

julia> @time f()
  0.000005 seconds (3 allocations: 80 bytes)

So it’s not possible to compile it into a standalone executable with StaticCompiler.

(Getting rid of allocations is a necessary but not sufficient condition for using StaticCompiler.compile_executable.)
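One plausible source of those allocations (an assumption, not confirmed in this thread) is the struct definition itself. MallocArray{Int64} leaves the dimension parameter N free, so the field type of table::MallocArray{Int64} is not concrete; since a MallocArray is a small immutable struct, storing it in a non-concrete field means boxing it on the GC heap. Df is also a mutable struct, which is normally heap-allocated as well. The sketch below shows the concreteness issue with Base's Array (Array{Int64} is likewise Array{Int64,N} where N) and a hypothetical immutable, pointer-backed table built on Libc.malloc whose layout is isbits. In StaticCompiler code one would presumably hold a StaticTools MallocVector{Int64} (an alias for MallocArray{Int64,1}) instead of calling Libc.malloc directly; the names BoxedDf, ConcreteDf, PtrTable, set_value!, get_value and free_table are illustrative only.

```julia
# Base analogue of the field-type issue: Array{Int64} leaves N free,
# just as MallocArray{Int64} does.
struct BoxedDf
    table::Array{Int64}      # non-concrete: the compiler cannot infer df.table's type
end

struct ConcreteDf
    table::Vector{Int64}     # concrete: Array{Int64,1}
end

# Hypothetical pointer-backed table: immutable, every field concrete and isbits.
struct PtrTable
    ptr::Ptr{Int64}
    nrows::Int64
    ncols::Int64
end

function PtrTable(nrows::Int64, ncols::Int64)
    p = Libc.malloc(nrows * ncols * sizeof(Int64))   # manual memory, not GC-tracked
    p == C_NULL && error("malloc failed")            # plain-Julia error path
    return PtrTable(Ptr{Int64}(p), nrows, ncols)
end

# Row-major, 1-based; no bounds checking.
linear_index(t::PtrTable, row::Int64, col::Int64) = (row - 1) * t.ncols + col
set_value!(t::PtrTable, row::Int64, col::Int64, v::Int64) =
    unsafe_store!(t.ptr, v, linear_index(t, row, col))
get_value(t::PtrTable, row::Int64, col::Int64) =
    unsafe_load(t.ptr, linear_index(t, row, col))
free_table(t::PtrTable) = Libc.free(t.ptr)

println(isconcretetype(fieldtype(BoxedDf, :table)))     # false
println(isconcretetype(fieldtype(ConcreteDf, :table)))  # true
println(isbitstype(PtrTable))                           # true

t = PtrTable(2, 3)
set_value!(t, 2, 3, 42)
println(get_value(t, 2, 3))                             # 42
free_table(t)
```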

P.S. A useful tool is AllocCheck.jl.
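AllocCheck.jl's check_allocs inspects the compiled code of a method for given argument types and returns the places where it may allocate, so the result does not depend on which branches one particular run happens to take. A minimal sketch, assuming AllocCheck is installed; add_cells and make_buffer are hypothetical example functions:

```julia
using AllocCheck

# Hypothetical example functions.
add_cells(a::Int64, b::Int64) = a + b       # pure integer arithmetic
make_buffer(n::Int64) = zeros(Int64, n)     # creates a GC-tracked Array

# check_allocs(f, argtypes) returns a collection of possible allocation sites.
allocs_add = check_allocs(add_cells, (Int64, Int64))
allocs_buf = check_allocs(make_buffer, (Int64,))

println("add_cells: ", length(allocs_add), " allocation site(s)")
println("make_buffer: ", length(allocs_buf), " allocation site(s)")
```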

Since the GC is not supported, how about using this instead of a workaround:
https://tshort.github.io/WebAssemblyCompiler.jl/stable/

WebAssemblyCompiler supports many Julia constructs, including:

  • […]
  • Dicts (not including strings)
  • Mutable and immutable structs
  • […]

Heap allocation is handled by WebAssembly’s garbage collector (see wasm-GC).

Interoperability with JavaScript is quite good. Julia code can run JavaScript functions and exchange objects. This functionality allows Julia to interact with the browser's DOM.

Code must be type stable (no dynamic dispatches). In addition, several Julia constructs are not supported, including:

  • Multi-dimensional arrays (waiting on the Memory type PR)
  • Pointers
  • […]
  • Some integer types (Int16, Int128, …)
  • BLAS and all other C dependencies

I realize you may not want to target the web, but WebAssembly isn't restricted to it. Maybe the package still is, though?

The Memory type PR has been merged, which lifts some restrictions, e.g. those related to Dicts. StaticCompiler has many limitations, such as the missing GC, and it historically did not work on Windows (I believe that's fully fixed now, by you, thanks!). The two packages share a lot of code through the GPUCompiler dependency ("GPU" is misleading in this context). I don't think WebAssemblyCompiler has any limitations not shared by StaticCompiler, but I might be wrong; maybe "pointers" work there? (And I don't see the reason for the integer-type restriction in either…) When running from the web you may have at most 4 GB available, but I don't think that's an inherent limitation.

@greatpet Thank you for the hint to check the number of GC allocations with @time. This gives a simple, fast first check of whether static compilation is possible.
The code above obviously has GC allocations when using the function str2num(), more specifically the power operator ^ and the for loop, in combination with calculating the index (idx = (row_num-1) * df.number_of_columns + column_num) in the function set_value(), especially the multiplication.
So what could help is to compute the index in the main function f():

function set_value(df::Df,idx::Int64,str::MallocString)
  df.table[idx] = str2num(str)
end

# row-major, 1-based linear index; (row_num-1) keeps the last row inside the table
idx(df::Df,row_num::Int64,column_num::Int64) = (row_num-1) * df.number_of_columns + column_num

function f()
  df = Df(2,3)  
  set_value(df,idx(df,2,3),m"A")
  0
end
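The row-major layout used for Df maps a 1-based (row, column) pair to a 1-based linear index as (row - 1) * ncols + column, as in the original set_value. Without the - 1, every row is shifted by one and the last row runs past the end of the nrows * ncols buffer: for a 2×3 table, (2, 3) gives 9 instead of 6. A small plain-Julia check of the mapping and its inverse, with hypothetical helper names linear_idx and row_col:

```julia
# Row-major, 1-based: all cells of row 1 come first, then row 2, and so on.
linear_idx(ncols::Int64, row::Int64, col::Int64) = (row - 1) * ncols + col

# Inverse mapping: linear index -> (row, col).
function row_col(ncols::Int64, idx::Int64)
    r, c = divrem(idx - 1, ncols)
    return (r + 1, c + 1)
end

nrows, ncols = 2, 3
# Every cell maps to a distinct index in 1:nrows*ncols, and back again.
for row in 1:nrows, col in 1:ncols
    i = linear_idx(ncols, row, col)
    @assert 1 <= i <= nrows * ncols
    @assert row_col(ncols, i) == (row, col)
end
println(linear_idx(ncols, 2, 3))   # 6
```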

@Palli Thank you for mentioning WebAssemblyCompiler. It could make a lot of things easier. For now, though, I want to try to stay independent of the web.