How to compile an Array of Numbers and Strings with StaticCompiler?

I want to represent Ints, Floats, and Strings in an Array or a Matrix, and compile it with the help of StaticCompiler.
How can I do this?
Since a MallocMatrix of Ints works (but one whose element type is a Union does not), one could encode Floats and (Malloc)Strings as integers. More specifically, a Float could be stored as a fixed-point integer, i.e. the value multiplied by a power of ten such as 1000. A String could be converted to a number via the ASCII values of its characters. Here is one approach.

using StaticCompiler, StaticTools

const int_type = 1
const float_type = 2
const string_type = 3

mutable struct DataFrameStc
  table::MallocMatrix{Int64}
  number_of_rows::Int64
  number_of_columns::Int64

  function DataFrameStc(number_of_rows::Int64,number_of_columns::Int64)
    table = MallocMatrix{Int64}(undef,number_of_rows,number_of_columns)
    new(table,number_of_rows,number_of_columns)
  end
end

encode(num::Int64)::Int64 = num
encode(num::Float64)::Int64 = Int(num * 1000)
encode(str::MallocString)::Int64 = str2num(str)

function decode(num::Int64,column_type::Int64)
  if column_type == int_type
    return num
  elseif column_type == float_type
    return num * 0.001
  else
    return num2str(num)
  end
end

function num2str(num::Int64)::MallocString
  num2letter = MallocArray{MallocString}(undef,26)
  num2letter[1] = m"A"
  num2letter[2] = m"B"
  num2letter[3] = m"C"
  # and so on
  str = m""
  while num > 0 
    t = mod(num-1,26)
    b = num2letter[Int(t+1)]
    str = b*str
    num = div((num-t),26)
  end
  str
end

function str2num(str::MallocString)::Int64
  len = length(str)
  num = 0
  for i in 1:len
    num += (Int(str[i])-64) * 26 ^ (len-i)
  end
  num
end

function f()
  column_types = MallocArray{Int64}(undef,3)
  column_types[1] = int_type
  column_types[2] = float_type
  column_types[3] = string_type
  df = DataFrameStc(2,3)
  df.table[1,1] = encode(1)
  df.table[1,2] = encode(2.345)
  df.table[1,3] = encode(m"A")  
  val1 = decode(df.table[1,1],column_types[1])
  val2 = decode(df.table[1,2],column_types[2])
  val3 = decode(df.table[1,3],column_types[3])
  printf(val1)
  printf(val2)
  printf(val3)
  0
end

@time f()

compile_executable(f, (), "C:\\jul\\staticcompiler")

The code fails with a link error.

ld.lld: error: undefined symbol: ijl_throw
>>> referenced by C:/Users/T460/AppData/Local/Temp/f-c1cca1.o:(f)
clang-17: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: LoadError: failed process: Process(`cmd /c clang -Wno-override-module 'C:\jul\staticcompiler\wrapper.c' 'C:\jul\staticcompiler\f.ll' -o 'C:\jul\staticcompiler\f'`, ProcessExited(1)) [1]

I want to mention that I run StaticCompiler on Windows, and that I have adapted StaticCompiler to Windows. I don’t know whether this error is Windows-specific. If you use Linux, it would be valuable to know whether you get the same error.

The code works fine when decode() is called with a concrete column_type, e.g. val1 = decode(df.table[1,1],int_type). However, I don’t know the column type in advance; I want to determine it with column_types[<column_number>]. The code also works when only two of the three types are represented and decoded, e.g. Int and Float, or Int and MallocString.

How can I run it with all three types, Int, Float and String, and determining the column type from the column number?
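One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis of the link error: StaticCompiler needs code that does not call into the Julia runtime, and a function whose return type depends on a run-time value is a common reason for such calls to appear. The sketch below is plain Julia that runs without StaticTools. `decode_sketch`, `INT_TAG` and `FLOAT_TAG` are hypothetical names, and `String` stands in for `MallocString`. It only shows how to ask type inference what `decode` can return when the column type is not known at compile time.

```julia
# Plain-Julia stand-in for decode(); String replaces MallocString (assumption).
const INT_TAG = 1
const FLOAT_TAG = 2

function decode_sketch(num::Int64, column_type::Int64)
    if column_type == INT_TAG
        return num                # Int64
    elseif column_type == FLOAT_TAG
        return num * 0.001        # Float64
    else
        return string(num)        # String
    end
end

# Ask inference for the return type when column_type is a run-time Int64.
inferred = only(Base.return_types(decode_sketch, (Int64, Int64)))
println(inferred)                  # prints the inferred return type
println(isconcretetype(inferred))  # false: more than one type can come back
```

When `column_type` is passed as a literal constant, the compiler can often resolve the branch at compile time, which would be consistent with the observation that `decode(df.table[1,1],int_type)` compiles.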

You may be interested in the following packages.

For example, with InlineStrings.jl you can turn a short string into an integer and vice versa, and FixedPointNumbers.jl lets you do the same with fixed-point numbers.

julia> using InlineStrings

julia> reinterpret(UInt64, inline7"Hello")
0x48656c6c6f000005

julia> reinterpret(String7, 0x48656c6c6f000005)
"Hello"

julia> using FixedPointNumbers

julia> quarter_pi = N0f64(π/4)
0.7853981633974484N0f64

julia> reinterpret(UInt64, quarter_pi)
0xc90fdaa22168bfff

julia> reinterpret(N0f64, 0xc90fdaa22168bfff)
0.7853981633974484N0f64

I’m not really sure about the static compiler part, but perhaps that will help point you in the right direction.


Thanks a lot for pointing to these packages.

I tried applying reinterpret() on MallocString. It can be converted into Int128 or UInt128.

using StaticTools, StaticCompiler, InlineStrings
f() = reinterpret(UInt128,m"A")
compile_executable(f, (), "C:\\test")

However, when trying to compile with StaticCompiler, I get an error:

Warning: Found pointer references to julia data
│   llvm instruction = store {}* inttoptr (i64 140708314336256 to {}*), {}** %.sub, align 8
│   name = Symbol("")
│   file = Symbol("C:\\Users\\T460\\.julia\\compiled\\v1.10\\StaticTools\\PXb4A_i0X9q.dll")
│   line = -1
│   fromC = true
│   inlined = false
└ @ StaticCompiler C:\Users\T460\.julia\packages\StaticCompiler\RM4Pv\src\pointer_warning.jl:59
┌ Warning: LLVM function generated warnings due to raw pointers embedded in the code. This will likely cause errors or undefined behaviour.
...

└ @ StaticCompiler C:\Users\T460\.julia\packages\StaticCompiler\RM4Pv\src\pointer_warning.jl:33
ERROR: MethodError: Cannot `convert` an object of type LLVM.ConstantExpr to an object of type Int64

Converting a Float into an Int128, by contrast, can be compiled. For example,
f() = Int128(reinterpret(Int64,3.4))
works.

Can you reproduce, explain or overcome this error?

With the help of the hint from InlineStrings, a solution for representing Floats, Ints and Strings in a Matrix is possible: the function reinterpret() makes it possible to encode all three types as Int64. For strings, it helps to encode the pointer to the string rather than its contents, and to decode it with the function unsafe_mallocstring() from StaticTools. The following code can be compiled with StaticCompiler (and also works on Windows).

using StaticCompiler, StaticTools, InlineStrings

const int_typ = 1
const float_typ = 2
const string_typ = 3

mutable struct DataFrameStc
  table::MallocMatrix{Int64}
  number_of_rows::Int64
  number_of_columns::Int64
  types_of_columns::StackArray{Int64}

  function DataFrameStc(number_of_rows::Int64,number_of_columns::Int64,types_of_columns::StackArray)
    table = MallocMatrix{Int64}(undef,number_of_rows,number_of_columns)
    new(table,number_of_rows,number_of_columns,types_of_columns)
  end
end

encode(num::Int64)::Int64 = num
encode(num::Float64)::Int64 = reinterpret(Int64,num)
encode(str::MallocString)::Int64 = reinterpret(Int64,pointer(str))

function decode(num::Int64,column_type::Int64)
  if column_type == string_typ
    return unsafe_mallocstring(reinterpret(Ptr{UInt8},num))
  else
    if column_type == int_typ
      return num
    else # column_type == float_typ
      return reinterpret(Float64,num)
    end
  end
end

function f()
  types_of_columns = StackArray{Int64}(undef,3)
  types_of_columns[1] = int_typ
  types_of_columns[2] = float_typ
  types_of_columns[3] = string_typ
  df = DataFrameStc(2,3,types_of_columns)
  df.table[1,1] = encode(1)
  df.table[1,2] = encode(2.345)
  df.table[1,3] = encode(m"A")  
  val1 = decode(df.table[1,1],types_of_columns[1])
  val2 = decode(df.table[1,2],types_of_columns[2])
  val3 = decode(df.table[1,3],types_of_columns[3])
  printf(c"val1:%d\n",val1)
  printf(c"val2:%g\n",val2)
  printf(c"val3:%s\n",val3)
  0
end

compile_executable(f, (), "C:\\jul\\staticcompiler")
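
To illustrate why `reinterpret` is a better fit for `encode(::Float64)` than scaling by 1000, here is a small plain-Julia sketch that runs without StaticCompiler or StaticTools. The variable names are only illustrative. It compares the bit-level round trip with the fixed-point encoding from the first post.

```julia
# Bit-level encoding: the same 64 bits are viewed as an integer and back.
x = 2.345
bits = reinterpret(Int64, x)
back = reinterpret(Float64, bits)
println(back === x)                  # true: the round trip is exact

# Fixed-point encoding with three decimal places drops further digits.
y = 0.1234567
scaled = round(Int64, y * 1000)      # 123
println(scaled * 0.001 == y)         # false: digits beyond 1/1000 are lost

# Int(...) on a non-integer value throws instead of rounding.
err = try
    Int(y * 1000)
catch e
    e
end
println(err isa InexactError)        # true
```

The last part shows that `Int(num * 1000)` contains an error path for inputs that are not exact multiples of 0.001. Whether that path contributed to the `ijl_throw` link error in the first post is not established here.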

Another way to represent Floats, Integers and Strings that can be compiled with StaticCompiler is a NamedTuple with one typed column per field:

using StaticCompiler, StaticTools

function nt_test()
  d = MallocArray{Int64}(undef,3)
  d[1] = 1
  d[2] = 2
  d[3] = 3
  e = MallocArray{Float64}(undef,3)
  e[1] = 1.2
  e[2] = 3.4
  e[3] = 4.5
  f = MallocArray{MallocString}(undef,3)
  f[1] = m"A"
  f[2] = m"B"
  f[3] = m"C"
  nt = (a = d,b = e,c = f)
  column1 = nt.a
  column2 = nt.b
  column3 = nt.c
  element1 = nt.a[1]
  element2 = nt.b[1]
  element3 = nt.c[1]
  printf(c"\nelement1:%d",element1)
  printf(c"\nelement2:%g",element2)
  printf(c"\nelement3:%s",element3)
  0
end
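
The reason this layout is friendlier to a static compiler can be sketched in plain Julia, without StaticTools. In the sketch, ordinary `Vector`s and `String`s stand in for `MallocArray` and `MallocString`, and `columns` and `first_row` are hypothetical names. Because every field of the NamedTuple has its own concrete element type, accessing an element never produces a Union.

```julia
# One concretely typed column per field (plain Vectors as stand-ins).
columns = (a = Int64[1, 2, 3], b = [1.2, 3.4, 4.5], c = ["A", "B", "C"])

# Reading one element per column; the field names are compile-time constants.
first_row(nt) = (nt.a[1], nt.b[1], nt.c[1])

# Ask inference what first_row returns for this NamedTuple type.
rt = only(Base.return_types(first_row, (typeof(columns),)))
println(rt)                   # Tuple{Int64, Float64, String}
println(isconcretetype(rt))   # true
```

The trade-off compared with the single Int64 matrix is that the column layout is fixed in the type of the NamedTuple, so the number and types of columns must be known when the code is compiled.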