How to read hex columns in CSV files

I have the following code:

using DLMReader
    ds = filereader(logfile_name, types = [Float64, Int, Int, Int, Int, Int, Int, Int, Int, Int, Int, Int, Int], 
                                header=[:time, :RTR, :addr, :Flags, :DLC, :d1, :d2, :d3, :d4, :d5, :d6, :d7, :d8], warn=0)

which used to work, but now I have a .csv file whose third column is hex-encoded, and the call fails.

How can I read hex encoded columns of a .csv file?

A solution with CSV.jl would also be fine with me; I could write a script fix_csv.jl that just converts the hex column to decimal integer values.

Example of a row of the .csv file I need to be able to read:

6976.81202,1,5c1,2,8,67,91,35,0,0,0,0,0,2

The value “5c1” should be parsed as hex value into an integer column.
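If you go the fix_csv.jl route, a minimal sketch using only Base (no CSV.jl) could look like this. It assumes the file is comma-separated with no header line and no quoted fields, and that the hex column is column 3 as in the example row; `fix_hex_column` and the file names are hypothetical.

```julia
# fix_csv.jl: minimal sketch using only Base (no CSV.jl).
# Assumptions: comma-separated, no header line, no quoted fields,
# and the hex-encoded column is column 3 as in the example row.

# Hypothetical helper: copy lines from `io_in` to `io_out`, replacing field
# `col` (hex digits without a "0x" prefix) with its decimal value.
# Empty fields are left empty.
function fix_hex_column(io_in::IO, io_out::IO; col::Int = 3)
    for line in eachline(io_in)
        fields = String.(split(line, ','))
        if length(fields) >= col && !isempty(fields[col])
            fields[col] = string(parse(Int, fields[col]; base = 16))
        end
        println(io_out, join(fields, ','))
    end
    return io_out
end

# Usage with hypothetical file names:
# open("log_hex.csv") do fin
#     open("log_dec.csv", "w") do fout
#         fix_hex_column(fin, fout)
#     end
# end
```

The converted file can then be read with the original `types = [Float64, Int, Int, ...]` call unchanged.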

It looks as if “informats” could be used to fix this issue?
https://sl-solution.github.io/DLMReader.jl/stable/man/tutorial_basic/#Reading-a-csv-file

I would say so. Register an informat that puts “0x” in front of the values of column 3.
See also: Informats · DLMReader
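The idea relies on the fact that, when no `base` keyword is given, Base's `parse` recognizes a `0x` prefix and reads the rest as hexadecimal. Whether DLMReader's own integer parser honours the prefix in the same way is an assumption here, not something its docs state; the snippet only shows Base's behaviour.

```julia
# Without the `base` keyword, parse detects a "0x" prefix and uses base 16.
field = "5c1"
with_prefix = parse(Int, "0x" * field)
# Same value with an explicit base and no prefix.
explicit_base = parse(Int, field; base = 16)
println(with_prefix == explicit_base)   # prints true
println(with_prefix)                    # prints 1473
```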

Not working. The following code:

using DLMReader

function hex2int!(str)
    val = parse(Int64, str, base=16)
    replace!(str, str=>repr(val)) 
    return str
end
register_informat(hex2int!)

gives the error:

ERROR: LoadError: AssertionError: informat must return its input or a subset of its input

A decimal value is often longer than its hex representation, but it seems this function is not allowed to return a string that is longer than the original one.
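To see why the length matters, compare each hex field with the number of decimal digits its value needs. `decimal_fits` is a hypothetical helper for this check, not part of DLMReader.

```julia
# Hypothetical helper: true if the decimal text of a hex field is no longer
# than the field itself (so an in-place rewrite could in principle fit).
decimal_fits(hex::AbstractString) = ndigits(parse(Int, hex; base = 16)) <= length(hex)

for h in ("5c1", "682", "10", "ff")
    println(h, " -> ", parse(Int, h; base = 16), ", fits: ", decimal_fits(h))
end
```

Only small values such as "10" (16) fit; the addresses in the log ("5c1" is 1473, "682" is 1666) need one extra character, and prefixing “0x” always adds two.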

When you say a CSV.jl solution is fine, do you mean one that parses the hex values directly, or simply one that doesn’t fail and gives you strings you can convert? The latter seems to work out of the box:

julia> using CSV

julia> CSV.File(IOBuffer("6976.81,1,5c1,2,8"), header = false)
1-element CSV.File:
 CSV.Row: (Column1 = 6976.81, Column2 = 1, Column3 = String3("5c1"), Column4 = 2, Column5 = 8)

(I’ve shortened your example a bit to make it more legible)

I created a bug report: Informats cannot parse hex values · Issue #14 · sl-solution/DLMReader.jl · GitHub

Reading the hex values as strings is easy, but how do I convert:

julia> ds.addr
1318767-element Vector{Union{Missing, String}}:
 "682"
 "5c1"
...

into a Vector of integers?
I can parse scalar values using parse(Int64, "0a", base=16), but how do I apply such a function to a Vector?
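For a vector without missing values, Julia's dot syntax broadcasts `parse` over the elements; keyword arguments such as `base` are passed unchanged to every call. This sketch uses a plain `Vector{String}`; with `missing` entries it throws an error, which is what the `passmissing` and `ismissing` variants handle.

```julia
# Broadcast `parse` over a vector of hex strings with the dot syntax.
# The keyword `base = 16` is not broadcast; it is passed to each call.
hex = ["682", "5c1"]
addr = parse.(Int, hex; base = 16)
println(addr)   # prints [1666, 1473]
```

`map(s -> parse(Int, s; base = 16), hex)` is an equivalent spelling.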

This was my idea: not parsing the value inside the informat, just prefixing it, like:

```
function hex(str)
    return "0x"*str
end
register_informat(hex)
```

Did you read https://sl-solution.github.io/DLMReader.jl/stable/man/informat/ ?

Users can define their own informats, which is basically a function with one positional argument. The function must accept a special mutable string and returns its modified value (or returns a subset of it).

Nice idea, but it doesn’t work:

julia> include("scripts/read_hex.jl")
ERROR: LoadError: AssertionError: informat must return its input or a subset of its input
Stacktrace:
 [1] register_informat(f::Function; quiet::Bool, force::Bool)
   @ DLMReader ~/.julia/packages/DLMReader/gB2Mj/src/informats.jl:24
 [2] register_informat(f::Function)
   @ DLMReader ~/.julia/packages/DLMReader/gB2Mj/src/informats.jl:23
 [3] top-level scope
   @ ~/repos/LogFiles/scripts/read_hex.jl:13
 [4] include(fname::String)
   @ Base.MainInclude ./client.jl:476
 [5] top-level scope
   @ REPL[3]:1
in expression starting at /home/ufechner/repos/LogFiles/scripts/read_hex.jl:13

One way:

julia> x
3-element Vector{Union{Missing, String}}:
 "682"
 "5c1"
 missing

julia> passmissing(y -> parse(Int64, y, base = 16)).(x)
3-element Vector{Union{Missing, Int64}}:
 1666
 1473
     missing

In which package is “passmissing” defined?

It’s in Missings, but it is also re-exported by DataFrames, so if you read from a CSV into a DataFrame it should be available anyway.

It’s a simple function, though, which you could replace with this if you don’t want an extra dependency:

julia> (y -> ismissing(y) ? y : parse(Int64, y, base = 16)).(x)
3-element Vector{Union{Missing, Int64}}:
 1666
 1473
     missing

Thanks!

I’m now using this code:

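using DLMReader, InMemoryDatasets  # filereader returns an InMemoryDatasets Dataset

# Convert a hex string such as "5c1" to an Int, passing missing values through.
hex2int(str) = ismissing(str) ? missing : parse(Int64, str, base=16)

# Read the address column as String so the hex digits survive parsing.
ds = filereader(logfile_name, types = [Float64, Int, String, Int, Int, Int, Int, Int, Int, Int, Int, Int, Int],
    header=[:time, :RTR, :addr_hex, :Flags, :DLC, :d1, :d2, :d3, :d4, :d5, :d6, :d7, :d8], warn=0)

# Add the decoded integer column :addr.
modify!(ds, :addr_hex => byrow(hex2int) => :addr)
# Put :addr back at position 3 and keep :Flags; unlisted columns (:addr_hex) are dropped.
select!(ds, [:time, :RTR, :addr, :Flags, :DLC, :d1, :d2, :d3, :d4, :d5, :d6, :d7, :d8])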
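If a log file might contain malformed address fields, a variant of `hex2int` built on `tryparse` maps them to `missing` instead of throwing. `hex2int_or_missing` is a hypothetical name; it assumes `byrow` accepts any one-argument function, as the use of `hex2int` above suggests.

```julia
# Variant of hex2int: both `missing` and malformed hex fields become `missing`.
# tryparse returns `nothing` on failure; something(nothing, missing) gives `missing`.
hex2int_or_missing(s) = ismissing(s) ? missing : something(tryparse(Int, s; base = 16), missing)

v = ["682", "5c1", missing, "zz"]
println(hex2int_or_missing.(v))
```

The trade-off is that bad rows no longer stop the import, so they have to be found afterwards, for example by counting `missing` values in `:addr`.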

You are right, my idea doesn’t work.
Adding “0x” in front is apparently not possible, because it would make the string longer, and an informat may only return its input or a subset of it.