Julia 1.0.3 segfaults when reading a CSV file into a DataFrame with unquoted headers and strings

dataframes
#1

Hello,

My code goes:

using Queryverse
df = load("my_file.csv") |> DataFrame

And the file contents are:

query_domain,template_domain,tm_score
d1qbaa2,d2a73b2,0.52434
d1qbaa2,d2wnxa1,0.51702
d1qbaa2,d1p35a_,0.50272
d1qbaa2,d3zuca1,0.50237
d1qbaa2,d4b9pa_,0.49787
d1qbaa2,d3u9wa1,0.49737
....

The Julia process segfaults, seemingly at random, when executing this code. By "at random" I mean that I have many files with a similar structure, and the segfault occurs reliably on some of them but not on others. So far I have not been able to figure out what is special about the files that trigger it.

Wrapping all string tokens in the file in double quotes gets rid of the segfault.
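
A minimal sketch of that quoting workaround, using only Base Julia. The names `quote_field` and `quote_string_fields` are hypothetical helpers, not part of any package, and the sketch assumes a simple comma-separated file whose fields contain no embedded commas, quotes, or newlines and are not already quoted. Any field that does not parse as a number is treated as a string.

```julia
# Hypothetical helper: wrap a field in double quotes unless it parses as a number.
# Assumption: fields are not already quoted and contain no commas or quotes.
quote_field(field::AbstractString) =
    tryparse(Float64, field) === nothing ? string('"', field, '"') : String(field)

# Hypothetical helper: copy `src` to `dst`, quoting every non-numeric field.
function quote_string_fields(src::AbstractString, dst::AbstractString)
    open(dst, "w") do out
        for line in eachline(src)
            fields = split(line, ',')
            println(out, join(map(quote_field, fields), ','))
        end
    end
    return dst
end
```

Calling `quote_string_fields("my_file.csv", "my_file_quoted.csv")` would write a quoted copy (the output name is just an example), which can then be loaded the same way as before.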

I am running Julia v1.0.3 on Linux with Queryverse v0.2.0. Has this been reported before?


#2

I’ve had several segfaults on Julia 1.0.x, but they’ve been progressively patched. I would recommend trying Julia 1.1.0 if possible. It fixed at least one scary production segfault for us.


#3

Not really a solution, but I had a similar issue a few months ago. I ended up using CSV.jl (instead of Queryverse's CSVFiles.jl) to import the files and then piping the result into a DataFrame. It is definitely a workaround. I'd first do as @cstjean suggested and move to 1.1.0 before rewriting any code.
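
For reference, a rough sketch of that CSV.jl route. It assumes CSV.jl and DataFrames.jl are installed, and it writes a small sample file (a few rows taken from the original post) to a temporary directory so that it runs on its own. In practice `path` would simply be the real file, e.g. `"my_file.csv"`.

```julia
using CSV, DataFrames

# Write a small sample file so the example is self-contained.
path = joinpath(mktempdir(), "my_file.csv")
write(path, """
query_domain,template_domain,tm_score
d1qbaa2,d2a73b2,0.52434
d1qbaa2,d2wnxa1,0.51702
""")

# Parse with CSV.jl and pipe the result into a DataFrame.
df = CSV.File(path) |> DataFrame
```

The only change relative to the original code is the reader: `CSV.File(path)` takes the place of `load(path)` from CSVFiles.jl, and the `|> DataFrame` step stays the same.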
