Read single quoted values inside strings

uwbanjoman · May 15, 2023, 12:29pm

Hi all, I have opened a text file using the readlines() function. This text file has various sections containing different information. In some of the lines I am finding strings that contain information inside single quotes, and there can be multiple words within these quotes. How can I keep 'THIS TEXT' intact when splitting the following string at every " "?

"#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "

Splitting at every space breaks the single-quoted text in two.
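
To make the problem concrete, here is a minimal sketch of what plain whitespace splitting does to the sample line (the variable names `line` and `naive` are only illustrative):

```julia
line = "#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "

# split with no delimiter splits on whitespace and knows nothing about quotes
naive = split(line)

# the quoted field ends up in two separate tokens
naive[4]   # "'THIS"
naive[5]   # "TEXT'"
```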

Henrique_Becker · May 15, 2023, 12:36pm

I believe that if you were using double quotes instead (as is usual), you could use DelimitedFiles.readdlm.

uwbanjoman · May 15, 2023, 12:40pm

Thank you Henrique, but that is exactly my problem. The text file is an export from an application that I cannot change.

Henrique_Becker · May 15, 2023, 1:07pm

Can't you read the whole file into a String, replace all single quotes with double quotes, and then write it to a string buffer to pass to DelimitedFiles.readdlm?
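
A rough sketch of that idea on the sample line, assuming an IOBuffer is an acceptable "string buffer" and that readdlm's default whitespace delimiter and quote handling suit the file. Note that readdlm returns a matrix and may parse numeric-looking fields into numbers rather than leaving them as strings:

```julia
using DelimitedFiles

line = "#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "

# swap single quotes for double quotes, which readdlm treats as quoting
converted = replace(line, '\'' => '"')

# read from an in-memory buffer instead of a file
row = readdlm(IOBuffer(converted))
```

For a whole file, the same replacement could be applied to `read(path, String)` before wrapping the result in an IOBuffer, where `path` stands for your file name.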

uwbanjoman · May 15, 2023, 1:13pm

That’s worth a try, Thanks

frylock · May 15, 2023, 2:43pm

Ugly … but seems to work on your test line:

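function tokenize(s)
    outbuffer = []       # finished tokens
    buffer = Char[]      # characters of the token being built
    quoted = false       # true while inside a single-quoted field

    for c in s
        if quoted
            if c == '\''
                # closing quote: emit the field, even if it is empty
                push!(outbuffer, join(buffer))
                quoted = false
                empty!(buffer)
            else
                push!(buffer, c)
            end
        elseif isspace(c)
            # whitespace outside quotes ends the current token, if any
            isempty(buffer) || push!(outbuffer, join(buffer))
            empty!(buffer)
        elseif c == '\''
            quoted = true
        else
            push!(buffer, c)
        end
    end

    # flush a final unquoted token when the line does not end in whitespace;
    # text after an unterminated quote is still dropped
    !quoted && !isempty(buffer) && push!(outbuffer, join(buffer))

    outbuffer
end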

test = "#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "
tokenize(test)  # returns Any["#1", "1", "0", "THIS TEXT", "", "88218681", "0.398", "1", "", "1", "1", "2", "0", "168384", "461330", "0"]

uwbanjoman · May 15, 2023, 3:25pm

Wonderful, this works! I’ve tested it on another string as well. Thanks so much.

Henrique_Becker · May 15, 2023, 5:50pm

Just to point it out, neither the code above nor DelimitedFiles.readdlm supports escaping quotes inside a quoted field. I would suggest checking whether this is a possibility.
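
If escaped quotes do turn out to be possible, the tokenizer has to know the escaping convention. The sketch below assumes, purely for illustration, that a literal quote inside a field is written as two single quotes (''), as in SQL; the exporting application may use a different convention or none at all. The name `tokenize_escaped` is hypothetical.

```julia
# Hypothetical helper. Assumption: a quote inside a quoted field is doubled ('').
function tokenize_escaped(line::AbstractString)
    tokens = String[]
    # first alternative: a quoted field whose body may contain doubled quotes
    # second alternative: a run of non-whitespace characters
    for m in eachmatch(r"'((?:[^']|'')*)'|(\S+)", line)
        if m.captures[1] !== nothing
            # quoted field: turn each doubled quote back into a single one
            push!(tokens, replace(m.captures[1], "''" => "'"))
        else
            push!(tokens, m.captures[2])
        end
    end
    return tokens
end

tokenize_escaped("a 'it''s here' '' 1")   # ["a", "it's here", "", "1"]
```

An empty field written as '' still comes out as an empty string, because a doubled quote is only treated as an escape when it appears after the opening quote and before the closing one.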

rocco_sprmnt21 · May 15, 2023, 6:32pm

line1="#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "

split(replace(line1, r"(\pL) (\pL)"=>s"\1_\2"))


16-element Vector{SubString{String}}:
 "#1"
 "1"
 "0"
 "'THIS_TEXT'"
 "''"
 "'88218681'"
 "0.398"
 "1"
 "''"
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"

split(replace(line1, r"(\pL) (\pL)"=>s"\1_\2","'"=>""))
14-element Vector{SubString{String}}:
 "#1"
 "1"
 "0"
 "THIS_TEXT"
 "88218681"
 "0.398"
 "1"
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"

split(replace(line1, r"(\pL) (\pL)"=>s"\1\u00a0\2","'"=>"")," ")
17-element Vector{SubString{String}}:
 "#1"
 "1"
 "0"
 "THIS TEXT"
 ""
 "88218681"
 "0.398"
 "1"
 ""
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"
 ""
# a more general string

line4="#1 1 0 'THIS2 3TEXT' '' '88g 218x y681' 0.398 1 '' 1 1 2 0 168384 461330 0 "
split(replace(line4, r"(\w*\pL\w*) +"=>s"\1\u00a0","'"=>"")," ")
17-element Vector{SubString{String}}:
 "#1"
 "1"
 "0"
 "THIS2 3TEXT"
 ""
 "88g 218x y681"
 "0.398"
 "1"
 ""
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"
 ""

PS
Can anyone explain why split(str, " ") and split(str) produce different results if there are multiple consecutive spaces (\x20)?
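
A short sketch that seems to account for the difference: without a delimiter argument, split breaks on any whitespace and its keepempty keyword defaults to false, whereas with an explicit delimiter keepempty defaults to true, so every extra space produces an empty field. The string `s` is just a small made-up example.

```julia
s = "a  b "   # two spaces between a and b, one trailing space

split(s)                          # ["a", "b"]
split(s, " ")                     # ["a", "", "b", ""]
split(s, " "; keepempty=false)    # ["a", "b"]
```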

neophytedave · May 15, 2023, 8:32pm

I think I am working with a comparable situation. In my case, I copy output from another application to the system clipboard, so Julia sees the clipboard contents as a string. I then (painfully) inspect each character of the clipboard to determine whether it is part of the string you want, e.g. "isequal(clipboard()[i], 'char')". Too simple?

aplavin · May 16, 2023, 8:42am

Regexes are an obvious first-choice solution for such problems. You specify what to find in the string, and voila:

julia> map(m -> strip(m.match, '\''), eachmatch(r"'[^']*'|\S+", str))
16-element Vector{SubString{String}}:
 "#1"
 "1"
 "0"
 "THIS TEXT"
 ""
 "88218681"
 "0.398"
 "1"
 ""
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"

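To see how the two alternatives in that pattern cooperate, here is a small sketch on a shortened version of the sample line (the names `pattern`, `raw`, and `tokens` are illustrative). The first alternative, '[^']*', matches a whole quoted field including its quotes; the second, \S+, matches any other run of non-whitespace characters. Stripping the quotes is a separate step.

```julia
str = "#1 1 0 'THIS TEXT' '' '88218681' 0.398"
pattern = r"'[^']*'|\S+"

# matches before stripping: quoted fields still carry their quotes
raw = [m.match for m in eachmatch(pattern, str)]
# ["#1", "1", "0", "'THIS TEXT'", "''", "'88218681'", "0.398"]

# remove leading and trailing single quotes from every match
tokens = map(s -> strip(s, '\''), raw)
# ["#1", "1", "0", "THIS TEXT", "", "88218681", "0.398"]
```

One caveat of stripping afterwards is that an unquoted token that merely begins or ends with a stray quote loses it as well.
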
Dan · May 16, 2023, 9:48am

Another option (nesting of ' will wreak havoc).

julia> line1 = "#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "

julia> Iterators.flatmap((i,x)->isodd(i) ? split(x) : [x],
  Iterators.countfrom(1), split(line1, "'")) |> collect
16-element Vector{SubString{String}}:
 "#1"
 "1"
 "0"
 "THIS TEXT"
 ""
 "88218681"
 "0.398"
 "1"
 ""
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"

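The idea behind this one-liner is that splitting on the quote character yields segments that alternate between "outside quotes" (odd positions) and "inside quotes" (even positions). A more explicit sketch of the same logic, which should also run on Julia versions that lack Iterators.flatmap, could look like this (the function name `split_by_quotes` is made up for illustration):

```julia
# Hypothetical helper: assumes quotes are balanced and never nested.
function split_by_quotes(line::AbstractString)
    tokens = String[]
    for (i, part) in enumerate(split(line, "'"))
        if isodd(i)
            # outside quotes: ordinary whitespace splitting
            append!(tokens, split(part))
        else
            # inside quotes: keep the segment as one token, even if empty
            push!(tokens, part)
        end
    end
    return tokens
end

split_by_quotes("a 'b c' '' d")   # ["a", "b c", "", "d"]
```
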
uwbanjoman · May 16, 2023, 12:13pm

Thank you all for your interest in my problem, learning a lot here.

rocco_sprmnt21 · May 16, 2023, 1:43pm

Do you mean this?
But perhaps it is not a situation that can occur.


line8="#1 1 'nest'THIS1  TEXT1'tsen'  '' '88g 218x   y681' '0.398 1 '' 1 1 2 0 168384 461330 0 "

Iterators.flatmap((i,x)->isodd(i) ? split(x) : [x],
  Iterators.countfrom(1), split(line8, "'")) |> collect

10-element Vector{SubString{String}}:
 "#1"
 "1"
 "nest"
 "THIS1"
 "TEXT1"
 "tsen"
 ""
 "88g 218x   y681"
 "0.398 1 "
 " 1 1 2 0 168384 461330 0 "

map(m -> strip(m.match, '\''), eachmatch(r"'[^']*'|\S+", line8))
16-element Vector{SubString{String}}:
 "#1"
 "1"
 "nest"
 "THIS1"
 "TEXT1'tsen"
 ""
 "88g 218x   y681"
 "0.398 1 "
 ""
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"

tokenize(line8)
8-element Vector{Any}:
 "#1"
 "1"
 "nest"
 "THIS1"
 "TEXT1tsen"
 ""
 "88g 218x   y681"
 "0.398 1 "



split(replace(line8, r"(\w*\pL\w*) +"=>s"\1\u00a0","'"=>"")," ")
17-element Vector{SubString{String}}:
 "#1"
 "1"
 "nestTHIS1 TEXT1tsen"
 ""
 ""
 "88g 218x y681"
 "0.398"
 "1"
 ""
 "1"
 "1"
 "2"
 "0"
 "168384"
 "461330"
 "0"
 ""

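Since all four approaches disagree on malformed input such as line8, it may be simpler to detect such lines before tokenizing. A minimal sketch, assuming that a well-formed line always contains an even number of single quotes (the helper name `has_balanced_quotes` is hypothetical):

```julia
# Hypothetical check: an odd number of single quotes means one is unmatched.
has_balanced_quotes(line::AbstractString) = iseven(count(==('\''), line))

line1 = "#1 1 0 'THIS TEXT' '' '88218681' 0.398 1 '' 1 1 2 0 168384 461330 0 "
line8 = "#1 1 'nest'THIS1  TEXT1'tsen'  '' '88g 218x   y681' '0.398 1 '' 1 1 2 0 168384 461330 0 "

has_balanced_quotes(line1)   # true
has_balanced_quotes(line8)   # false
```

This only catches an unmatched quote; a line with an even number of misplaced quotes would still pass.
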