Parsing WhatsApp text chat

statspy · May 2, 2021, 6:19pm

Hello,
I am trying to parse a WhatsApp text chat into a DataFrame. What I need is to parse this format:

06/18/2021 20:48 - +36 41 9989-8989: Hello, how are you? <= all in one column
or
06/18/2021 20:48 - Paul Faben: Hello, how are you? <= all in one column

to get this table:

Date | Time | Number | Text <= 4 columns

Any help would be great, as I am new to Julia…
Thanks

simeonschaub · May 2, 2021, 6:51pm

I am sure it’s possible to implement a more efficient version of this by writing a custom parser, but to implement a simple version, you can just use a regex together with the Dates stdlib:

using DataFrames, Dates

function read_chats(str)
    # One typed column per field of the target table.
    df = DataFrame(date=Date[], time=Time[], number=String[], text=String[])
    # Each line that fits the pattern yields one match; other lines are skipped.
    for row in eachmatch(r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m, str)
        # Convert the captured strings into Date and Time values.
        date = parse(Date, row["date"], dateformat"mm/dd/yyyy")
        time = parse(Time, row["time"])
        push!(df, (; date, time, number=row["number"], text=row["text"]))
    end
    return df
end

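r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m is a regex describing the format your lines are in, where each part you are interested in is a named group. The trailing m flag makes ^ and $ match at the start and end of every line, so eachmatch yields one match per chat line. You can then use parse to convert the captured date and time strings into Date and Time objects.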
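
To see how the named groups behave, here is a minimal sketch that matches a single line using only the Dates stdlib. The variable names (line, m, msg_date, msg_time, sender, body) are just for illustration and are not part of the function above:

```julia
using Dates

# Same pattern as in read_chats.
pattern = r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m

line = "06/18/2021 20:48 - Paul Faben: Hello, how are you?"
m = match(pattern, line)   # `nothing` if the line does not fit the format

# Named groups are read by indexing the match with the group name.
msg_date = parse(Date, m["date"], dateformat"mm/dd/yyyy")
msg_time = parse(Time, m["time"])
sender   = String(m["number"])   # captures are SubStrings; convert if a String is needed
body     = String(m["text"])
```

Since match returns nothing for a line that does not fit, it is worth checking for that before indexing when the input is not known to be clean.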

Reading in two rows:

julia> read_chats("""
       06/18/2021 20:48 - +36 41 9989-8989: Hello, how are you?
       06/18/2021 20:48 - Paul Faben: Hello, how are you?""")
2×4 DataFrame
 Row │ date        time      number            text                
     │ Date        Time      String            String              
─────┼─────────────────────────────────────────────────────────────
   1 │ 2021-06-18  20:48:00  +36 41 9989-8989  Hello, how are you?
   2 │ 2021-06-18  20:48:00  Paul Faben        Hello, how are you?

To read from a file, you can just read it into a string using read(filename, String).
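
A minimal sketch of that step, assuming the chat export is a plain-text file. A temporary file stands in for the real export here, so path is hypothetical, and the example only collects the sender names so that it runs without DataFrames:

```julia
# Write two sample lines to a temporary file so the example is self-contained.
# In practice `path` would point at the exported chat file.
path, io = mktemp()
write(io, """
06/18/2021 20:48 - +36 41 9989-8989: Hello, how are you?
06/18/2021 20:48 - Paul Faben: Hello, how are you?
""")
close(io)

# Read the whole file into a single String.
chat = read(path, String)

# The multiline regex then finds one match per message line.
pattern = r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m
senders = [String(m["number"]) for m in eachmatch(pattern, chat)]

# With read_chats defined as above, this would simply be: df = read_chats(chat)
```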

statspy · May 2, 2021, 6:55pm

Awesome! Thank you very much @simeonschaub
