I am sure a custom parser would be more efficient, but for a simple version you can just use a regex together with the Dates stdlib:
using DataFrames, Dates

function read_chats(str)
    # One typed column per field of a chat line
    df = DataFrame(date=Date[], time=Time[], number=String[], text=String[])
    # The m flag lets ^ and $ match at every line, so each line yields one match
    for row in eachmatch(r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m, str)
        date = parse(Date, row["date"], dateformat"mm/dd/yyyy")
        time = parse(Time, row["time"])
        # Append the parsed fields as a new row
        push!(df, (; date, time, number=row["number"], text=row["text"]))
    end
    return df
end
r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m is a regex describing the format of your lines, where each part you are interested in is a named group. The m flag makes ^ and $ match at the start and end of every line rather than of the whole string. You can then use parse to convert the date and time strings to Date and Time objects.
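To see what the named groups capture, here is a minimal sketch that applies the same regex to a single line with match. The variable names (line, m, time_of_day, sender, text) are only for illustration and are not part of the function above.

```julia
using Dates

# One sample line in the same format as the chat export.
line = "06/18/2021 20:48 - Paul Faben: Hello, how are you?"

# match returns a RegexMatch (or nothing if the line does not fit the pattern).
m = match(r"^(?<date>.+?) (?<time>.+?) \- (?<number>.*?)\: (?<text>.*)$"m, line)

# Named groups are read by indexing the match with the group name.
date = parse(Date, m["date"], dateformat"mm/dd/yyyy")
time_of_day = parse(Time, m["time"])
sender = m["number"]
text = m["text"]
```

Lines that do not fit the pattern produce no match, so eachmatch simply skips them.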
Reading in two rows:
julia> read_chats("""
06/18/2021 20:48 - +36 41 9989-8989: Hello, how are you?
06/18/2021 20:48 - Paul Faben: Hello, how are you?""")
2×4 DataFrame
 Row │ date        time      number            text
     │ Date        Time      String            String
─────┼─────────────────────────────────────────────────────────────
   1 │ 2021-06-18  20:48:00  +36 41 9989-8989  Hello, how are you?
   2 │ 2021-06-18  20:48:00  Paul Faben        Hello, how are you?
To read from a file, you can just read it into a string using read(filename, String) and pass the result to read_chats.
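As a rough sketch of the file round trip, the snippet below writes two sample lines to a temporary file and reads them back as one string. The file name chat.txt and the variable names are made up for the example, and the final call is left as a comment because it assumes read_chats is already defined.

```julia
# Two sample lines in the chat export format.
sample = """
06/18/2021 20:48 - +36 41 9989-8989: Hello, how are you?
06/18/2021 20:48 - Paul Faben: Hello, how are you?"""

# Write them to a file in a fresh temporary directory (hypothetical file name).
path = joinpath(mktempdir(), "chat.txt")
write(path, sample)

# Read the entire file back as a single String.
content = read(path, String)

# With read_chats defined as above, the contents can be parsed directly:
# df = read_chats(content)
```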