Reading a UTF-16-LE file

I would like to read a UTF-16-LE-encoded file; in this case, it's a TSV. In Python, I can read it as:

import pandas as pd

with open('file.tsv', encoding='utf-16-le') as f:
    df = pd.read_table(f)

In Julia, I think I should open the file, call readbytes! into a Vector{UInt8}, convert/reshape that to a Vector{UInt16}, then call LegacyStrings.utf16, just to get the string data. Is there a simpler way?

Have you tried StringEncodings?

using StringEncodings
fid = open(path, enc"UTF-16LE", "r")
...

No, and this looks perfect. Thanks!

Without using external packages, you can use the built-in transcode function:

f = IOBuffer(transcode(UInt8, ltoh.(reinterpret(UInt16, read("file.tsv")))))
...read data from `f`...

(On little-endian machines you could omit the ltoh call.) Use transcode(String, ...) to get a String instead, but an IOBuffer may be more convenient if you want to use it with other I/O routines.
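For the String variant, here is a minimal sketch. To keep it runnable without a file on disk, it builds the UTF-16LE bytes in memory from a made-up two-line sample (the `text` and `bytes` names are just for illustration); with a real file, `bytes` would come from `read("file.tsv")`. It assumes an even number of bytes, and note that a leading byte-order mark, if the file has one, would still be there as the first character of the result.

```julia
# Hypothetical sample data, encoded as UTF-16LE bytes in memory.
# With a real file this would be: bytes = read("file.tsv")
text = "a\tb\n1\t2\n"
bytes = collect(reinterpret(UInt8, htol.(transcode(UInt16, text))))

# View the raw bytes as UInt16 code units and convert from little-endian
# to host byte order (a no-op on little-endian machines).
units = ltoh.(reinterpret(UInt16, bytes))

# Transcode the UTF-16 code units into an ordinary (UTF-8) String.
s = transcode(String, units)

# The String can still be wrapped in an IOBuffer for line-oriented reading.
for line in eachline(IOBuffer(s))
    println(split(line, '\t'))
end
```

Going through a String is handy when you want to inspect or clean the text first; if you only need to hand the data to a parser, the IOBuffer over transcode(UInt8, ...) avoids the extra step.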
