Nested iteration in comprehension

I was expecting the following to read a file into an N x M array of characters:

load(file) = [c for l in eachline(file), c in l]

where the file’s contents are:

MMMSXXMASM
MSAMXMSMSA
AMXSXMAAMM
MSAMASMSMX
XMASAMXAMM
XXAMMXXAMA
SMSMSASXSS
SAXAMASAAA
MAMMMXMMMM
MXMXAXMASX

However, it seems to return some numbers that I don’t understand instead:

julia> load("/tmp/example.txt")
10-element Vector{Int64}:
 10112
 10112
 10112
 10112
 10112
 10112
 10112
 10112
 10112
 10112

Anyone have any insight into why that happens?

Only solution I’ve found so far (short of an explicit nested loop) is this:

load(file) = reduce(vcat, permutedims(collect(l)) for l in eachline(file))

which is rather ugly, but does produce the desired result:

julia> load("/tmp/example.txt")
10×10 Matrix{Char}:
 'M'  'M'  'M'  'S'  'X'  'X'  'M'  'A'  'S'  'M'
 'M'  'S'  'A'  'M'  'X'  'M'  'S'  'M'  'S'  'A'
 'A'  'M'  'X'  'S'  'X'  'M'  'A'  'A'  'M'  'M'
 'M'  'S'  'A'  'M'  'A'  'S'  'M'  'S'  'M'  'X'
 'X'  'M'  'A'  'S'  'A'  'M'  'X'  'A'  'M'  'M'
 'X'  'X'  'A'  'M'  'M'  'X'  'X'  'A'  'M'  'A'
 'S'  'M'  'S'  'M'  'S'  'A'  'S'  'X'  'S'  'S'
 'S'  'A'  'X'  'A'  'M'  'A'  'S'  'A'  'A'  'A'
 'M'  'A'  'M'  'M'  'M'  'X'  'M'  'M'  'M'  'M'
 'M'  'X'  'M'  'X'  'A'  'X'  'M'  'A'  'S'  'X'

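The comma form of nested iteration does not allow dependent loops: every iterator is evaluated in the enclosing scope, so the l in "c in l" is not the loop variable but whatever l is bound to outside the comprehension (presumably an integer 10112 left over in your session, since a number iterates as a single element). I believe the rationale is that the comma form yields a rectangular Array, which cannot be guaranteed if one loop's length can depend on another.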
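To make the difference concrete, here is a minimal sketch contrasting the two comprehension forms. The names grid, lines, flat and shadowed are made up for illustration, and the let block only imitates what a leftover l in the session would do; it is a guess at the cause, not a confirmed diagnosis.

```julia
# Comma form: independent iterators, Cartesian product, rectangular result.
grid = [(i, j) for i in 1:2, j in 1:3]        # 2×3 Matrix of tuples

# Chained `for` form: the inner iterator may use the outer variable,
# and the result is always a flat Vector.
lines = ["ab", "cd"]                          # hypothetical stand-in for eachline(file)
flat = [c for l in lines for c in l]

# What probably happened in the question: an outer `l` holding an integer.
# A number iterates as a single element, so the comma form yields it once per line.
shadowed = let l = 10112
    [c for s in lines, c in l]
end
```

Without any outer l, the comma form in the question would be expected to fail with an UndefVarError rather than return numbers.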

Here are some other suggestions:

eachline(file) |> stack |> permutedims
permutedims(stack(eachline(file))) # same as above, just different syntax
stack(eachline(file); dims=1)

Beautiful! Thanks.

Depending on what output you want, you can do:

load(file) = [c for l in eachline(file) for c in l]

and reshape the vector to a matrix.
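A rough sketch of that reshape step, assuming every line has the same length. The name load_matrix is hypothetical, and readlines is used here instead of eachline so the line count is known up front. Since reshape fills column by column, each line ends up in a column and the result has to be transposed afterwards.

```julia
# Hypothetical helper, assuming all lines have equal length.
function load_matrix(file)
    lines = readlines(file)
    flat = [c for l in lines for c in l]      # flat Vector{Char}, line after line
    ncols = length(first(lines))
    # reshape is column-major: each line becomes a column, so transpose at the end.
    return permutedims(reshape(flat, ncols, length(lines)))
end
```

Calling load_matrix("/tmp/example.txt") should then give a Matrix{Char} with one row per line of the file.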

If you want to save time and the file is formatted cleanly as in the question, using mmap will probably be fastest. Also note the use of UInt8 instead of Char: a Char is really a 32-bit Unicode code point, whereas UInt8 is appropriate for old-style ASCII-only files.

julia> using Mmap

julia> f = open("/tmp/example.txt")
IOStream(<file /tmp/example.txt>)

julia> A = mmap(f,Matrix{UInt8},(11,10));

julia> M = @view A[1:10,:]
10×10 view(::Matrix{UInt8}, 1:10, :) with eltype UInt8:
 0x4d  0x4d  0x41  0x4d  0x58  0x58  0x53  0x53  0x4d  0x4d
 0x4d  0x53  0x4d  0x53  0x4d  0x58  0x4d  0x41  0x41  0x58
 0x4d  0x41  0x58  0x41  0x41  0x41  0x53  0x58  0x4d  0x4d
 0x53  0x4d  0x53  0x4d  0x53  0x4d  0x4d  0x41  0x4d  0x58
 0x58  0x58  0x58  0x41  0x41  0x4d  0x53  0x4d  0x4d  0x41
 0x58  0x4d  0x4d  0x53  0x4d  0x58  0x41  0x41  0x58  0x58
 0x4d  0x53  0x41  0x4d  0x58  0x58  0x53  0x53  0x4d  0x4d
 0x41  0x4d  0x41  0x53  0x41  0x41  0x58  0x41  0x4d  0x41
 0x53  0x53  0x4d  0x4d  0x4d  0x4d  0x53  0x41  0x4d  0x53
 0x4d  0x41  0x4d  0x58  0x4d  0x41  0x53  0x41  0x4d  0x58
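Note that in the output above each column of M is one line of the file (the 11th row of A holds the newlines), so the view is transposed relative to the text. If the original orientation and Char elements are wanted, something along these lines might work. The name load_mmap is hypothetical, and the sketch assumes an ASCII file in which every line, including the last, has exactly ncols characters followed by a single '\n'.

```julia
using Mmap

# Hypothetical helper: nrows lines of ncols ASCII characters, each ending in '\n'.
function load_mmap(path, nrows, ncols)
    open(path) do f
        # Each column of A is one line of the file plus its trailing newline.
        A = Mmap.mmap(f, Matrix{UInt8}, (ncols + 1, nrows))
        # Drop the newline row, convert to Char (this copies), and transpose.
        permutedims(Char.(A[1:ncols, :]))
    end
end
```

This gives up the zero-copy benefit of the view, so it only makes sense when a regular Matrix{Char} is needed.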

Oh cool, it’s good to know how easy it is to mmap files when they’re in clean formats like this.

FWIW this is coming from this year’s Advent of Code challenge, day 4.


Might be of interest:

Here is another reference to the AoC 2024 context/contest.
Refer to the Zulip blog.
I would have liked to see the discussions about the AoC problems here instead.
I tried my hand at the day 4 problem, and I had a lot of difficulty even just activating the spoiler for my proposed solutions.