I am using Julia 0.6.2 and have written the following function. (Data generation added on request.)
# Generate test data
function fillpath(path, nbfiles)
    cd(path)
    mkdir("juliatest")
    cd("juliatest")
    for i in ["0", "1"]
        mkdir(i)
        cd(i)
        for j in 1:nbfiles
            touch("$j.dat")
        end
        cd("..")
    end
end
fillpath("/tmp", 10000)
"""
    files, subdirs = subdirlabeledfiles(path)

Return a Vector files containing the filenames in all subdirectories and a Vector
subdirs containing the name of the subdirectory for each file, which can be used as a label.
"""
function subdirlabeledfiles(path)
    # Get all subdirs of path
    subdirs = filter(x -> isdir(joinpath(path, x)), readdir(path))
    # Get files in all subdirs
    subdirspaths = joinpath.(path, subdirs) # Get full paths of subdirs
    files = [filter(isfile, joinpath.(subdir, readdir(subdir))) for subdir in subdirspaths]
    # Get a list naming the subdir for each file
    subdirs = (fill(subdir, length(files[i])) for (i, subdir) in enumerate(subdirs)) # does not work
    # subdirs = [fill(subdir, length(files[i])) for (i, subdir) in enumerate(subdirs)] # does work
    # Flatten the results
    files = vcat(files...)
    subdirs = vcat(subdirs...)
    return files, subdirs
end
X, Y = subdirlabeledfiles("/tmp/juliatest")
length(Y)
The idea: I give a path, and the function returns a vector containing all files within all subdirectories plus a second vector indicating which subdirectory each file is in. It is intended for loading datasets where, e.g., images for different labels are stored in different subdirectories. Probably not the most elegant way of doing this, but anyway.
Now to the problem: when I use a generator expression in the line marked with # does not work, the vector I get for subdirs is much too short. Instead of 20000 entries I get 51. When I change that line to build an Array with [ ] (the commented-out line marked # does work), it works correctly.
Executing the working code with [ ] gives a length of 20000, which is expected. Running it with ( ) gives a length of 51. The vector contains the correct entries (first "0", then "1"); there are just not enough of them.
Is there something I am missing? Is this intended behaviour? Is this a bug?
Can you try to simplify it even more, by removing all filesystem operations? Finding the minimal code which reproduces the problem is always useful.
I wonder whether the fact that you reassign new values to variables which are used by generators could trigger the behavior you observe (either because that’s documented, or because of a bug).
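One way to probe this hypothesis without the filesystem is a sketch like the one below. The function names `labels_lazy` and `labels_eager` and the sample strings are made up for illustration and are not from the original post. The sketch assumes that a generator's body runs only when the generator is iterated, so it reads whatever `files` is bound to at that moment.

```julia
# Hypothetical, filesystem-free reproduction; names and data are illustrative.
function labels_lazy()
    files = [["aaaa", "bbbb", "cccc"], ["dddd", "eeee"]]
    # Generator: the body `length(files[i])` runs only when iterated.
    labels = (fill(name, length(files[i])) for (i, name) in enumerate(["0", "1"]))
    files = vcat(files...)   # rebinds `files` to a flat vector of strings
    # Now files[1] == "aaaa" and files[2] == "bbbb", so length(files[i]) is a
    # string length (4) instead of a per-directory file count (3 and 2).
    return vcat(labels...)
end

function labels_eager()
    files = [["aaaa", "bbbb", "cccc"], ["dddd", "eeee"]]
    # Comprehension: evaluated immediately, before `files` is rebound.
    labels = [fill(name, length(files[i])) for (i, name) in enumerate(["0", "1"])]
    files = vcat(files...)
    return vcat(labels...)
end

println(length(labels_lazy()))   # 4 + 4 == 8 entries
println(length(labels_eager()))  # 3 + 2 == 5 entries
```

If the same mechanism is at work in `subdirlabeledfiles`, the short result would be built from the string lengths of the first two file paths and not from the number of files in each subdirectory.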
Yes, this looks suspiciously like a bug. If I comment out the line
    files = vcat(files...)
it works. It looks like collecting the generator does something to its value.
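In line with that observation, a variant that never rebinds a name read by the generator body can keep the generator as written. This is only a sketch: `subdirlabeledfiles_norebind`, `files_per_dir`, and `labels_per_dir` are hypothetical names, and the temporary directory below is a small stand-in for the test data.

```julia
# Sketch of a variant that avoids reassigning names the generator body reads.
# All new names here are hypothetical, chosen for illustration.
function subdirlabeledfiles_norebind(path)
    subdirs = filter(x -> isdir(joinpath(path, x)), readdir(path))
    subdirpaths = joinpath.(path, subdirs)
    files_per_dir = [filter(isfile, joinpath.(d, readdir(d))) for d in subdirpaths]
    # The generator body reads `files_per_dir`, which is assigned exactly once.
    labels_per_dir = (fill(s, length(files_per_dir[i])) for (i, s) in enumerate(subdirs))
    return vcat(files_per_dir...), vcat(labels_per_dir...)
end

# Small self-check in a temporary directory (3 files per subdirectory).
mktempdir() do root
    for d in ["0", "1"], j in 1:3
        mkpath(joinpath(root, d))
        touch(joinpath(root, d, "$j.dat"))
    end
    files, labels = subdirlabeledfiles_norebind(root)
    println(length(files), " ", length(labels))  # both lengths should be 6
end
```

The design choice is simply to give the nested and the flattened collections different names, so the deferred generator body and the later flattening step do not share a binding.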
Isolating a MWE (without filesystem stuff, which should be irrelevant) would be very useful. I can reproduce the bug on both v0.6.2 and current master v0.7-.
I have no idea. If you can write a very small example to reproduce the problem, it can be worth filing an issue just to make sure that’s expected (unless somebody who knows comments here).
Thanks for the MWE. I have been thinking of generators as “lazy maps”, so I expected semantics like map, somehow capturing the variables. But apparently they don’t.
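To make the contrast with map concrete, here is a small sketch with made-up names (`compare_map_and_generator`, `offset`). It assumes, as the behavior discussed in this thread suggests, that `map` applies its function immediately, while a generator defers evaluation until it is iterated and therefore sees later reassignments of variables used in its body.

```julia
# Illustrative comparison; the function and variable names are hypothetical.
function compare_map_and_generator()
    offset = 1
    mapped = map(x -> x + offset, 1:3)   # evaluated now, with offset == 1
    lazy = (x + offset for x in 1:3)     # nothing is evaluated yet
    offset = 100                         # this rebinding is visible to `lazy`
    return mapped, collect(lazy)
end

mapped, lazy = compare_map_and_generator()
println(mapped == [2, 3, 4])        # map used the old value of offset
println(lazy == [101, 102, 103])    # the generator used the new value
```

So the generator does capture the variable, but it captures the binding and not the value the variable held when the generator was created.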
I think an issue should be opened, to at least clarify this.
Ok, then this is not an MWE for the problem above, but I am still under the impression that something fishy is going on with that. Unfortunately, I have no time to dissect it now.