Lazy readdir in Julia with libuv

readdir relies on libuv’s scandir function (uv_fs_scandir), which means the entire contents of a directory must be read into memory before we can do anything with them. Sometimes it is useful to iterate over the contents in a streaming fashion instead. libuv allows this through the uv_fs_readdir function. I’m trying to implement a lazy readdir on top of that, using ccall.

The following C code seems to do the job:

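#include <stdio.h>
#include <uv.h>

const char* pth = "/home/user/Downloads";

int main(void) {
  uv_fs_t req;

  /* A NULL callback makes each uv_fs_* call below run synchronously. */
  if (uv_fs_opendir(NULL, &req, pth, NULL) < 0)
    return 1;

  /* Keep the directory handle: uv_fs_req_cleanup() resets req.ptr. */
  uv_dir_t* rdir = req.ptr;
  uv_fs_req_cleanup(&req);

  /* Caller-provided buffer: one entry per uv_fs_readdir() call. */
  uv_dirent_t dirent;
  rdir->dirents = &dirent;
  rdir->nentries = 1;

  /* Returns the number of entries read: 0 at the end, < 0 on error. */
  while (uv_fs_readdir(NULL, &req, rdir, NULL) > 0) {
    printf("%s\n", dirent.name);
    uv_fs_req_cleanup(&req);  /* frees the name string of this entry */
  }
  uv_fs_req_cleanup(&req);

  uv_fs_closedir(NULL, &req, rdir, NULL);
  uv_fs_req_cleanup(&req);

  return 0;
}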

I basically need to find out how to reproduce this in Julia, if possible. My problem right now seems to be the assignment of the dirents pointer, for which we need to allocate an array of uv_dirent_t.

The following Julia code is a draft, and doesn’t work yet. Where is the glaring mistake?

mutable struct uv_dirent_t
    name::Ptr{UInt8}
    typ::Cint
end

mutable struct uv_dir_t
    dirents::Ptr{uv_dirent_t}
    nentries::Cint
end

function lazyreaddir(dir::AbstractString)
    # Allocate space for uv_fs_t struct
    uv_readdir_req = zeros(UInt8, ccall(:jl_sizeof_uv_fs_t, Int32, ()))
    err = ccall(:uv_fs_opendir, Int32, (Ptr{Cvoid}, Ptr{UInt8}, Cstring, Ptr{Cvoid}),
                C_NULL, uv_readdir_req, dir, C_NULL)

    err < 0 && throw(SystemError("unable to read directory $dir", -err))

    thedirptr = ccall(:uv_fs_get_ptr, Ptr{uv_dir_t}, (Ptr{Cvoid},), uv_readdir_req)

    println("thedirptr ", thedirptr)

    thedir = unsafe_load(thedirptr)
    aa = Vector{uv_dirent_t}(undef, 10)
    # thedir.dirents = Libc.malloc(10240)
    thedir.dirents = pointer(aa)
    thedir.nentries = 1

    for a in 1:1
        read = ccall(:uv_fs_readdir, Cint,
                     (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{uv_dir_t}, Ptr{Cvoid}),
                     C_NULL, uv_readdir_req, thedirptr, C_NULL)

        println("read $read")
        println(aa[1].name)
    end

    println("---over")
    ccall(:uv_fs_closedir, Int32, (Ptr{Cvoid}, Ptr{UInt8}, Cstring, Ptr{Cvoid}),
                C_NULL, uv_readdir_req, thedirptr, C_NULL)
end

lazyreaddir("/home/user/Downloads")

Related: https://github.com/JuliaLang/julia/pull/27450. The decision there basically came down to this: when are you reading a directory with so many items that doing this lazily matters? Do you have such a situation?


Thanks for pointing that out! I ended up making a PR before I saw this: https://github.com/JuliaLang/julia/pull/33478

Of course, having 10 million files in a directory is probably not a good thing. It can happen by accident, though, and it has happened to me more than once. When it does, things get weird: plain ls stops being usable because it wants to sort everything, so you need ls -U, and so on. I think offering this makes Julia a more useful and trustworthy tool for dealing with the file system. A better justification is simply that the standard readdir implies a memory cost and a sorting overhead that the lazy approach avoids, so it should make operations such as rm -r generally leaner. I don’t have any numbers to show how much of a difference it makes, though, and I suspect a good benchmark will always involve unreasonable things such as directories with many millions of files.
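
For anyone landing here with the same question, below is a minimal sketch of how the streaming loop could look in Julia. It is not the code from the PR, just an illustration under a few assumptions. As far as I can tell, the draft in the first post has two problems: unsafe_load returns a copy of the uv_dir_t, so setting fields on that copy never reaches the struct libuv reads, and a Vector of a mutable struct holds references rather than inline values, so pointer(aa) does not point at uv_dirent_t storage. The sketch uses an immutable struct and writes the two fields with unsafe_store!. UvDirent and foreach_entry are hypothetical names, and the code assumes that uv_dir_t begins with the fields dirents (a pointer) and nentries (a size_t), and that the libuv bundled with Julia exposes uv_fs_opendir, uv_fs_readdir, uv_fs_closedir and uv_fs_get_ptr to ccall.

```julia
# Mirrors libuv's uv_dirent_t. Immutable, so a Vector stores the values inline.
struct UvDirent
    name::Ptr{UInt8}
    typ::Cint
end

# Hypothetical helper (not a Base function): calls f(name) for each entry of
# `dir`, fetching one entry per uv_fs_readdir call.
function foreach_entry(f, dir::AbstractString)
    req = Libc.malloc(ccall(:jl_sizeof_uv_fs_t, Int32, ()))
    cleanup() = ccall(:uv_fs_req_cleanup, Cvoid, (Ptr{Cvoid},), req)
    buf = Vector{UvDirent}(undef, 1)  # Julia-owned storage for a single entry
    try
        err = ccall(:uv_fs_opendir, Cint,
                    (Ptr{Cvoid}, Ptr{Cvoid}, Cstring, Ptr{Cvoid}),
                    C_NULL, req, dir, C_NULL)
        err < 0 && error("unable to open directory $dir (libuv error $err)")
        # Save the uv_dir_t pointer before cleanup resets the request's ptr field.
        dirptr = ccall(:uv_fs_get_ptr, Ptr{Cvoid}, (Ptr{Cvoid},), req)
        cleanup()
        try
            GC.@preserve buf begin
                # Assumed layout: uv_dir_t starts with
                # `uv_dirent_t* dirents; size_t nentries;`.
                unsafe_store!(convert(Ptr{Ptr{UvDirent}}, dirptr), pointer(buf))
                unsafe_store!(convert(Ptr{Csize_t}, dirptr + sizeof(Ptr{Cvoid})), 1)
                while true
                    n = ccall(:uv_fs_readdir, Cint,
                              (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
                              C_NULL, req, dirptr, C_NULL)
                    # Copy the name first: cleanup frees the C string.
                    name = n > 0 ? unsafe_string(buf[1].name) : ""
                    cleanup()
                    n < 0 && error("uv_fs_readdir failed (libuv error $n)")
                    n == 0 && break
                    f(name)
                end
            end
        finally
            # Always release the directory handle, even if f throws.
            ccall(:uv_fs_closedir, Cint,
                  (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
                  C_NULL, req, dirptr, C_NULL)
            cleanup()
        end
    finally
        Libc.free(req)
    end
    return nothing
end
```

With this, foreach_entry(println, "/home/user/Downloads") should print the entries in whatever order the file system returns them, without building or sorting a full list first. A larger nentries would cut down the number of calls into libuv at the cost of a bigger buffer; I kept it at 1 to stay close to the C example.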