Could somebody please clarify what “distributed datasets” means in this context? Coming from Hadoop and distributed databases, I understand it as a set of files stored on multiple machines, with the ability to run particular code or a query locally on each machine without copying data over the network. However, in the descriptions of both JuliaDB.jl and Dagger.jl, I can see only examples of loading data from a local disk and maybe copying it to other machines for processing.
In other words, is JuliaDB.jl more similar to DataTables or to Hadoop?