What file format to use to load data for high performance computing

This line is essential, thanks.

However, it seems that Kudu isn’t just a file format but rather a distributed storage system: all the examples I’ve seen require specifying the “kudu.master” and “kudu.table” options instead of a file path. If this is correct, Kudu looks out of scope for this discussion (although it may be in scope for Spark.jl). Has anybody used Kudu in practice who can confirm or deny my assumption?
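To make the contrast concrete, here is a rough PySpark sketch (Python rather than Julia, since this is not Spark.jl's API). It assumes an existing `SparkSession` is passed in as `spark` and that the kudu-spark connector is on the classpath. The helper names `read_parquet` and `read_kudu` are made up for illustration, and the data source name `org.apache.kudu.spark.kudu` is the one used in the Kudu examples I know of, so it may differ between connector versions.

```python
# Sketch only: `spark` is assumed to be an existing pyspark.sql.SparkSession,
# and the kudu-spark connector jar is assumed to be available to Spark.
# Both helper names are hypothetical.

def read_parquet(spark, path):
    # File-based format: the data source is identified by a path.
    return spark.read.format("parquet").load(path)


def read_kudu(spark, master, table):
    # Kudu: no path at all. The source is identified by the address of the
    # Kudu master and the name of a table managed by the Kudu cluster.
    return (
        spark.read.format("org.apache.kudu.spark.kudu")
        .option("kudu.master", master)
        .option("kudu.table", table)
        .load()
    )
```

A call would then look like `read_kudu(spark, "kudu-master.example.com:7051", "my_table")`, where the host and table name are placeholders. The point is only that the second reader talks to a running service, while the first just opens files.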

As for CarbonData, its integration with Spark breaks JSON support. I think I will wait until it becomes more stable before including it in Spark.jl.