Thanks for your post about JuliaDB. I found it really helpful.
First, I am not sure if I am asking too much, but I personally think that, when there is no risk of ambiguity, referring to columns by bare name is easier. For example, `x` reads better than `:x`, `df.x`, or `_.x`. Since all the data manipulation happens within a DataFrame (even merging multiple data frames can be handled this way if we do it in the proper order), the column name alone is enough to show what is going on. When there is ambiguity, we can fall back to `:x`.
Second, I am not a computer-science person; my personal understanding is that we use data frames precisely so we do not have to work with raw arrays and all kinds of loops. My two-cent opinion is that it is easier to avoid `for`, `while`, or `do` when working with data frames.
Third, integrating time-series functions (moving average, moving standard deviation, etc.) with data frames is also important.
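To illustrate what I mean, here is a minimal sketch of a trailing moving average on a DataFrame column; the `moving_average` helper is my own invention for this example, not an existing API:

```julia
using DataFrames, Statistics

# Hypothetical helper: trailing moving average over a fixed window,
# returning `missing` until the window is full.
function moving_average(v, window)
    [i < window ? missing : mean(@view v[i-window+1:i]) for i in eachindex(v)]
end

df = DataFrame(price = [10.0, 11.0, 12.0, 13.0, 14.0])
df.ma3 = moving_average(df.price, 3)
# df.ma3 is [missing, missing, 11.0, 12.0, 13.0]
```

Ideally this kind of rolling computation would feel as natural as any other column operation.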
Fourth, the integration of reading, manipulating, and visualizing data matters. In R, we can do something like this:
```r
df %>% fread() %>%
  filter() %>%
  mutate() %>%
  group_by() %>%
  summarise() %>%
  ggplot()
```
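For comparison, here is a rough sketch of a similar pipeline in Julia using DataFramesMeta's `@linq` macro (syntax as of the DataFramesMeta version current when I wrote this; the exact details may differ, and the plotting step is omitted):

```julia
using DataFrames, DataFramesMeta

df = DataFrame(g = [1, 1, 2], x = [1.0, 2.0, 3.0])

# Filter rows, then aggregate by group, in one pipeline.
result = @linq df |>
    where(:x .> 1.0) |>
    by(:g, total = sum(:x))
# result has columns :g and :total
```

It works, but it does not yet reach all the way from file reading to plotting the way the R chain does.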
Fifth, a small point: I just noticed that `@transform` in DataFramesMeta does not allow a computed column to be used immediately afterwards in the same call, which makes it less useful than `dplyr::mutate`.
```julia
using DataFrames, DataFramesMeta, Lazy

df = DataFrame(A = [1, 2, 3], B = [4, 5, 6])
df = @> begin
    df
    @transform(x = :A + :B, y = :x - 1)
end
# ERROR: KeyError: key :x not found
```
In R, this can be done like:

```r
df %<>% mutate(x = A + B, y = x - 1)
```
This suggests that `@transform` is an equivalent of `dplyr::transform` rather than `dplyr::mutate`. See the comparison table between DataFramesMeta, dplyr, and LINQ at this link:
https://github.com/JuliaStats/DataFramesMeta.jl
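A workaround, for what it is worth, is to split the computation into two `@transform` calls so that `:x` exists as a column before `:y` refers to it. This is only a sketch; whether the broadcasting dot is required depends on the Julia version:

```julia
using DataFrames, DataFramesMeta, Lazy

df = DataFrame(A = [1, 2, 3], B = [4, 5, 6])
df = @> begin
    df
    @transform(x = :A + :B)   # :x is created here
    @transform(y = :x .- 1)   # so it can be referenced here
end
# df.x == [5, 7, 9], df.y == [4, 6, 8]
```

It gets the job done, but the one-call `mutate`-style version would be nicer.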