The following program works amazingly fast for my data (0.3 seconds), but there are some incorrect rows in the output, even though it works fine for the MWE. Any idea?
Nice package and benchmarks. One thing those benchmarks don't show is how memory-hungry Polars is. I recently benchmarked the 1e8 case on my 16 GB Mac, and DataFrames was about twice as fast as Polars! How? Simply because macOS allows the hard disk to be used as memory, and because Polars was very hungry for memory, it had to write to and read from the disk frequently.
Good job, keep at it.
Congratulations on your new package. I have been monitoring Julia for a while, and compared to R's dplyr, its packages were suffering from a lack of features. I am glad to see that this is changing. Good luck.
One of the key benefits of having multiple packages in one area is competition. This is great for the Julia ecosystem. Let's face it: DataFrames.jl has been around for many years, but it still lacks lots of features (compared to all its competitors, including IMD). IMD may wake up the DataFrames developers and push them to shake things up. The point is that, in the end, the whole Julia ecosystem will enjoy the benefits.
IMO, instead of working on a competitor, it would have been more beneficial to just work on the missing features. In a world where developers work for free and aren't massive in numbers, competition isn't necessarily the best strategy for producing the best result for an ecosystem.
(This isn't a criticism of IMD, because the creators decide as they wish and I respect that. No bad thoughts, just the point that competition is not what the Julia ecosystem needs right now.)
Fair competition is good: it brings diversity, diversity always helps, and it should be welcome. Looking at the surface of the new package, I see lots of cool features, many of which seem unique to it, and they can attract new users to start learning Julia. Besides, I think a fresh rewrite of an old package sometimes works better than patching it with new features. Old packages were usually written when Julia wasn't mature, to fill the needs of the time; now that the language is mature, writing from scratch may work better.
The byrow function is a stand-alone function whose general syntax is byrow(ds, fun, cols). You can use ?byrow to see its general documentation, and ?byrow(fun) for documentation specific to byrow(ds, fun, cols), e.g. ?byrow(sum). In your code
you modify ds with modify! and pass the result as the first argument of byrow; however, your call is missing the second argument of byrow, so that call is not valid. I recommend using the Chain package to give a clearer structure to the sequence of operations you apply to a data set.
The only time you can use byrow without the ds and cols arguments is inside modify/modify! or combine, since there those arguments are derived from the arguments of modify/modify!/combine.
```julia
using InMemoryDatasets
g1 = repeat(1:6, inner = 4)
g2 = repeat(1:4, 6)
ds = Dataset(g1 = g1, g2 = g2)
modify!(ds, :g1 => byrow(x -> x == 4 ? -4 : x)) # ds is modified in place; to call byrow on it afterwards, use the byrow(ds, fun, cols) syntax
```
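A slightly fuller sketch of the two forms, assuming InMemoryDatasets and Chain.jl are both installed, and assuming that modify! returns the modified data set (that is what lets @chain hand it to the next step). It reuses the g1/g2 columns from the example above; the names totals and rowsums are only illustrative. The standalone call spells out all three arguments, while inside the @chain block the piped data set silently fills the first argument of each step, which is why byrow only needs fun and cols there.

```julia
using InMemoryDatasets
using Chain

ds = Dataset(g1 = repeat(1:6, inner = 4), g2 = repeat(1:4, 6))

# Standalone form: byrow(ds, fun, cols) takes all three arguments
# and returns one value per row (here the row sum of g1 and g2).
totals = byrow(ds, sum, [:g1, :g2])

# With Chain.jl, the result of each step is inserted as the first
# argument of the next call, so byrow receives the modified ds.
# Note that modify! still changes ds in place.
rowsums = @chain ds begin
    modify!(:g1 => byrow(x -> x == 4 ? -4 : x))
    byrow(sum, [:g1, :g2])
end
```

Because totals is computed before the @chain block runs, it reflects the original g1 values, while rowsums reflects the modified ones.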
The documentation of the groupby function says that you can use all the kwargs of the sort function, so I thought I could also use the by kwarg of Base's sort function.
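For reference, this is what the by keyword does in Base's sort and sortperm: it maps each element to a key and sorts on that key. The snippet only illustrates Base behavior; it does not confirm whether IMD's groupby forwards this particular keyword, and the variable names are illustrative.

```julia
v = [-3, 1, -2]

# by maps each element to a sort key; here the key is the absolute value.
sorted_v = sort(v; by = abs)

# sortperm accepts the same keyword and returns the ordering permutation instead.
perm = sortperm(v; by = abs)
```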
Don't you think that, with your attitude, Julia itself would never have been created in the first place?
BTW, AFAIK the Julia ecosystem is full of competition: look at plotting, for example.
It's strange that Polars uses so much memory.
Isn't Polars internally a Rust library, and isn't Rust supposed to use memory more efficiently than Julia?