I very often make type mistakes when using `==`. I wish there were a safer equality operator that would throw an error unless the types of the operands were compatible.
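Julia Base has no such operator, but a minimal sketch is easy to write. The `strict_eq` function below is hypothetical (it is not part of Base or DataFrames.jl), and it assumes that "compatible" means `promote_type` of the two operand types is something more specific than `Any`:

```julia
# Hypothetical helper: `strict_eq` is not part of Julia Base or DataFrames.jl.
# "Compatible" is assumed to mean that Julia can promote both operand types
# to something more specific than `Any` (e.g. Int and Float64 -> Float64).
function strict_eq(a, b)
    T = promote_type(typeof(a), typeof(b))
    if T === Any
        throw(ArgumentError("cannot compare $(typeof(a)) with $(typeof(b))"))
    end
    return a == b
end

strict_eq(1, 1.0)      # true
strict_eq("a", "b")    # false
# strict_eq(1, "1")    # throws ArgumentError instead of silently returning false
```

Comparing a whole vector with a scalar (for example `strict_eq([1, 0], 0)`) also throws under this rule, since `promote_type(Vector{Int}, Int)` is `Any`.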
So with version 1.1.0, is it now also possible to use
subset(df, :x => ==(0))
or only
subset(df, :x => ByRow(==(0)))?
Only
subset(df, :x => ByRow(==(0)))
The following
subset(df, :x => ==(0))
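compares the whole column vector with 0, producing a single Bool instead of one value per row. That is a somewhat meaningless comparison and can lead to hard-to-catch bugs, which is why it's disallowed.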
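To see why, compare what the two forms compute on a plain vector. This sketch uses only Julia Base, with `map` standing in for the per-row application that `ByRow` performs:

```julia
x = [1, 0, 2, 0]

# What `:x => ==(0)` asks for: the whole vector compared with 0.
# `==(0)` is a curried comparison, so this is the same as `x == 0`.
whole = ==(0)(x)            # false: a single Bool, not one per row

# What `:x => ByRow(==(0))` asks for: the comparison applied to each element.
rowwise = map(==(0), x)     # [false, true, false, true]
```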
Or you can write `filter(:x => ==(0), df)`. This is the crucial difference between `filter` (which works on a single element) and `subset` (which takes a whole vector).
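A small sketch of the difference, assuming DataFrames.jl 1.x; the data frame and its columns are made up for illustration:

```julia
using DataFrames

df = DataFrame(x = [1, 0, 2, 0], y = ["a", "b", "c", "d"])

# filter: the predicate is called on one element of :x at a time.
f = filter(:x => ==(0), df)

# subset: the function receives the whole :x column and must return one Bool
# per row, hence ByRow (or an explicitly broadcast comparison).
s1 = subset(df, :x => ByRow(==(0)))
s2 = subset(df, :x => v -> v .== 0)

# All three keep rows 2 and 4 (y == ["b", "d"]).
```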
We have discussed removing `filter` support, as its syntax is inconsistent with the rest of the DataFrames.jl minilanguage (where normally `:x => fun` means passing a whole vector to `fun`), but the use case we discuss here is frequent enough that we decided to keep the inconsistency.
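For comparison, a sketch of the usual minilanguage convention, assuming DataFrames.jl 1.x and a made-up data frame, where `:x => fun` hands the whole column to `fun`:

```julia
using DataFrames

df = DataFrame(x = [1, 0, 2, 0])

# In combine/select/transform, `:x => fun` passes the whole column to `fun`.
total = combine(df, :x => sum => :total)                # one row: total == 3
flags = transform(df, :x => ByRow(iszero) => :is_zero)  # ByRow needed for per-row results

# filter is the exception: `fun` is called on each element of :x.
zeros_only = filter(:x => iszero, df)                   # 2 rows
```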
Agreed. I started adding `typeassert` to be sure:
typeassert(id, eltype(df.id))
only_id = filter(:id => ==(id), df)
but I think this isn't a DataFrames problem; the equality operator comes from Julia Base.
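The same guard can be wrapped in a small helper. `filter_eq` below is hypothetical, not a DataFrames.jl function, and it assumes DataFrames.jl 1.x. Note that `typeassert` checks for a subtype, which is stricter than "compatible": `typeassert(2.0, Int)` throws even though `2.0 == 2`.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): keep rows where `col == value`,
# after checking that `value` is an instance of the column's element type.
function filter_eq(df::AbstractDataFrame, col::Symbol, value)
    typeassert(value, eltype(df[!, col]))   # throws TypeError on a mismatch
    return filter(col => ==(value), df)
end

df = DataFrame(id = [1, 2, 3], v = ["a", "b", "c"])
filter_eq(df, :id, 2)        # one row, with v == "b"
# filter_eq(df, :id, "2")    # throws TypeError instead of silently returning 0 rows
```

A column with missing values has an element type such as `Union{Missing, Int}`, so plain `Int` values still pass the check.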
Is there any “cheat sheet” newer or more complete than this one?
https://www.ahsmart.com/pub/data-wrangling-with-data-frames-jl-cheat-sheet/
It uses DataFrames.jl v0.22.
@Juan - it should be mostly OK. You could open an issue on https://github.com/tk3369/www.ahsmart.com to ask for an update. Thank you!
I am trying to update to DataFrames 1.1.0 using
Pkg.update("DataFrames")
and checking the package with
Pkg.status("DataFrames")
but it is not getting updated: I am still seeing v0.22.7.
Can you please help?
Do ] add DataFrames@1.1.0
and read the error message closely. It will tell you which package is holding back compatibility in your environment.
Updating registry at C:\Users\harne\.julia\registries\General
Updating registry at C:\Users\harne\.julia\registries\JuliaComputingRegistry
Resolving package versions…
ERROR: Unsatisfiable requirements detected for package ScikitLearn [3646fa90]:
ScikitLearn [3646fa90] log:
├─possible versions are: 0.5.0-0.6.3 or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions 0.5.0-0.6.3
└─restricted by compatibility requirements with DataFrames [a93c6f00] to versions: uninstalled — no versions left
└─DataFrames [a93c6f00] log:
├─possible versions are: 0.11.7-1.1.0 or uninstalled
└─restricted to versions 1.1.0 by an explicit requirement, leaving only versions 1.1.0
Still,
Pkg.status("DataFrames")
shows:
Status C:\Users\harne\.julia\environments\v1.6\Project.toml
[a93c6f00] DataFrames v0.22.7
After removing the ScikitLearn package, DataFrames was updated using your command. Thanks a lot for the help!
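For scripts, roughly the same steps can be written with the `Pkg` API instead of the Pkg REPL mode. This sketch assumes, as in this thread, that ScikitLearn is the only package blocking the upgrade; removing it means that package is no longer available in the environment:

```julia
using Pkg

# Script equivalent of `] rm ScikitLearn` followed by `] add DataFrames@1.1.0`.
# Assumes ScikitLearn is the only package whose compat bounds block the upgrade.
Pkg.rm("ScikitLearn")
Pkg.add(name = "DataFrames", version = "1.1.0")

# Should now report DataFrames v1.1.0 for the active environment.
Pkg.status("DataFrames")
```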
Congrats on reaching version 1.0!!!
It's a major contribution and an important step toward settling the “Is Julia production-ready?” dilemma, at least as far as ‘general’ data science use is concerned!
Thanks!!!
I am upgrading to DataFrames 1.0 this weekend.
Previously, `leftjoin(df1, df2, on=:key)` resulted in a DataFrame with rows ordered the same as `df1`. I know it was documented that this could change, but I bet I wasn't the only one who had code relying on it.
To the others who were relying on it, what do you do now? Make an index column and `sort!`?
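A minimal sketch of the index-column workaround, assuming DataFrames.jl 1.x; the `row` helper column name is hypothetical and must not clash with a column in either table:

```julia
using DataFrames

df1 = DataFrame(key = [3, 1, 2], a = ["c", "a", "b"])
df2 = DataFrame(key = [1, 2, 3], b = [10, 20, 30])

# Work on a copy so the caller's df1 is not modified.
tmp = copy(df1)
tmp.row = 1:nrow(tmp)                 # remember df1's original row order

res = leftjoin(tmp, df2, on = :key)   # row order of the result is not guaranteed
sort!(res, :row)                      # restore df1's order
select!(res, Not(:row))               # drop the helper column
```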
Could you please elaborate on this in https://github.com/JuliaData/DataFrames.jl/issues/2753? I will add the keyword arguments I discuss there relatively quickly once we reach a consensus on which options we want for which joins.
For those interested (many people ask about it), I have summarized the conclusions from the latest H2O benchmark in The state of DataFrames.jl H2O benchmark - #14 by bkamins.
DataFrames.jl 1.2.0 is out. Here you can find the release notes. I have also written a blog post explaining the key user-visible changes it introduces.
DataFrames.jl 1.3.0 is out.
It is a major release, much bigger than recent ones. We hope that with it we have filled in all the key missing parts of the package, making it feature complete.
Development towards 1.4.0 will continue with additional features requested by users. I expect to have this release around JuliaCon 2022 (unless something unexpected happens).
Here you can find the detailed release notes. See also NEWS.md for a list of relevant changes in the package.
Let me briefly summarize the most important changes and additions (in total, 125 PRs were merged since the 1.2.2 release, which is a lot). The summary is brief and assumes you know the package's functionality; I will soon write a blog post explaining these changes for newcomers:
- in `groupby` users now have more control over the resulting group order. Previously `groupby` by default produced the group ordering that was fastest to create, which is unintuitive in certain use cases; now the `sort` keyword argument is improved and gives users more control when they want it;
- if a `SubDataFrame` was created with the `:` column selector (i.e. it contains all columns of its parent), you can now add new columns to it in all functions (the filtered-out rows get filled with `missing`);
- `delete!` is deprecated in favor of `deleteat!`, fixing the inconsistency with how these functions are used in Julia Base;
- `leftjoin!` is added, allowing in-place joining of data frames (and it is fast);
- in the `source .=> transformation .=> destination` form of the transformation minilanguage, the `Cols`, `Between`, `All`, and `Not` selectors support broadcasting;
- fixed a bug in handling of keyword arguments in sorting-related functions that in some cases allowed passing tuples (support for which was removed in the 1.0 release) and in other cases led to a stack overflow;
- transformations of the form `AsTable(...) => ByRow(sum)` (and other standard reduction functions) are now fast even when many columns are selected (solving a long-standing performance bottleneck);
- in the DataFrames.jl 1.4 release, on Julia 1.7 or newer, broadcasting assignment into an existing column of a data frame will replace it, while under Julia 1.6 or older it will be an in-place operation. This is an unfortunate difference in behavior between Julia versions, and it cannot be implemented differently due to limitations of Julia Base; that is why this discrepancy is clearly announced now, and the change will take effect in DataFrames.jl 1.4.
Before I wrap up let me thank everyone who contributed towards this release!
Hi, I didn't notice the announcement of the underlying change in Julia 1.7. Can you give me a pointer?