The problem here is that the iterator returned by CSV.File is immutable. So each row object that you are iterating over in your foreach loop can’t be altered. In particular, you can’t set the value Splits equal to something.
Could you give more information on what your goals are? Do you want to put the data in memory and work with it? Or do you want to just alter the csv file and save that?
PS: here is an MWE for you to use. It shows how you can pass a string to CSV.File.
using CSV

# CSV data held in a string instead of a file on disk
file = """
"X1","X2","X3","x4","Splits"
"5674012","530489692","batch_145322","10/31/2019 15:00:13",
"5674012","530489702","batch_145323","10/31/2019 15:00:32","9b4e08e5"
"5674012","530489728","batch_145327","10/31/2019 15:01:56","b036aa66,b036aa67,b036aa68"
"""

# Wrap the string in an IOBuffer so CSV.File can read it like a file
io = IOBuffer(file)
batches = CSV.File(
    io;
    header = true,
    delim = ',')

# Rows are read-only: b.Splits can be read here but not assigned to
foreach(batches) do b
    if !ismissing(b.Splits)
        split(b.Splits, ',')
    end
end
@pdeffebach, thanks for the sample showing how to read a CSV from a string. Pretty useful.
What I’d like to do is read the CSV into memory and process it later (merging with other CSVs, filtering rows based on column values, etc.), and then write the result to disk as a CSV.
You’re looking for the copycols kwarg to CSV.File, although I agree that in general you’ll have an easier time doing manipulation on a DataFrame. But even if you create a DataFrame using e.g. CSV.read, you still need the copycols kwarg if you want to mutate the values later.
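A minimal sketch of that workflow, assuming both CSV and DataFrames are installed. The trimmed-down data, the NSplits column, and the output path are made up for illustration, and passing copycols to the DataFrame constructor is one way (not necessarily the only one) to end up with columns you can mutate:

```julia
using CSV, DataFrames

# Illustrative data: a shortened version of the MWE above
data = """
"X1","X2","Splits"
"5674012","530489692",
"5674012","530489702","9b4e08e5"
"5674012","530489728","b036aa66,b036aa67,b036aa68"
"""

# copycols = true asks the DataFrame to hold its own copies of the columns
df = DataFrame(CSV.File(IOBuffer(data)); copycols = true)

# Add a column (hypothetical name): number of comma-separated entries, 0 when missing
df.NSplits = [ismissing(s) ? 0 : length(split(s, ',')) for s in df.Splits]

# Filter rows on a column value, then write the result to disk
kept = df[df.NSplits .> 0, :]
out = joinpath(mktempdir(), "batches_out.csv")  # hypothetical output path
CSV.write(out, kept)
```

Once the data is in a DataFrame, merging with other tables and further filtering can be done with the usual DataFrames functions before the final CSV.write.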
I’m just going through the getting started guide for DataFrames and I’m a little bit confused about the versions and that document.
I installed Julia 1.4.1 (from Download Julia).
The version of DataFrames is 0.20.2:
(@v1.4) pkg> status
Status `C:\Users\u\.julia\environments\v1.4\Project.toml`
[a93c6f00] DataFrames v0.20.2
...
But the getting started guide is for version 0.21.0. So some of its commands (e.g. select(df, :x1 => :a1, :x2 => :a2) # rename columns) throw an exception when I run them.
Trying to install that version throws errors:
Pkg.add(Pkg.PackageSpec(;name="DataFrames", version="0.21"))
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package DataFrames [a93c6f00]:
DataFrames [a93c6f00] log:
├─possible versions are: [0.11.7, 0.12.0, 0.13.0-0.13.1, 0.14.0-0.14.1, 0.15.0-0.15.2, 0.16.0, 0.17.0-0.17.1, 0.18.0-0.18.4, 0.19.0-0.19.4, 0.20.0-0.20.2] or uninstalled
└─restricted to versions 0.21 by an explicit requirement — no versions left
When I go to Introduction · DataFrames.jl, the version changes to 0.21. So the version obviously exists. How do I install it?
I think you need to use update rather than adding the new version. Can you just try Pkg.update() and see if that bumps you to the new version?
I was able to reproduce your environment on macOS and update worked for me. I would avoid adding a package using PackageSpec like that and just rely on the resolver to do things.
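For reference, a small sketch of the update route from a script or the Julia REPL. Which DataFrames version you end up with depends on the registry and on the other packages in your environment, so this does not guarantee 0.21:

```julia
using Pkg

Pkg.update()   # refreshes the registry, then upgrades packages where the resolver allows it
Pkg.status()   # lists installed versions so you can check which DataFrames you now have
```

The same two steps are available in Pkg mode (press `]` at the REPL) as `update` and `status`.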
Note that DataFrames development is very fast. The new version was probably released after you installed DataFrames. Updating is very easy to do in Julia. However, note that 0.20.x is robust and perfectly capable of doing data analysis. The documentation (aside from the tutorial) works for it here.
There is an unfortunate number of conflicting tutorials online right now. But that will settle down once DataFrames hits 1.0, which is soon.