This morning I did a standard upgrade to Julia 1.6.2 and loaded the basic packages I use for Data Science. After adding everything, I tried to get started, beginning with the simple statement ‘using CSV’ and got a page of (apparently unresolvable) package dependency issues.
Fact:
Julia 1.6.2 is apparently not (reliably) compatible with any version of the CSV package at the moment (a known issue; see the thread on Discourse).
What I want to know is how such basic functionality as CSV reading/writing can become broken in a standard/default installation of Julia at this late stage of its development!!!
I am posting on here because this seems to be more than just a minor software bug, but the kind of systemic failure (if I may say) that threatens the growth and acceptance of Julia itself, and (speaking as an academic) makes it very hard for me to make a case for using Julia in my teaching and research, which (given Julia’s obvious potential) I see as a terrible waste.
Apologies, but I could not let this go without noting
as the thread in question has a solution - don’t do using JuliaFormatter (or Atom for that matter, as it’s been out of support I believe for quite a while) without having added CSV first.
You’ve also posted in that other thread (the one with an extended debugging session) that you’re not interested in “why it is failing”, even though that’s crucial to solving the individual problem you’re having.
–
Let’s back up a bit:
Are you using a custom startup.jl with using JuliaFormatter?
The ‘solution’ there is not a solution because I had exactly the same problem and the proposed solution did not work. Anyway, that is not my point, which is that Julia should be able to reliably read and write data without having to resort to bug-fixes!
I understand your frustration, bugs are very frustrating. Unfortunately they exist. Sometimes they affect only very few people and we happen to be one of the unlucky ones.
If you want to use Julia, and want help, people here are more than willing to help get it working, and maybe fix the bug (which may not be the same as the one in that thread). If you can, just follow what @Sukera is asking, and you may find what is going wrong there.
People seem to know what has gone wrong, but that is not my point here, in a general discussion. My point is that this seems to be quite a widespread thing that started happening a couple of days ago and is already being experienced by a lot of people. I am not talking about an esoteric, advanced feature here, but the ability to read and write data, which should be sacrosanct in any software package like Julia. I am talking from the point of view of someone who regularly has to assess whether to use Julia in classes.
Thanks, I think one can’t use DataFrames without CSV
It would be great if these could be integrated, since they are so central to anything a Data Scientist would do.
Sadly not always. Under unfortunate circumstances (breaking releases of common dependencies with some packages updated and some not) you can run into interference with your startup.jl or packages like IJulia. See e.g. Reverting back to avoid precompile errors? - #21 by GunnarFarneback.
I guess what I wonder is whether it might be formally acknowledged that a large part of Julia’s usage is Data Science, so that certain basic functionality like DataFrames and CSV should be included in Base and thus ‘ring-fenced’.
Out of curiosity, I just added CSV and DataFrames to my Julia 1.6.2 installation, and read in a file, no problems. There may be some order of operations that is not so smooth, but that’s just a bug, and bugs tend to get fixed quickly. Opening an issue for the relevant package is the standard procedure.
(@v1.6) pkg> st
Status `~/.julia/environments/v1.6/Project.toml`
[336ed68f] CSV v0.8.5
[a93c6f00] DataFrames v1.2.2
[5fb14364] OhMyREPL v0.5.10
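For reference, a check like the one described above can be sketched as follows. The sample file and its contents are made up for illustration, and the snippet assumes CSV and DataFrames are already installed in the active environment:

```julia
using CSV, DataFrames

# Hypothetical sample file, written to a temporary directory so the
# sketch does not depend on any existing data.
path = joinpath(mktempdir(), "sample.csv")
write(path, "id,name\n1,alpha\n2,beta\n")

df = CSV.read(path, DataFrame)   # parse the file into a DataFrame
CSV.write(path, df)              # write the DataFrame back out as CSV
```

If `using CSV` fails before any of this runs, the problem is in package loading or precompilation rather than in the reading or writing itself.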
Data science is absolutely a core staple of Julia. Baking it into the Julia Language itself would be terribly stifling to its growth and progress.
What’s happened here is a core dependency of both CSV and some IDEs released a breaking change. If you’re using an older IDE like a Jupyter notebook or Atom, this causes trouble. It’ll be resolved in a week or so as everyone gets on board with the new version. In the meantime, there are ways to ensure both are using the same version.
Lots of effort has gone into making sure VS Code and Pluto both manage their dependencies separately from the code you write. This means the Julia process that manages the IDE itself is independent from the Julia that you run, completely avoiding these issues. It would indeed be great if a similar approach could work for Jupyter notebooks, but I’m not an IJulia dev nor do I know what’s required to do this.
One can see a pattern here. DelimitedFiles is built into Julia (an stdlib) and isn’t progressing at all and can’t be used with DataFrames. CSV has progressed a lot and can be used with DataFrames.
Thank you for mentioning that you observe this in Jupyter, that clears it up!
IJulia itself depends on JSON, which brings in the Parsers problem that’s been observed in the other threads. Since in this case (as @mbauman mentions) the dependencies for the IDE (IJulia/Jupyter) are not separated from the regular code, you get the same kind of error as observable in Atom (which has the exact same problem, as I understand it).
Unfortunately, there’s no one party that could see this coming - if anything, it’s a legacy problem stemming from the approach both Atom and Jupyter take for loading code, not directly a problem of Parsers releasing a new version.
If I’m not mistaken, the solution should be to pin the package (]pin Parsers@v1.1.2, possibly followed by ]up, not 100% sure) in whichever environment you’re starting the IJulia kernel from (before using IJulia or starting the jupyter kernel).
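The same steps can also be written with the Pkg API instead of the Pkg REPL mode. This is only a sketch of the suggestion above, not a confirmed fix: it assumes the default environment is the one the IJulia kernel starts from, and that Parsers is already present in that environment's manifest.

```julia
using Pkg

# Assumption: the active environment is the one the IJulia kernel uses
# (for example the default @v1.6 environment). Run this before `using IJulia`.
Pkg.pin(name="Parsers", version="1.1.2")  # version number taken from the post above
Pkg.update()                               # let the resolver adjust the remaining packages
Pkg.status()                               # a pinned package is marked in the status listing

# To undo the pin once the ecosystem has caught up:
# Pkg.free("Parsers")
```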
I just wanted to add that Julia is not a software package. It’s a language with standard libraries, which one could count as a software package, yes, but most functionality comes in third party packages. There is just not really a “governing body” that might have messed up here or whose priorities need to be straightened out because they don’t take csv import seriously.
It’s simply collateral damage of the normal process of updating and versioning disparate pieces of software. Note that if you’re at the forefront of versions, you’re bound to have issues like this once in a while. That’s just due to the complexity of the ecosystem, especially because the Julia ecosystem has so much code reuse and composition, where one tiny issue in a dependency can have ripple effects.
If you use Julia for courses, you can’t go wrong in setting up environments with known good versions, then distributing them to all your students. There might come a time when no more substantial improvements can be made to CSV and DataFrames, so people won’t have to care anymore whether they’re at the bleeding edge or not, but that time is not here yet.
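A minimal sketch of that workflow follows. The folder name `course-env` is hypothetical, and the versions are simply the ones from the status output earlier in the thread, used here as a stand-in for whatever set you have actually tested:

```julia
using Pkg

# Instructor side: build the environment once.
# "course-env" is a hypothetical directory name.
Pkg.activate("course-env")
Pkg.add([
    Pkg.PackageSpec(name="CSV", version="0.8.5"),         # versions from the status output above
    Pkg.PackageSpec(name="DataFrames", version="1.2.2"),
])
# This creates course-env/Project.toml and course-env/Manifest.toml.
# The Manifest records the exact version of every dependency, direct or indirect.

# Student side: after receiving the folder with both files,
# install exactly the versions recorded in the Manifest.
Pkg.activate("course-env")
Pkg.instantiate()
```

Because the Manifest fixes indirect dependencies such as Parsers as well, a new breaking release upstream does not reach students until the instructor deliberately updates and redistributes the environment.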
OK, thanks for your thoughts. I guess I am assuming that there is a community that would like to see Julia prosper and grow, giving my thoughts from that point of view.
[BTW, I did manage to fix things with CSV by fiddling with some of the comments above.]
Please don’t be overly incendiary. This is a very engaged community that wants to see Julia prosper and grow, and you know this. You’ve previously asked how to gear your courses for success and gotten lots of engagement, and lots of suggestions on how to avoid surprises by distributing a stable and known-good environment of specific package versions to your students. I know we can continue to do better, but that’s a really good way of making sure things like this don’t happen to you or your students in the first place.