Issues with CSV in Julia 1.6.2

Data science is absolutely a core staple of Julia. Baking it into the Julia Language itself would be terribly stifling to its growth and progress.

What’s happened here is that a core dependency of both CSV and some IDEs released a breaking change. If you’re using an older IDE like a Jupyter notebook or Atom, this causes trouble. It’ll be resolved in a week or so as everyone gets on board with the new version. In the meantime, there are ways to ensure both are using the same version.

Lots of effort has gone into making sure VS Code and Pluto both manage their dependencies separately from the code you write. This means the Julia process that manages the IDE itself is independent from the Julia that you run, completely avoiding these issues. It would indeed be great if a similar approach could work for Jupyter notebooks, but I’m not an IJulia dev nor do I know what’s required to do this.

8 Likes

One can see a pattern here. DelimitedFiles is built into Julia (a stdlib), isn’t progressing at all, and can’t be used with DataFrames. CSV has progressed a lot and can be used with DataFrames.

Thank you for mentioning that you observe this in Jupyter, that clears it up!

IJulia itself depends on JSON, which brings in the Parsers problem that’s been observed in the other threads. Since in this case (as @mbauman mentions) the dependencies for the IDE (IJulia/Jupyter) are not separated from the regular code, you get the same kind of error as is observable in Atom (which has the exact same problem, as I understand it).

Unfortunately, there’s no one party that could see this coming - if anything, it’s a legacy problem stemming from the approach both Atom and Jupyter take for loading code, not directly a problem of Parsers releasing a new version.


If I’m not mistaken, the solution should be to pin the package (]pin Parsers@v1.1.2, possibly followed by ]up, not 100% sure) in whichever environment you’re starting the IJulia kernel from (before using IJulia or starting the jupyter kernel).
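The same pin can also be written with the Pkg API instead of the REPL mode. This is only a sketch: it assumes Parsers is already recorded in the manifest of the active environment, it reuses the version number mentioned above, and `pin_parsers` is a hypothetical helper name.

```julia
using Pkg

# Hypothetical helper: hold Parsers at a fixed version in the active environment.
# Assumes Parsers is already in that environment's manifest.
function pin_parsers(version::String = "1.1.2")
    Pkg.pin(name = "Parsers", version = version)  # API form of `]pin Parsers@v1.1.2`
    Pkg.status("Parsers")                         # print the version now in use
end
```

Call `pin_parsers()` in the environment the IJulia kernel starts from, before `using IJulia`. `Pkg.free("Parsers")` removes the pin again once the ecosystem has caught up.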

4 Likes

I just wanted to add that Julia is not a software package. It’s a language with standard libraries, which one could count as a software package, yes, but most functionality comes in third party packages. There is just not really a “governing body” that might have messed up here or whose priorities need to be straightened out because they don’t take csv import seriously.

It’s simply collateral damage of the normal process of updating and versioning disparate pieces of software. Note that if you’re at the forefront of versions, you’re bound to have issues like this once in a while. That’s just due to the complexity of the ecosystem, especially because the Julia ecosystem has so much code reuse and composition, where one tiny issue in a dependency can have ripple effects.

If you use Julia for courses, you can’t go wrong in setting up environments with known good versions, then distributing them to all your students. There might come a time when no more substantial improvements can be made to CSV and DataFrames, so people won’t have to care anymore whether they’re at the bleeding edge or not, but that time is not here yet.
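On the student side, using such a distributed environment could look roughly like this. It is a sketch that assumes the instructor hands out a folder containing the tested Project.toml and Manifest.toml; `setup_course_env` and `course_dir` are hypothetical names.

```julia
using Pkg

# Hypothetical helper for students: `course_dir` is assumed to contain the
# Project.toml and Manifest.toml that the instructor tested and handed out.
function setup_course_env(course_dir::AbstractString)
    Pkg.activate(course_dir)  # use the course environment, not the global one
    Pkg.instantiate()         # install the versions recorded in Manifest.toml
    Pkg.status()              # list what ended up installed
end
```

Because the Manifest records exact versions, a new release of a dependency does not reach the students until the instructor updates and redistributes the files.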

3 Likes

OK, thanks for your thoughts. I guess I am assuming that there is a community that would like to see Julia prosper and grow, giving my thoughts from that point of view.
[BTW, I did manage to fix things with CSV by fiddling with some of the comments above.]

2 Likes

Please don’t be overly incendiary. This is a very engaged community that wants to see Julia prosper and grow, and you know this. You’ve previously asked how to gear your courses for success and gotten lots of engagement, along with lots of suggestions on how to avoid surprises by distributing a stable and known-good environment of specific package versions to your students. I know we can continue to do better, but that’s a really good way of making sure things like this don’t happen to you or your students in the first place.

Very glad to hear you’ve resolved this!

15 Likes

Strictly speaking that is not what the DataFrames docs say.
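For instance, a matrix read with DelimitedFiles can be handed to the DataFrame constructor. A minimal sketch, assuming this is the point being made and using made-up inline data; the last step needs DataFrames installed, so it is left as a comment and the block runs with the standard library alone.

```julia
using DelimitedFiles

# Made-up inline data standing in for a small CSV file.
raw = IOBuffer("a,b\n1,2\n3,4\n")

# With header=true, readdlm returns the data matrix and a 1×n header row.
data, header = readdlm(raw, ',', header = true)

# With DataFrames installed, the two pieces can be passed to its constructor:
#   using DataFrames
#   df = DataFrame(data, vec(header))
```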

1 Like

I always post with respect for what Julia and the Community are. That said, I do like to convey a sense of urgency when I see it, and I know from experience that in any venture, getting the basics right for the simple things is a “make or break” matter.

I want to see Julia prosper and grow, but I know that will not happen if bugs like this (just simplest reading/writing, standard/common configuration) persist. I have expended quite a bit of energy trying to argue the case for Julia in my institution (with some success), but when students get frustrated by a fundamental unreliability of something very basic like this, it wholly undermines my case for Julia being a sufficiently mature language to adopt, and (in my opinion) at this stage, I find this somewhat alarming, to say the least.

In my opinion there has to be priority placed on the reliability of the basics, no matter how elegant the project wants to ultimately be, or it will have very little chance of succeeding in the long-run.

Another workaround (which I actually like as the main development workflow for Jupyter notebooks) is to use the Jupyter extension inside VS Code. Since it does not rely on IJulia, it is able to downgrade Parsers, and it works. You can open your notebooks there.

6 Likes

That sounds like a neat idea. I wasn’t aware until this issue arose that IJulia could break dependencies. It also hadn’t occurred to me that the VSCode approach to Jupyter didn’t require IJulia.

Thanks.

3 Likes

As many folks mentioned already, we generally don’t have these problems in the Julia extension for VS Code, because we load all the packages that the extension needs to function in a non-standard way, so that user code can then load different versions of the same packages and they can co-exist.

That in principle is also true for the new native Jupyter Notebook feature we have in the extension, BUT I’ll just give fair warning that this notebook feature in VS Code for Julia is still a bit experimental and new. Because we don’t think we have ironed out all the corner cases yet, we do not enable the feature by default. Instead, you need to change a configuration setting to get access to the native Jupyter Notebook feature in the Julia extension.

I had originally hoped that we might be able to move out of the experimental mode very soon, but right now it looks like it might take another cycle of upstream releases by VS Code itself (we are waiting for some APIs from them to finish the implementation).

6 Likes

Oh, and one other point: there is also CSVFiles.jl. We use that extensively in my research group and orbit. The package moves relatively slowly in terms of new features etc. For us that kind of stability is important, so that might be another option to try.

2 Likes

I would like to address the “general” point the OP raised, as someone who has worked in the software industry for over 40 years, using countless commercial and open source products across many areas. Here are some rules of thumb that apply across all software of any type.

  1. Rule number one of using software is don’t ever upgrade your production environment to the latest version. First test out compatibility in a non production environment.
  2. When you run across a bug that interferes with your work, spend a bit of your time narrowing down the problem to a small, well documented and replicable case. Try to understand the source of the problem yourself. For commercially supported software this will save you a ton of your own time when working with the paid support team. For open source software, that’s also true but there is an additional reason. Open source projects are usually free and encompass the hard work of many, often unpaid programmers. You need to be respectful of the time of those developers, a basic requirement of human decency.
  3. Successful open source projects meet 3 criteria: (1) they provide an extremely useful function or set of functionality addressing a problem set faced by a large group of people; (2) they take an innovative approach to solving the core problem set, both technologically and in terms of ease of use; (3) they have a highly engaged community that builds a large ecosystem around the core tools, consistently adds new innovations, and engages with and educates newcomers.

I have only recently started learning Julia, and everything I’ve seen so far indicates Julia gets an 11 out of 10 on all three of these criteria. And on point 3, I would add the Julia community is the least divisive and most supportive of any of the new communities I’ve joined over the past 10 years. From past experience, I have a high level of confidence Julia will continue to grow and prosper.

24 Likes

Hi AronT,
I totally agree, however regarding your first point:
As a new user I find that Julia is pretty happy to update packages and I find it challenging to revert back to the environment I had before (that is, unless I have backed up my Manifest-file, which requires both some proficiency in Julia and some back-up discipline). I am experimenting with using the option preserve=Pkg.PRESERVE_ALL when adding packages, however I must admit that I don’t yet understand the nuances of the preserve-argument.
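For reference, the `preserve` option mentioned here is passed as a keyword to `Pkg.add`. The sketch below is my reading of the Pkg documentation rather than a recommendation: `PRESERVE_ALL` asks the resolver to keep every package already in the manifest at its current version, and the default, `PRESERVE_TIERED`, tries the strictest level first and relaxes it only if resolution fails. `add_without_updates` is a hypothetical helper name.

```julia
using Pkg

# Hypothetical helper: add a package while asking the resolver not to move
# any package that is already recorded in the manifest.
function add_without_updates(pkg::String)
    Pkg.add(pkg; preserve = Pkg.PRESERVE_ALL)
end
```

If the new package cannot be installed without changing an existing version, the call is expected to fail with a resolver error instead of silently updating other packages.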

Perhaps Julia’s default behaviour could be more focused on maintaining a stable environment for new users?

1 Like

I find it challenging to revert back to the environment I had before

Does Pkg.undo() not work for you?

(@v1.7) pkg> ?undo
  undo

  Undoes the latest change to the active project.
4 Likes

Hi @nilshg ,
Thanks for the suggestion, I will try it out! :)

PS. In my first try, just now, I am not sure exactly what it undid. The package I added as my last command was for example still in both the project-file and the manifest file.

If there’s anything that doesn’t look right, it’s always best to come up with a reproducible example. For package operations I find it convenient to do ]activate --temp to create a throwaway environment, and then go step by step through all the package operations that lead to the surprising outcome. There can always be bugs in Pkg or surprising results coming out of the resolver, so if something seems off it’s good to investigate.

This is of course harder if it’s some old Manifest that you have trouble with, in which case I would probably Pkg.instantiate() that Manifest (if you have it backed up!), then up or add or whatever it is you want to undo, and then undo. If that doesn’t work, then it’d be good to share your Manifest when reporting an issue so that others can reproduce it.
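The throwaway-environment workflow described above can be sketched as a small script. The package name `Example` is only a stand-in for whichever operation produced the surprising result, and `reproduce_undo` is a hypothetical helper name.

```julia
using Pkg

# Hypothetical reproduction script; "Example" stands in for whichever
# package operation produced the surprising result.
function reproduce_undo()
    Pkg.activate(temp = true)  # throwaway environment, same as `]activate --temp`
    Pkg.add("Example")         # the operation under investigation
    Pkg.status()               # state after the operation
    Pkg.undo()                 # revert the latest change to the active project
    Pkg.status()               # state after undo; compare with the output above
end
```

Comparing the two status listings shows what `undo` actually reverted, and the script itself can be pasted into a bug report.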

1 Like

@nilshg,
Unfortunately, I cannot seem to reproduce the odd behaviour I saw the first time. I will report it if it happens again! Thanks for your help and great suggestions!

2 Likes

As noted, I’m far from a Julia expert. But besides what other people suggested, I can suggest the best way to back up your code and manifests: git along with GitHub or GitLab, which have free versions.

Using git even if you work alone has many benefits, including the ability to revert to working versions and back up via remote providers. When you have a working version, commit everything to git and push to GitHub/GitLab so you have a backup. If things get messed up, you just use git to revert to the manifest or code file which you know worked. You can easily trace differences in code to find where you went wrong. There are lots of online resources to learn about Git and it’s not hard to learn the basics. I strongly encourage you to try.

8 Likes

3 posts were split to a new topic: Precompile error with DataFrames