Thanks for this, I’ll have to play around with it. Admittedly, there are still a lot of capabilities of the new Pkg that I need to explore.
I’m very much in favor of the fancy compiler roadmap and get that it’s not amenable to a stable upstream LLVM (but let me hold on to my vain hope that LLVM will happen to be relatively bug-free in the areas y’all are pushing it). I still think it hurts adoption somewhat (while probably helping it in other areas) and hurts even more when it comes to Julia being perceived as a language with applications outside of scientific computing. There are always tradeoffs.
As for why my ~/.julia is so large, after running ]gc, it’s still 2.5 GiB: 423 MiB from GDAL.jl, which mostly comes from the library it’s wrapping, 141 MiB from CMake (looks like there are two versions there), 175 MiB from DifferentialEquations.jl, 115.8 MiB from Plots.jl, and the rest are relatively small. Frankly, given the sizes I’m seeing on DifferentialEquations.jl, Plots.jl, and CMake.jl, it’s amazing to me that you’re at only 60 MB.
!!! After running ]gc my usage went from 25GB to a svelte 24GB. I assumed everyone’s .julia tree was huge. That includes 12GB for Julia 0.6, which I still use a bit (Julia 0.6 and 0.7, to find replacements for deprecations, syntax changes, etc.), 6GB in packages, 2.4GB for Conda, 800 MB in dev…
!!!
See, predev (some dev cruft) is the first subdirectory under 60 MB:
I’ve been doing a ton of data crunching, with heavy emphasis on reading and parsing text files that are tens of megabytes in size. I even finally got around to using DataFrames for the first time (yes, it’s great).
Mine is “only” 1.7 GB, and I take care to manually clean up many of the source files left over by the installers. GR behaves nicely and removes all unneeded packaged files after install, but many others do not.
Disk space is not as negligible as some pretend. Try working on a laptop with an SSD and see.
Julia’s Cmd is awesome. It’s very exciting that you are planning to turn it into a mini shell DSL. I also think that the subprocess API can be improved further by stealing some high-level APIs from Python’s subprocess module, because some of them are actually nice. For example, sending data to a subprocess’s stdin and reading its stdout at the same time without a deadlock is somewhat non-trivial in Julia at the moment, while Python has Popen.communicate exactly for that. Also, relatively recent versions of Python have subprocess.run, which makes it possible to do this in one line.
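For anyone who hasn’t used the Python side of this, here is a minimal sketch of the two calls mentioned above. The child process is just a throwaway Python one-liner that upper-cases its stdin, and the names `CHILD`, `via_communicate`, and `via_run` are made up for illustration; only `Popen.communicate` and `subprocess.run` are the actual standard-library APIs (the `text` and `capture_output` arguments assume Python 3.7 or later).

```python
import subprocess
import sys

# Hypothetical child process for illustration: echoes stdin back in upper case.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def via_communicate(text):
    """Feed `text` to the child's stdin and collect its stdout.

    communicate() writes the input and drains the output concurrently,
    so neither pipe buffer can fill up and block the other side.
    """
    proc = subprocess.Popen(
        CHILD,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate(input=text)
    return out


def via_run(text):
    """Same thing as a single call using subprocess.run."""
    result = subprocess.run(
        CHILD,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

The point is that the caller never has to juggle reader and writer tasks by hand; the input is passed as an argument and the captured output comes back as a value, even when the payload is much larger than a pipe buffer.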
We somehow need to allow repo cleans. DifferentialEquations.jl being so large is crazy unnecessary since all of the things that made it large are no longer in the repo.
Good reminder. That’s one of the reasons I want to get more programmers who work on different kinds of problems involved. Traditionally (and perhaps this is changing), scientists aren’t known for writing generalized, reusable code. If the package ecosystem is to flourish, programmers will be helpful to have along.
You’re discovering the dirty secret that I don’t actually use Julia for anything because my job is to develop Julia, not to use it. But there’s also the fact that in 1.0 it’s so easy to reproduce a set of packages exactly, so why bother keeping them installed? If I need to run something I can just instantiate its manifest and I’ll be back in a working state in a few minutes tops.
I don’t think that people explicitly choose to keep things installed, rather cruft just builds up. I guess one could clean it up periodically, but as long as hard disk space is cheaper than labor, it’s only worth it in egregious cases.
I was surprised by
$ du -sh ~/.julia/conda/
1.4G /home/tamas/.julia/conda/
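If you want the same kind of breakdown for every top-level directory at once, something like the following rough sketch works. The helper names `dir_size` and `largest_subdirs` are mine, not part of any tool, and it sums apparent file sizes rather than allocated disk blocks, so the numbers will not match `du` exactly.

```python
import os
from pathlib import Path


def dir_size(path):
    """Total apparent size in bytes of the regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # skip symlinks so targets are not counted twice
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def largest_subdirs(root, n=10):
    """Return up to `n` (size_in_bytes, name) pairs, largest first."""
    root = Path(root)
    sizes = [
        (dir_size(p), p.name)
        for p in root.iterdir()
        if p.is_dir() and not p.is_symlink()
    ]
    return sorted(sizes, reverse=True)[:n]


if __name__ == "__main__":
    depot = Path.home() / ".julia"  # assumed default depot location
    if depot.is_dir():
        for size, name in largest_subdirs(depot):
            print(f"{size / 2**20:10.1f} MiB  {name}")
```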
I’ve also just realized that my .julia folder is quite large and it seems like most of it is due to Conda (sometimes Pkg keeps more than exactly one version of the package so Conda alone can take a few GBs of space). Would it be possible to somehow not have it as a dependency of IJulia?
Also, there is a snap for it, which is available wherever snaps are sold (almost everywhere, these days), though I’m not sure how up to date it is.
Anyway, I don’t think availability is the problem at all, if we’re just talking about getting software developers interested. It might be a challenge for inexperienced programmers, but nobody who does this stuff for a living is going to have trouble getting started with Julia.
In developer communities, Julia is known as a sort of faster alternative to Numpy/R/Matlab and people don’t know it’s a general-purpose language. I’ve had the conversation several times in real life and many times online. For most people (including many in this community), Julia is synonymous with numeric computing. For me, it’s such a waste that people don’t realize how practical Julia is for so many different types of work.
The other main thing that keeps people away, at least from my conversations, is that a lot of Java-first developers seem to have trouble understanding how to use dispatch on abstract types for code reuse rather than inheritance. (mumble mumble Java developers mumble mumble) edit: I do not mean to imply that Julia should try to accommodate them. What we have is better, and if inheritance is added to the language, people are going to use it, and we don’t want that.
Of course not, which is why I didn’t say that it was. It’s still important even if not “biggest”.
Remember that this entire thread is taking place in the context of changing the narrative that Julia is only for technical computing. Personally, there have been several times when I wanted to write a script to send to someone else, and my first instinct was to reach for Julia, but unlike python, perl, or bash, I cannot expect Julia to be readily available on the other person’s computer, so I ended up just writing it in bash instead. None of this is a problem within the domain of technical computing, but outside of it the ubiquity of being able to just send a simple script or a statically linked portable binary becomes important; distribution becomes important. Packaging a python script as a standalone program is if anything more difficult than doing the same for Julia, but it doesn’t matter because python is available in every distribution (and usually part of the base system). That’s kind of the crux of the argument I’m making.