(also in reply to @pdeffebach ): thanks! I tried to fix it in the original post, let me know if it’s still wrong…
Ah wait, it still won’t work, since :x * "_cleaned" will throw an error. But interpolation will work instead: "$(var)_cleaned". In general, when working programmatically it’s best to work with strings: String(x) * "_cleaned"
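To make the point above concrete, here is a minimal sketch of the three options. The variable name var and the "_cleaned" suffix come from the posts above; everything else is just illustration, not the code from the original post.

```julia
var = :x   # a column name held as a Symbol

# :x * "_cleaned" has no matching method, so it throws a MethodError
threw = try
    var * "_cleaned"
    false
catch err
    err isa MethodError
end

interpolated = "$(var)_cleaned"           # interpolation converts the Symbol for us
concatenated = String(var) * "_cleaned"   # explicit conversion, then concatenation
as_symbol    = Symbol(var, "_cleaned")    # back to a Symbol, if the API wants one
```

Both string versions give the same result, so which one to use is mostly a matter of taste.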
Thanks, should be good now (I just had this crazy idea of actually testing my code )
I love the speed of it and the concise syntax seems natural to me…
Correct me if I’m wrong:
Isn’t dplyblahblah just piping with some weird function names?
- A good package manager which saves the need for Docker or Conda
- Consistent base language. Less need to learn all kinds of edge cases
- Powerful metaprogramming and introspection tools (macroexpand, @edit, etc)
- Most libraries are written in pure Julia, so it is reasonably easy to check out the implementation if something doesn’t work
A decent type system and the ability to avoid “big switch” patterns on functions with potentially different type arguments (through multiple dispatch). This makes it easier to write cleaner, generic code through functional composition (building functions out of smaller functions). Once you get good at this, you can write really robust code that’s easy to extend.
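As a rough illustration of the “big switch” point above, here is a small sketch. The function names label and label_all are hypothetical and made up for this example; the idea is only that each type gets its own small method instead of one function branching on typeof(x).

```julia
# Hypothetical function, for illustration only: one method per argument type
# instead of a single function with an if/elseif chain on typeof(x).
label(x::Number)         = "number: $(x)"
label(x::AbstractString) = "text of length $(length(x))"
label(x::AbstractVector) = "vector of $(length(x)) elements"
label(::Missing)         = "missing value"

# Generic code composes the small methods without knowing the types up front.
label_all(xs) = [label(x) for x in xs]

results = label_all(Any[1.5, "abc", [1, 2, 3], missing])
```

Supporting a new type later means adding one more label method; label_all does not need to change.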
In addition, when compared to Python, Julia’s built-in types have a good enough set of objects to handle all kinds of math. This means that ALL of Julia works well with vectors/matrices in what amounts to “NumPy” arrays. A lot of Python doesn’t work well with NumPy, so you can get things like “I can’t write this dict as JSON because I don’t understand NumPy’s Int64”. Not to mention that with Julia’s data frames, you get much saner missing value handling for non-numeric columns.
This is incredibly important. Being able to understand and adapt performant code from the packages that I use is just something I’ve grown so used to now that I just take it for granted!
Would you mind explaining in plain language how the Julia package manager is “easier” than, say, the package manager in Python and R?
I ask because in R and Python, I occasionally discover I’m using old or obsolete packages, update them, and that’s that. In Julia, what happens is as you update one package, a bunch of other packages are downgraded. And while I’m still very new to this, I’m not sure that tweaking manifest files is what I would call “easier.” Now maybe it’s just that the downsides of the R and Python package managers are hidden from the user’s view and I’m perfectly oblivious to some insane things going on in the background in R and Python. I understand a huge amount of work goes into the Julia package manager, which involves very complex processes, and I don’t think I’m currently leveraging its capabilities at all. Thanks!
I ask because in R and Python, I occasionally discover I’m using old or obsolete packages, update them, and that’s that.
This is true for a lot of packages, but try resurrecting very old, unmaintained code that uses packages that basically don’t exist anymore. I had a situation at work with R where everything was frozen at 3.2 (or something) before I came in, because updating packages broke everything and the person in charge didn’t know how to deal with that, and the current version was like 3.6. I came in, tracked down as many of the issues as I could, and updated to newer packages. It wasn’t that complicated in the end, but I think that the package manager and the environments in Julia really help with this issue. I know that R has some tools to help with this, but Julia’s manager is really simple.
My naive impression is that Manifest files are a fantastic feature for reproducibility, but that the regular user should just ignore (delete) them. Without one, most times up Package is just what you need.
Yes, that’s a good point that I sort of forgot about. It’s true that most of my R and Python code needs to be updated with every season and it would be too much effort to try to get the old packages, while in Julia, if you have a good manifest file, you can just run the old code and replicate properly. Yes, good point. Is this the main strength of Julia’s package manager?
Yes very true, I was simultaneously replying to Alejandro that indeed this is a great feature. And I plan to use it once my projects are mature (but I’m still at the point where I’m better off updating my code as the packages I use get updated).
I don’t think it is the main strength of Julia, but it’s definitely something that helps a lot. To me, personally, the fact that most libraries are also written in Julia was one of the main advantages I found. I remember trying to track down how some algorithm worked in the lme4 package in R and hitting the C++ (Rcpp) wall pretty quickly. Then I found the GLM package in Julia, written by the same person as lme4 and I could see exactly how the algorithm worked. That was something that really motivated me to switch, even though a lot of the things I was doing were fast enough in R.
While I’m usually very curious how things work, and therefore enjoy browsing Julia package source files, I acknowledge that a lot of people in my field have no interest in that layer. Sometimes I think, wouldn’t it be better if we all really understood our fMRI analysis algorithms and could tweak and modify them in Julia? But then again, where does it stop? You have to draw the line of understanding tools somewhere and start using them, or you have an infinite rabbit hole. Most researchers I know have enough work in getting everything on the surface layer to work, so they aren’t enticed by the opportunity to delve deeper. It’s very different for all the numerical people here.
To be clear on this, a manifest file (i.e. Manifest.toml) is not something you write but is automatically created by the package manager (whether you want it or not, it’s an integral part of how it works). The usefulness of the manifest depends on how you manage your environment though.
5 posts were split to a new topic: How to use Manifests for a clean environment for reproducibility
I do research in biochemistry, and one common thing in this field is to measure interactions between biological molecules. The principle is simple: you mix a constant concentration of molecule R (the “receptor”) with increasing concentrations of molecule L (the “ligand”), and for each of these titration points you measure some signal that varies with the fraction of molecule R that is bound to a molecule of L (this signal is often some property of a fluorescent tag attached to one of the two molecules, because we have sensitive instruments to measure fluorescence). The data you get out of such an experiment is a series of concentrations of L, and the corresponding measurements of the signal that reports on the binding of the ligand to the receptor. This signal is related to the concentration of ligand by a known, non-linear relationship, and fitting this equation to the experimental data gives you the equilibrium dissociation constant.
People typically perform this fitting in programs such as Origin or GraphPad Prism. One can even use Excel with the solver extension. The experiment is simple enough to set up that one can collect up to a couple dozen datasets in a day. This is almost nothing, of course, but already very tedious if you have to fit them one at a time in a spreadsheet program (I know in Prism at least, one can fit them all in one go; but you still need to enter all the data with as many copy/paste operations as you have datasets).
Back when I ran a lot of these experiments, I became obsessed with automating the fitting part. So I tried to use R for that, in part because I wanted to learn it and had an excuse, in part because it seemed well suited to the task of ingesting a bunch of CSV files and spitting out a plot and a number for each. This sent me down a rabbit hole of a year fiddling with R (not full time, thank goodness; this always was a toy project on the side), during which I realized that the most natural way to approach this (at least with tidyverse) was to merge all datasets into a single, giant dataframe (adding a column to keep track of an identifier for each dataset) and do the split-apply-combine trick for the fitting and plotting procedures. I also realized that it was pointless to try to automate the data ingestion, because the instrument I had access to would already generate differently formatted CSV files depending on user-defined parameters in the various measurement protocols.
Anyway, this resulted in about a thousand lines of code (not counting comments and doc) that barely did what I wanted. I had more ideas that turned out too tedious to implement or did not materialize in a very useful way. A seemingly simple one was to plot error bars when the input dataset contained replicate measurements, but I could never figure out how to do this.
Another one was using the equation the opposite way: instead of fitting data to find the value of the dissociation constant, simulate the outcome of an experiment for a given value of it (this is helpful for designing experiments, when you expect the dissociation constant to fall within a certain ballpark). This kind of thing is much better with interactivity (change the value of the parameter and watch the plot changing), but with R that meant trying to make a Shiny app, which meant learning how to use a big library plus a lot more coding, and I had no more patience for that. So I archived the repo and moved on, admitting to myself that it was a fun learning experience but would probably never yield a tool that would be polished enough to use for my work.
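For readers who want a feel for the “use the equation the opposite way” idea, here is a minimal sketch. It is NOT the equation or code from the notebooks: it assumes the simplest 1:1 binding model with ligand in large excess (so free ligand is roughly equal to total ligand), and the names fraction_bound, signal, s_free and s_bound are hypothetical.

```julia
# ASSUMPTION: simple 1:1 binding, ligand in excess (free L ≈ total L).
# The model actually used in the notebooks may well be different.
fraction_bound(L, Kd) = L / (Kd + L)

# Hypothetical signal model: the signal moves linearly from its value for
# free receptor (s_free) to its value for fully bound receptor (s_bound).
signal(L, Kd; s_free = 0.0, s_bound = 1.0) =
    s_free + (s_bound - s_free) * fraction_bound(L, Kd)

# Simulate a titration for a guessed Kd, with ligand concentrations
# spaced evenly on a log scale (same units as Kd).
Kd = 1.0e-6
ligand = 10.0 .^ range(-8, -4; length = 9)
simulated = signal.(ligand, Kd)
```

Changing Kd and re-running shows where the transition falls relative to the chosen concentrations, which is the kind of thing a Pluto slider makes interactive.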
Two and a half years later, I discovered Julia and Pluto. This immediately made me want to try to implement this curve-fitting program again. In only a few days, I had all the functionality plus all the niceties I wanted but couldn’t implement easily with R. Interestingly, these two Pluto notebooks also amount to about a thousand lines of code; but this time, including text and comments, and with a lot more functionality (the curve fitting notebook can even read input datasets from URLs, just because it was easy to do with Julia’s libraries). I don’t think this difference in the amount of useful work I got done has much to do with my ability: if anything, I had a lot more mileage with R by the end of my toy project than I have now with Julia, so I would have expected to struggle more with Julia as I was less familiar with it.
An important difference between these two experiences with R and Julia, as someone who writes code when needed but doesn’t do it for its own sake, was that using Pluto to build a working solution interactively was a lot of fun, and seeing the solution take shape as I was messing around was very motivating.
If I may suggest, export the notebook to static HTML and use it as the GitHub page of that project. That way we can see and learn from it without installing or running anything. (Using Pluto exports as “blog posts” is a great idea, IMO).