What do you work on? Why is it important?

In a recent conversation with a friend, he asked me why I spend so much time on the Julia ecosystem. He generally liked the language and agreed it’s an excellent tool for many technical tasks, but he expressed doubts that Julia - given its current state and the state of other languages like Python - is the optimal solution for real-world problems with a high impact on the future of the human race.

Being an ML engineer, I pointed out that many unconventional problems (say, DeepMind’s AlphaFold) require not just the standard building blocks available e.g. in Python libraries, but also a lot of custom code. And Julia is the perfect tool to write such custom code efficiently. I know Julia libraries are steadily catching up with their Python counterparts, but Python is unlikely to ever become as fast as Julia.

Perhaps this is the main reason I personally put my time into the Yota / Avalon stack. I believe these and related packages (NNlib, ChainRules, ONNX, Zygote, Flux, Knet, to name a few) will eventually form the basis for an efficient and flexible ecosystem with much more extensive capabilities for AI than any other programming language offers. These new capabilities will in turn enable new applications in bioinformatics, robotics, agriculture and so on, so the real-world impact is very much there.

Do you have a similar story? In particular:

  1. What do you work on?
  2. Why is it important? What problem - smaller or larger - does it solve?
  3. Why do you use Julia for it?
24 Likes

It’s a time thing. Python was once not as popular as Perl, but things slowly came around for Python because it’s easier to code with. Julia is easy to code in and can be faster, but TTFP (time to first plot) is an issue (and it’s getting massively improved!).

Also, there’s a study showing that ppl who use Chrome or Firefox tend to have higher levels of productivity. The fact that they looked for a better tool than IE is predictive of certain traits, e.g. looking for better ways to do things.

Python is now the “default”, so ppl who look for other, better tools for their domain tend to have higher productivity, by the same inference. So yeah, I wouldn’t worry too much about it and would let time do the talking. Meanwhile, I will get a head start making packages work.

8 Likes

Totally agree! Yet this post’s goal is not just to reiterate that we are doing fine, but to share the awesome projects out there. For example, being interested in bioinformatics, I was amazed at how clean and easy-to-use the projects in BioJulia are (though it’s hard for me to compare them to Biopython). I couldn’t even get past the installation step for PyTorch Geometric, which brings graph neural networks to the PyTorch ecosystem, but GeometricFlux worked for me on the first attempt. And with projects like Turing and Gen.jl, probabilistic programming becomes much more attractive.

The significance of these projects is obvious to me: these packages have been announced or otherwise mentioned in conversations. But there are also thousands of packages I’m not aware of or whose significance I can’t estimate. Not knowing these packages means I don’t contribute to them, don’t build integrations, and so on. Great things are born from the collaboration of many people, and collaboration starts with getting to know each other.

7 Likes

I mostly do bioinformatics. The standard in the field is to pipe together a bunch of heterogeneous tools, which works fine for simple tasks but can quickly become a mess. There’s also a big divide between tool developers (the tools are often written in C/C++ and Java) and users.

Julia is a good alternative because you can write end-to-end custom tools easily, and BioJulia laid some good foundations for dealing with the various data types involved. New sequencers can output tons of data (20 billion sequencing reads on a NovaSeq run), so performance and scaling are becoming more and more important. The Julia ecosystem isn’t as fast as it should be for bioinformatics, but hopefully it will continue to improve. There aren’t a lot of developers and users though.

8 Likes

I thought those were examples of defaults, and was really surprised, until I saw you meant them in contrast to IE. I don’t think I’ve ever met anyone who used IE.

2 Likes

I’ve heard this sentiment several times recently, especially in the context of the low salaries of software engineers in the field. Is there anything the community can help with? I’m personally fascinated by bioinformatics and would happily spend some of my time improving the ecosystem, but e.g. the long-open issues in BioSequences.jl seem too involved for an outsider to get started with.

3 Likes

I work with machine learning in medical imaging. I frequently speed up data processing tasks substantially by converting them from Python to Julia. Partly this is because Julia is simply faster, and partly because it’s much easier to optimize the code in Julia. Some of those optimizations would have been possible in Python as well, but impractical enough that nobody did them.

The least interesting part from my point of view is the deep learning libraries. PyTorch can do what we need, is comfortable for the whole team (not all speak Julia that well, if at all), works very well with PyCall (*), and the speed is to a large extent determined by cuDNN. The real potential for Julia is in the parts that happen outside the library, not least in data loading, where we have a very custom setup. We have also designed the training environment in a way that allows replacing parts of it with reasonable effort, and some modules are already dual Python/Julia packages. Should there be a future reason to switch to a library other than PyTorch, be it in Python or Julia, we can do so relatively easily.
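
Concretely, driving PyTorch from Julia looks roughly like this (a minimal sketch assuming PyCall and a working PyTorch install; the model and data here are made up, not our actual training code):

```julia
using PyCall

torch = pyimport("torch")

# a made-up toy model; the real training code lives in Python modules
model = torch.nn.Linear(16, 2)

# Julia arrays cross over to Python as numpy arrays
x = rand(Float32, 8, 16)
y = model(torch.tensor(x))
```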

I’ve been using Julia for a long time and have realized that the key to acceptance, apart from better performance, is reliability. Many of my public contributions over the last couple of years have been focused on leveraging improvements like Pkg3 (i.e. the current generation of the package manager, for those who are new), artifacts, and package servers for use in a company setting, with infrastructural packages like LocalRegistry, LocalPackageServer, PackageCompatUI, and by helping to make subdir packages a reality. I need our Julia code to just work, every time, for everybody, and reproducibly.

(*) Anecdote: the first time I made our training environment run from Julia, basically PyCalling nearly everything, it magically became 10% faster. After digging around for a while, we realized it was a side effect of some data needing to be copied in the interaction between Julia and Python, and we managed to reproduce it in Python with a strategically placed numpy.ascontiguousarray just before passing the data to PyTorch. That’s an optimization we would never have found if we hadn’t tried to use Julia. (No, this is not relevant to most people and was an artifact of our complex data loading, so it was maybe more of a fix than an optimization. Still, it brought us 10% extra performance more or less for free.)
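
For illustration, the fix amounted to something like this (a sketch with made-up data, shown here through PyCall rather than in our actual Python code):

```julia
using PyCall

np = pyimport("numpy")
torch = pyimport("torch")

# `batch` stands in for our (custom, complicated) loaded data
batch = rand(Float32, 64, 3, 128, 128)

# the strategically placed copy: force a contiguous layout right before
# handing the data to PyTorch (this was where the ~10% came from)
tensor = torch.from_numpy(np.ascontiguousarray(batch))
```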

19 Likes

There are many responses here.

3 Likes

u might be too young or have not worked in a locked-down (i.e. not allowed to install software/apps) corporate environment in the last 10 years.

3 Likes

Yeah I don’t think I’d take a job in such an environment.

2 Likes

it’s easier if u have choices and don’t jump at the first big company role thrown ur way… yeah. i wish everyone was in ur position though…

A side-step, but why is the package called Yota? “Yötä” is Finnish and means ’night’, as in ’good night!’.

To chime in,

I really like to do ML in Julia because of the transparency. The fact that you can add a performant custom gradient easily is so cool (my favourite is a differentiable parametrization of orthonormal matrices in GitHub - pevnak/Unitary.jl: A differentiable parametrization of a group of unitary matrices). It’s really important if you like to do what the majority is not interested in (which also makes it hard to sell).
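
As a small illustration of how little it takes (a minimal sketch using ChainRulesCore, which AD packages like Zygote pick up; `squash` is a made-up function, not anything from Unitary.jl):

```julia
using ChainRulesCore

# a made-up scalar function whose gradient we want to hand-write
squash(x) = x / (1 + abs(x))

function ChainRulesCore.rrule(::typeof(squash), x)
    y = squash(x)
    # d/dx [x / (1 + |x|)] = 1 / (1 + |x|)^2
    squash_pullback(ȳ) = (NoTangent(), ȳ / (1 + abs(x))^2)
    return y, squash_pullback
end
```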

4 Likes

I’ve even used Netscape…

3 Likes

I do large-scale spatial ecological modelling, currently in an agricultural/ecological research organization. We model pest and disease risk for agriculture, and do conservation research, among other things.

DynamicGrids.jl is our core modelling tool, along with a bunch of other packages I’ve written to support it. The reason we use Julia is that the existing tools were orders of magnitude slower than what we needed. Also, as you mention, we need to write a lot of custom models easily. We also use Julia for loading and processing massive spatial datasets quickly, with GeoData.jl and tools that build on it - for modelling dynamic species distributions. Again, this would be hard to do quickly otherwise, as we often need to run custom models over the data. All these things also run on GPUs, as a bonus.

Currently I’m doing some work unrelated to DynamicGrids that also ended up using Julia: using Circuitscape.jl for large simulations predicting extinction risk for hundreds or thousands of species. This was explicitly a move from Python to Julia, as an anecdote for your friend :wink:

The scale of these problems and the available data is only going to grow, and the timelines are only going to get shorter. At the same time, parallel/GPU computing is the only remaining avenue for performance growth. So Julia is probably here to stay.

9 Likes

Mosaic, FTW

2 Likes

I’ve been working on that little package to easily process files. I noticed I was constantly writing the same kind of code (listing files in a directory, managing readers/writers, making a threaded for loop, etc.), and the package is supposed to solve this issue; it can also serve as a samtools/bcftools/picard replacement. It’s quite basic at the moment (I’m not even sure the examples in the readme work), but I think something like this would be nice to have.
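
Roughly the kind of boilerplate I mean (a sketch with a made-up directory, extension, and per-file step):

```julia
using Base.Threads

# list the input files (made-up directory and extension)
files = filter(f -> endswith(f, ".fastq"), readdir("data"; join=true))

# the threaded loop I keep rewriting by hand
Threads.@threads for path in files
    open(path) do io
        # stand-in for the real per-file work
        println(path, ": ", countlines(io), " lines")
    end
end
```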

I’ve also put together a BioMart package (quite a popular database) a while ago but never published it. One issue is that there’s the BioServices.jl package, which is supposed to contain things like that, but adding random services into it won’t end well, I’m afraid, and the whole thing might need a redesign. One idea would be to have an AbstractBioServices package that defines some common APIs and utilities, and then packages for particular services could depend on it (it looks a lot like I’m building a query DSL here).
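
A rough sketch of the idea (all names here are hypothetical; nothing like this exists yet):

```julia
# hypothetical AbstractBioServices: a common interface that concrete
# service packages (BioMart, Ensembl, ...) would extend
abstract type AbstractBioService end

"""Base URL of the service endpoint."""
function baseurl end

"""Run a raw query against a service and return the response body."""
function query end

# a concrete service package would then only implement the interface
struct BioMart <: AbstractBioService
    host::String
end

baseurl(s::BioMart) = "https://$(s.host)/biomart/martservice"
```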

Another strategy is to look at popular packages in R/python and port them, e.g. DESeq2, but personally I tend to work only on things I really need.

2 Likes

Thanks for all these great examples! Now I feel even more motivated to promote and contribute to the Julia package ecosystem!


Honestly, I forgot the real reason long ago, but I like the idea proposed one day on Slack that this is just a metal umlaut to make Yota more formidable. Another plausible explanation is that I just like letters I don’t have on my keyboard :slight_smile:

5 Likes

Your friend is right.

1 Like

[1] professionally, i build custom software tooling and solve traditional-style ( as opposed to “deep learning” ) ML problems for the data science group at a finance company – but they use Python, and i rudely describe Python as “a gross way to live” in meetings, and vocally pine for “a compiled language with an actual type system”

[1a] – but that just pays my bills, and while the folks at the finance company are nice and well-intentioned, it is difficult to feel as if helping to make money with money is “important” for anyone other than the few folks the money ultimately belongs to. avocationally, i poke around for civic-minded problems i can try to solve using public datasets – my firmest accomplishment to date has been to assemble a year’s worth of incarceration records from the local county sheriff’s department [ this is in the United States ], then clean and aggregate them into nice, concise reports that illustrate things like systemic racism in incarcerations and mismanagement in the local criminal justice system. i chose to use Julia for this!

[2] i hesitate to brand anything i work on as “important” – as importance assumes a vector of moral valuation and general “worlding”. but i choose to poke around through grimy public datasets and turn them into charts because i know that no one else will. the public sector in the U.S. is brutally starved of resources; the tech departments of local governments struggle to fulfill even the most basic functions, and almost never attract the most capable engineers ( or rather – those who are capable work just long enough to build a resume, and then leave as quickly as they can ). as such, i can be confident that no one else with any sort of technical aptitude is going to work on such a small, localized problem. things will sit and fester in obscurity, and local officials will shrug impotently every time someone asks what’s really going on: “we don’t have the data; it will take so long to build that report, if we even can” ( i have specific stories that starkly illustrate this self-serving incompetence ). in contrast, i’ve watched data science wield massive power within my professional work – when executed with vision and discipline. i do not have the effort or resources of an entire team, but i would like to marshal what power i can, in the service of civic causes that might otherwise get no help at all.

[3] i know that doesn’t have much to do with Julia, nor does it represent an accomplishment that would dazzle an experienced engineer with its technical challenge or nuance. for what it’s worth, i chose Julia because it just has the feel of a good tool. i did a lot of graduate work in programming languages and compilers, and when i saw what Julia was, i immediately recognized it as something remarkable: a tool that strikes the razor balance between user experience and technical execution. ( i also rely heavily upon DataFrames.jl and Gadfly.jl, both of which have been a delight to use, in contrast to my experiences with pandas and matplotlib.pyplot. ) i wanted to use it simply because it does all of the things that i wish Python did ( and that i think most Python users would also wish for, if they better understood the limitations of what they were working with ). they say that “a poor carpenter blames the tools” – and i can say, with confidence, that i have quietly suffered to build quite a few things in Python that i am genuinely proud of, and some of which may even have been technically impressive. but there is something demoralizing about having to hack onward with a bad tool. it can be done, for sure, but it is hard not to feel burdened or even humiliated by it. ( how many runtime checks of hasattr and isinstance must i really write?! ) as i wade through this obscure uphill battle of my avocational work, it lifts my spirits just a little to know that the tool in my hand is powerful and technically sound, that even my humble task is worth a tool built with care.

[*] - i mean no insult to any technically competent civil servant out there; i’m sure they must exist, but they also appear to be vanishingly rare. if you are that person, or know that person, then i apologize for any offense, and offer my utmost respect

12 Likes