Julia solves the two-language problem, but the real problem is a 3-wants problem, which Julia does not necessarily serve well at the moment.

I think that will require Julia to have some basic built-in algebraic data types and pattern-matching support; without them, such code is currently type-unstable.

Even if the function/program only accepts Complex{Int8} and outputs Matrix{ComplexF32}, the dev/prod divide is still huge.

Some of my experience with implementing that:

  1. I want to run the program on a Raspberry Pi. While with C/C++/Rust it is as easy as writing a Yocto/bitbake recipe for cross-compilation, I can’t do that with Julia.
  2. You can’t vectorize complex multiplication unless you manually convert from an AoS (array of structs) to an SoA (struct of arrays) representation.
  3. How about CI/CD?
  4. Calling append! in a loop is slower than preallocating the array, even though it feels more natural.
  5. Writing code in a non-mutating style feels more natural than writing it in a mutating style.
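
Items 2 and 4 in the list above can be sketched roughly as follows. This is only an illustration: `ComplexSoA`, `to_soa`, `to_aos`, `soa_mul!`, `squares_push` and `squares_prealloc` are hypothetical names made up for this example, and whether the loop actually vectorizes depends on the Julia version and the target CPU. The growing version uses push!, but the same concern applies to append!.

```julia
# Item 2: struct-of-arrays (SoA) layout for complex vectors.
# A Vector{ComplexF32} interleaves (re, im) pairs (AoS); here the real and
# imaginary parts live in two separate Float32 vectors instead.
struct ComplexSoA
    re::Vector{Float32}
    im::Vector{Float32}
end

to_soa(z::AbstractVector{ComplexF32}) = ComplexSoA(real.(z), imag.(z))
to_aos(s::ComplexSoA) = complex.(s.re, s.im)

# Elementwise product on the separate component arrays, so the loop body is
# plain Float32 arithmetic that the compiler may be able to vectorize.
function soa_mul!(out::ComplexSoA, a::ComplexSoA, b::ComplexSoA)
    n = length(out.re)
    all(v -> length(v) == n, (out.im, a.re, a.im, b.re, b.im)) ||
        throw(DimensionMismatch("all component vectors must have length $n"))
    @inbounds @simd for i in 1:n
        ar, ai, br, bi = a.re[i], a.im[i], b.re[i], b.im[i]
        out.re[i] = ar * br - ai * bi
        out.im[i] = ar * bi + ai * br
    end
    return out
end

# Item 4: growing a vector inside the loop versus preallocating it.
function squares_push(n::Integer)
    out = Float64[]
    for i in 1:n
        push!(out, i^2)               # natural to write; the vector may be resized repeatedly
    end
    return out
end

function squares_prealloc(n::Integer)
    out = Vector{Float64}(undef, n)   # single allocation up front
    for i in 1:n
        out[i] = i^2
    end
    return out
end
```

Both pairs of functions compute the same results; the difference is only in memory layout and allocation pattern, which is exactly the kind of thing that is easy to overlook while prototyping.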

Then we have the environment problem: the deployment target is a Raspberry Pi running Linux, while most people only know Windows PCs and Matlab.

Dev and prod require different mindsets and skill sets. I don’t know how to solve this other than by finding those very rare people who understand both algorithms and deployment.

4 Likes

Can someone ELI5 the differences between development and production/deployment work? I don’t have any experience with this, so my naive assumption was that as long as the languages have working compilers for the different target OSs and the users provide proper input (e.g. not trying to throw Ints into a program designed only for Float32s), things should work smoothly. I don’t really understand how static analysis and typing would help when people already make the effort to make code type-stable for the intended input types.

Maybe if the application is relatively computationally intensive, I would also guess it is necessary to test and tell users the minimum specs for each target OS. But it sounds like things are much more complicated than that.

3 Likes

The biggest issue in “production” is that the input is not really guaranteed to be perfect, and the environment is not guaranteed to be perfect. For example maybe you grab a CSV file that describes the portfolio composition of a given exchange traded fund… this should be a set of rows with ticker symbol, and number of shares perhaps… But someone is going to hand you a file with an invalid ticker symbol or a number of shares that’s a float instead of an integer, or says NA sometimes for missing data and N/A other times… etc. And when there’s bad data like that, you want the thing to do something smart, maybe throw an error in a log, remove that line, continue with calculations as if the data were correct, but then put the output in a folder where it should be reviewed, and spit an SMS over to the data management team to get someone to look at the file and determine what should have been there…

Stuff like that. What’s not OK is for the system to silently give wrong (expensive!) answers or bork with no error code/explanation or die after 13 hours of running a simulation without outputting the results to a file because the filename was invalid, or the disk ran out of space… yes, someone should have made sure there was enough disk space, but you can’t just rerun the 13 hour simulation because you have to trade your stocks tomorrow… that kind of stuff.
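
A minimal sketch of the CSV scenario described above, under some loud assumptions: the row format `TICKER,SHARES`, the rule that a ticker is one to five uppercase letters, and the names `Holding`, `parse_holding` and `load_portfolio` are all invented for illustration. The review folder and the SMS to the data team are left out; the rejected list is what such a step would consume.

```julia
# Hypothetical row format: "TICKER,SHARES".
struct Holding
    ticker::String
    shares::Int
end

# Treat the different spellings of "missing" the same way.
const MISSING_MARKERS = ("NA", "N/A", "")

# Returns a Holding for a good row, or a String explaining why the row is bad.
function parse_holding(line::AbstractString)
    fields = strip.(split(line, ','))
    length(fields) == 2 || return "expected 2 fields, got $(length(fields))"
    ticker, shares_txt = fields
    occursin(r"^[A-Z]{1,5}$", ticker) || return "invalid ticker $(repr(ticker))"
    shares_txt in MISSING_MARKERS && return "missing share count"
    shares = tryparse(Int, shares_txt)
    shares === nothing && return "share count $(repr(shares_txt)) is not an integer"
    shares >= 0 || return "negative share count $shares"
    return Holding(ticker, shares)
end

# Keeps going past bad rows, logs each one, and hands back both lists so the
# caller can route the rejects to a human for review.
function load_portfolio(lines)
    good = Holding[]
    rejected = Tuple{Int,String}[]
    for (lineno, line) in enumerate(lines)
        result = parse_holding(line)
        if result isa Holding
            push!(good, result)
        else
            @warn "skipping bad row" lineno reason=result
            push!(rejected, (lineno, result))
        end
    end
    return good, rejected
end
```

The point of returning the rejects rather than throwing on the first bad row is that the run can finish while the questionable input is still visible afterwards.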

8 Likes

@dlakelan Well said.

Tests make sure the right thing happens on some paths, but error-aware return types (my favorite is Try.jl from JuliaPreludes, which offers zero-overhead and debuggable error handling) remind you to think about the other paths.
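
To illustrate the idea of an error-aware return type, here is a hand-rolled sketch. It is not Try.jl's API; `Ok`, `Err`, `parse_shares` and `describe` are hypothetical names defined only for this example.

```julia
# Hand-rolled result types for illustration only (NOT Try.jl's API).
struct Ok{T}
    value::T
end

struct Err{E}
    error::E
end

# Returns Ok(shares) or Err(message) instead of throwing.
function parse_shares(txt::AbstractString)
    n = tryparse(Int, strip(txt))
    n === nothing && return Err("not an integer: $(repr(txt))")
    n < 0 && return Err("negative share count: $n")
    return Ok(n)
end

# The caller has to pick a branch to get at the value, so the failure path
# cannot be skipped by accident.
describe(r::Ok) = "holding $(r.value) shares"
describe(r::Err) = "rejected ($(r.error))"
```

Because the failure is part of the return value, a caller that only writes the `Ok` method gets a MethodError the first time an `Err` shows up, rather than a silently wrong answer.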

1 Like

The way I would approach this is to put the input in a data structure that is validated. Once the data structure is built, you promise that the downstream code will be able to process it. If some part of validation fails, you can either throw an error or try to correct the mismatch between user data and the specs.

In your example, the data structure would be, e.g., a table with a standardized format.
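
A minimal sketch of that approach, assuming a hypothetical `Portfolio` type whose invariants (matching lengths, uppercase tickers, non-negative share counts, no duplicates) are invented for illustration. The inner constructor is the only way to build the value, so downstream code can rely on the checks having run.

```julia
# Hypothetical validated container: construction fails unless the invariants hold.
struct Portfolio
    tickers::Vector{String}
    shares::Vector{Int}

    function Portfolio(tickers, shares)
        length(tickers) == length(shares) ||
            throw(ArgumentError("tickers and shares must have the same length"))
        all(t -> occursin(r"^[A-Z]{1,5}$", t), tickers) ||
            throw(ArgumentError("ticker symbols must be 1 to 5 uppercase letters"))
        all(>=(0), shares) ||
            throw(ArgumentError("share counts must be non-negative"))
        allunique(tickers) ||
            throw(ArgumentError("duplicate ticker symbols"))
        # Copy the inputs so later changes to the caller's arrays do not leak in.
        return new(collect(String, tickers), collect(Int, shares))
    end
end

# Downstream code takes a Portfolio and does not repeat the validation.
total_shares(p::Portfolio) = sum(p.shares)
```

One caveat of this sketch: the fields are still mutable vectors, so code holding a `Portfolio` could break the invariants afterwards. A stricter version would hide the fields behind accessor functions.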

10 Likes

Makes sense, but I still don’t understand how C++/Rust-like static analysis or language design can handle those unexpected inputs, as OP is suggesting (maybe for different problems?). The program can’t know how the inputs are invalid (values, types) until runtime. It seems like Julia’s argument annotations and type conversions can restrict types the same way statically typed languages do, and input validation has to be done at runtime for any language.

2 Likes

A good real-world example is Rust insisting that you handle the possibility that fork can fail (see the Rust documentation for fork in nix::unistd).

Humans writing C++ often do not account for this possibility, since it seems to never occur in many people’s experience (see the post “fork() can fail: this is important”).

More generally, languages that use exceptions make it very easy to forget to write error checking logic.

Your question is great to dig in on – it’s a really good exercise in language design to go through actual problems that individuals and companies have seen in practice and articulate which language features prevent which problems.

A good example where Julia struggles is the entire game of chasing type instabilities that decrease performance. Another one – the lack of interfaces that caused many of the bugs in Yuri’s post.
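
For readers who have not run into it, a type instability can be as small as the following sketch. `unstable_clamp` and `stable_clamp` are hypothetical names; `Base.return_types` and `Test.@inferred` are used here only as one way to make the problem visible.

```julia
using Test  # standard library; provides @inferred

# Type-unstable: returns an Int (the literal 0) for negative inputs and a
# Float64 otherwise, so the return type depends on the runtime value.
unstable_clamp(x::Float64) = x < 0 ? 0 : x

# Type-stable: both branches return a Float64.
stable_clamp(x::Float64) = x < 0 ? zero(x) : x

# Ask inference what it concludes for a Float64 argument.
Base.return_types(unstable_clamp, (Float64,))  # a Union of Int and Float64
Base.return_types(stable_clamp, (Float64,))    # Float64 only

# @inferred throws when the actual return type differs from the inferred one,
# which makes it usable as a regression test.
@inferred stable_clamp(1.5)
# @inferred unstable_clamp(1.5)   # would throw an error
```

Nothing stops the unstable version from running and giving correct answers; the cost only shows up as slower code in whatever calls it, which is why this tends to be found late.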

6 Likes

One of the reasons I still root for Julia is that these organizational problems truly seem more acute in other tools popular in data science: in these slow languages, everything is its own DSL + optimized functionality is a hammer you have to contort your logic to fit to avoid order(s) of magnitude slowdowns → code ends up more procedural and misaligned to business logic → when business types or stats types have duct taped something together just enough to get prod-critical and it’s time for an engineer to harden it, the code is farther removed from semantically meaningful capabilities, to the extent that engineers harden code without understanding the forest for the trees → brittle system, prematurely ossified, less likely to grow/evolve gracefully.

6 Likes

I definitely think the deployment aspect (in the sense of a robust and responsive commercial application, as opposed to something like a long-running simulation where we only need to wait for the response) is not a strong aspect of Julia right now. But I do think strict compile-time typing is not the only solution, especially in the domains where Julia operates, which tend to need more sophisticated validation (for example, while I’d like neural net code to be correct in terms of compilation, I’d rather have a robust set of tools for validating its inputs, quick restarts with checkpoints in case of failure, real-time tracking of performance and other metrics…).

I work with two opposite languages in deployment scenarios, Elixir and Scala. The former is as dynamic as Julia (basically runtime dispatch based on arguments, with only type hints through dialyzer, which is the norm where I work), yet that is not really a hindrance for deployment compared to the latter, because they have strong support for telemetry/metrics tools, tools to build and deploy, strong IDE support, documentation and common practices for avoiding failure, and they exploit multithreading as a tool to create reliability (independent processes can monitor each other and take action on failure, which I think is a stronger safety mechanism than simply guaranteeing correctness at compile time, when most of the real problems come from the interaction with the external world, beyond the validity of formally proven contracts through the type system).
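
A very rough Julia sketch of the "monitor and take action on failure" idea follows. The `supervise` helper is hypothetical and only loosely inspired by OTP-style supervision; it is not an existing Julia API and it leaves out everything that makes real supervisors useful (restart strategies, backoff, supervision trees).

```julia
# Hypothetical helper: runs `work` in its own task and restarts it if it fails,
# up to `max_restarts` times. Not an existing Julia API.
function supervise(work; max_restarts::Integer = 3)
    for attempt in 0:max_restarts
        task = Threads.@spawn work()
        try
            return fetch(task)      # fetch throws if the task failed
        catch err
            @warn "worker failed" attempt exception=err
        end
    end
    error("worker failed $(max_restarts + 1) times; giving up")
end
```

A caller would pass a zero-argument function, for example `supervise(() -> poll_sensor())`, where `poll_sensor` stands for whatever unreliable operation is being wrapped.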

I certainly would love static compilation/small binaries and contracts/interfaces (even as documentation that can be checked through static analysis tools and/or runtime verification), just as much as I would love robust multithreading frameworks like Elixir’s OTP or Scala’s Akka (actors), which provide a layer of safety mechanisms and the basis for larger software in terms of software patterns, first-class support for business tools (metrics/dashboard tools, streaming platforms, databases), and the like (in other words, library support that covers the core language’s weaknesses and reduces the overhead of coding production-level business software).

7 Likes

I agree (as usual) with your bigger points, but in this case I’d push back on the two specific issues you raised:

> A good example where Julia struggles is the entire game of chasing type instabilities that decrease performance

That’s pretty good now: ProfileView → click on the red bar → type descend_clicked() at the REPL → see type-annotated source code (not @code_typed, but the actual source code) for the specific argtypes that caused the type-instability. IMO it’s now something that should be pretty easy for intermediate and advanced developers alike.

> Another one – the lack of interfaces that caused many of the bugs in Yuri’s post

What’s ironic is that AbstractArrays have long had one of the most clearly-specified interfaces we have. But I guess it’s focused more on people who want to create new AbstractArray subtypes than “how can I use these safely”? And imposing this at the level of the linter only arrived recently.

Overall, I disagree with the thesis of the OP post: Julia is not so different from other languages in terms of its safety, and sophisticated “deployment-oriented” developers will have no major beefs with Julia when the balance sheets elsewhere get improved. IMO where Julia most falls down is on “user experience”: unless you have demanding computational needs (which no beginners and only a subset of other users have), Julia’s value proposition compared to, say, Python, is debatable. Yes, you can get much better computational performance, but if your personal experience using it is slower, then why use Julia? And most people stick with the first language they learn until forced to learn something different, so for now that means most people will use Python.

But the good news is that this balance sheet is tilting rapidly: heck, with the fixes to TTFX alone the balance sheet between Julia and Python suddenly looks much more favorable to Julia than just a few months ago. Once a few other improvements get made (reduced package loading time, better error messages, small binaries would be really really nice, maybe a handful of others), then I think the balance might well favor Julia for a large fraction of users. It will take a while for people to realize it, but all we have to do is set the balance to favor Julia and the rest will eventually take care of itself. I think we’re finally getting within sight of being able to do that.

59 Likes

There has been some pushback on that claim (@Sukera):

https://seelengrab.github.io/articles/About%20some%20responses%20to%20that%20"Correctness%20in%20Julia"%20post/

4 Likes

We deploy Julia in electrochemical plants to do real-time monitoring, and our experience has been very good. We use PackageCompiler to create executables, strip out the source code/docstrings, and just run it either inside a Docker container or straight-up on the client machine. Yes, C++/Rust can do it better, and can generate much smaller executables. It’s a core competency for them. But in the context of “I want to run medium-large scientific software to do processing/modeling”, Julia is already very adequate in my book.
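
For readers unfamiliar with that workflow, here is a sketch of the entry point that PackageCompiler's `create_app` expects an application package to define. This is not the poster's actual setup: `read_sensor` and the paths in the comments are placeholders, and in a real app these definitions would live inside the package's module.

```julia
# Stand-in for real plant I/O (hypothetical).
read_sensor() = rand()

# Entry point convention for PackageCompiler apps: a zero-argument function
# named `julia_main` that returns a C int exit code.
function julia_main()::Cint
    try
        value = read_sensor()
        println("reading: ", value)
    catch err
        # Report the failure and return a nonzero exit code instead of dying silently.
        showerror(stderr, err, catch_backtrace())
        println(stderr)
        return 1
    end
    return 0
end

# Build step, run once from a separate Julia session (paths are placeholders):
#   using PackageCompiler
#   create_app("path/to/MonitorApp", "MonitorAppCompiled")
```

The resulting directory contains an executable plus the libraries it needs, which is what can then be copied onto the client machine or into a container image.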

27 Likes

That’s great to hear – I’ve seen lots of good progress there and am glad to know you feel it’s mostly fixed – I haven’t dug into it recently enough to know for myself. The flow you describe does seem like it still runs one risk, though – is it possible to download Julia and use it without realizing that the best practice workflow requires using the tools you’re describing? If so, I think you’ll still see people getting stuck on this even if the community considers it a solved problem and even if the current path is the optimal strategy for Julia to pursue. Obviously production languages like Java suffer this problem as well.

Yes, I think the distinction between “can I implement this interface for my new type” and “if I accept arguments of this interface type, can I be sure my code will work as I intend?” is core here. I think the current state for Julia is a bit like the arguments you see between Rust and C++ folks. C++ folks say “X is possible in C++ and is the best practice” and the Rust folks respond by noting that “not X is also possible in C++, but not in Rust”.

I’m not totally sure about this – my impression is that Julia is substantially less able to make safety guarantees about parallel programming than Rust for example. Am I missing things that prevent data races?
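
To make the question concrete, here is a small sketch (the function names are hypothetical). Both versions are accepted by Julia without complaint; avoiding the race in the second one is the programmer's choice and is not something the compiler enforces.

```julia
# Racy: several threads do an unsynchronized read-modify-write on the same Ref.
function racy_count(n::Integer)
    counter = Ref(0)
    Threads.@threads for i in 1:n
        counter[] += 1                      # data race when more than one thread runs
    end
    return counter[]                        # may come out lower than n
end

# Safe: the increment is an atomic operation.
function atomic_count(n::Integer)
    counter = Threads.Atomic{Int}(0)
    Threads.@threads for i in 1:n
        Threads.atomic_add!(counter, 1)
    end
    return counter[]
end
```

With a single thread both functions return `n`; the difference only appears when Julia is started with several threads (for example `julia -t 4`), which is part of why such bugs slip through testing.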

I 100% agree here – it’s why I still think Julia is the best bet for a scientific programming language out there today. Indeed, one of the reasons I still follow Julia is that I find it inspiring the community is still trying to improve rather than saying “we’re already successful enough and not going to change much anymore”.

10 Likes

Ah, my comment was based on other languages that have been widely used for deployment, like JavaScript, C, and C++, but did not include Rust (since I know little about it). I don’t think Julia is at a disadvantage compared to these older languages (probably the contrary, in fact), especially given that we have JET. But I’m not able to comment about Rust generally or its parallel guarantees specifically. It may have some special sauce that catches common classes of bugs, which sounds awesome, but regardless of compiler checking there is no substitute for runtime testing.

3 Likes

Yes, yes, and yes! (never enough :smiley:)

I tried to spark my colleagues’ interest in Julia, but binary size (and memory usage) was one of the main drawbacks. The other issue is the comparatively underdeveloped ecosystem in the microservice world. But for that last part, it’s up to the community (which does its best) and some time.

In the end, I’m still the only one using it, but just for demanding topics with loads of mathematics.

Another nice improvement (I think it was an idea suggested by David Anthoff) would be to enable a “strict typing mode” in the VS Code linter, so that type errors would be caught while coding (and not at compile time or, worse, at run time).

9 Likes

We would probably benefit from smart pointers and definitely need more ways of documenting unsafe and private functions in Julia. However, I think this idea that no unsafe methods should exist is poorly conceived.

I really like having an unsafe version and checked version of many functions so that if I know what’s going on and can guarantee safety conditions, I use the unsafe version. But this requires clearly documenting what makes something unsafe and how to check it. We have decent documentation tools but we could probably benefit from some additional ways of annotating this sort of stuff.
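
A minimal sketch of that pairing, with hypothetical names `unsafe_at` and `checked_at`. The docstring is where the unsafe precondition gets written down, since Julia currently has no dedicated annotation for it.

```julia
"""
    unsafe_at(xs, i)

Return `xs[i]` without bounds checking.

Unsafe: the caller must guarantee `1 <= i <= length(xs)`. If that does not
hold, the behavior is undefined (it may return garbage or crash).
"""
unsafe_at(xs::Vector, i::Int) = @inbounds xs[i]

"""
    checked_at(xs, i)

Validate the index, then delegate to `unsafe_at`. Throws a `BoundsError`
when `i` is out of range.
"""
function checked_at(xs::Vector, i::Int)
    checkbounds(xs, i)
    return unsafe_at(xs, i)
end
```

Code that has already validated its indices, for instance inside a loop over `eachindex(xs)`, can call the unsafe version directly, while everything else goes through the checked one.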

1 Like