Experience report after finishing a (reasonably substantial) Julia project in 2024

If there were a “Julia by Example” book, it would answer a lot of my questions. For reference: Rust by Example, Rust Cookbook.

SciML is the closest thing I think in Julia.

3 Likes

I’m happy to say there are now some good Julia books; I remember when there were not.

I like:

  • Hands-On Design Patterns and Best Practices with Julia, by Tom Kwong
  • Julia for Data Analysis, by Bogumił Kamiński

BTW I agree 100% that the ‘Time to First Package’ is sometimes arduous. I used Julia for a long time before getting serious with it at work, and it was a struggle. But it took me just as long to develop Python packages, and I was only successful after discovering Poetry.

Is it really possible to develop packages quickly in a new language if the features and documentation meet the needs you mentioned above?

2 Likes

Books are a barrier unless they are free. They are nice to have, but most people want to at least try the language a bit before buying any material.

For packages, it depends on the language. In Rust, for instance, the first thing you do is cargo new, so you always develop packages. For Julia it’s different, because of its scripting, REPL-oriented workflow. I think code organization should come right after the absolute basics are explained.

Python’s package management is terrible, and I said it multiple times in this thread. However, that doesn’t bite you until you are halfway through the project and too deep to return, while Julia bites you immediately. After which you are going through a zombie meat grinder.

3 Likes

There is an excellent workflow blog (I can’t recall the link atm). So that has changed since you were getting started… Edit: https://modernjuliaworkflows.github.io/

3 Likes

I can relate to most points, but are those two even true?

Due to how include works, the scope of an unclosed end could leak beyond the file boundary, and cause problems somewhere completely unexpected.

My experience is that an included file must contain well-formed expressions, so a missing end should be reported at least in the same file (although I agree that the position of the error is usually reported at the top of the file, and finding the precise location within it is not easy).
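
A quick way to check this, as a sketch only: write a file with a missing end into a temporary directory and include it. The file name broken.jl and the function name leaky_fn are made up for illustration. On the Julia versions I am aware of, the error is raised while loading that file and nothing from it ends up defined, but the exact error type differs between versions.

```julia
# Hypothetical file containing an unclosed `function` block.
dir = mktempdir()
path = joinpath(dir, "broken.jl")
write(path, "function leaky_fn()\n    1 + 1\n")  # note: no closing `end`

# Including it should fail while loading this file,
# rather than swallowing code from whatever is included next.
err = try
    Base.include(Main, path)
    nothing
catch e
    e
end

println(typeof(err))                  # exact error type depends on the Julia version
println(isdefined(Main, :leaky_fn))   # whether the broken definition leaked into Main
```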

Take the following example:

# a.jl
function curvature(...)
end

# b.jl
function some_other_function()
    ...
    curvature = ...
end

Well, apparently after this executes, curvature’s definition is overwritten, and all other code that uses it is broken.

But local variables inside functions (or other nested scopes) cannot overwrite top-level definitions, unless the name is explicitly marked as global?
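
A minimal sketch of that scoping rule, with made-up function bodies (the original example elides them): assigning to curvature inside a function creates a new local variable, and the top-level function stays intact.

```julia
# Hypothetical stand-in for the top-level function defined in a.jl.
curvature(x) = 2x

# Stand-in for b.jl: `curvature` is assigned inside a function body.
function some_other_function()
    curvature = 42      # new local variable; it only shadows the global here
    return curvature
end

println(some_other_function())   # value of the local variable
println(curvature(3))            # the top-level function is still callable
```

Only an explicit global declaration inside the function would make the assignment target the top-level name.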

5 Likes

You are correct. That’s where I started losing interest in this because there are quite a few factual errors here.

2 Likes

Just a little tidbit here, you can also do

] activate @data-work

to activate global environments, which I find a nice little convenience.
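
For scripts, the equivalent of the REPL command above should be Pkg.activate with shared=true. This is a small sketch that reuses the environment name data-work from the example; it only switches the active project and installs nothing.

```julia
using Pkg

# Activate the shared (named) environment "data-work" from the depot.
# If it does not exist yet, Pkg selects a fresh one in the first depot.
Pkg.activate("data-work"; shared=true)

# Path of the Project.toml that is now active.
println(Base.active_project())
```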

4 Likes

What I wrote about these points was incorrect, as someone else pointed out before.

2 Likes

Sorry, I can’t edit my original post so there’s nothing I can do to correct any mistakes.

You should be able to edit it now.

Whoaaaaa

Yes, I didn’t catch this on first read. This is not correct at all.

I did point that out already @merlin

And OP did respond

Yeah, with 73 posts this conversation can easily start going in circles. Thanks so much for the report, @rongcuid! I’ll schedule this thread to close shortly — any other concrete actions or suggestions in response to this can continue in separate topics.

18 Likes

I’ve made the edits. I think those are the only two things I got wrong, unless I missed something else.

14 Likes

My bad, sorry :pray:
Somehow I did not notice @abraemer’s post and your reply.

Regarding the second issue - maybe you’ve shadowed some symbol within your package, e.g. by defining a global variable length or something, and then you have to use Base.length everywhere?
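
To illustrate what I mean, here is a sketch with a made-up module name (ShadowDemo): once a module defines its own global called length, plain length(...) calls inside that module stop working and Base.length has to be spelled out.

```julia
module ShadowDemo

# A module-level global that happens to reuse the name of Base.length.
length = 10.0

# A plain call now resolves to the Float64 above, which is not callable.
plain_call_works = try
    length([1, 2, 3])
    true
catch
    false
end

# The qualified name still reaches the function from Base.
n = Base.length([1, 2, 3])

end # module

println(ShadowDemo.plain_call_works)
println(ShadowDemo.n)
```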

1 Like

Bézier curves are known to be non-trivial to implement (so maybe you should have used one of the existing libraries, which seem good, rather than reimplementing them?), but I don’t think that explains 760 MB, if that was the problem. The only thing I can think of is memory allocations that you could have optimized away, which add up while the GC does not keep up. It seems very large, a “deceleration structure”… :slight_smile: Until the GC issue is fixed. How do you know for sure that this is its size, and not just the maximum amount of garbage?

Game AI is different from some other AI that Julia has libraries for. I at least know of none for Julia for games (except for AlphaZero…), but I might be wrong and some may exist, e.g. in some of the 2D or 3D Julia game engines that DO exist?

Unlike the Julia Discourse, which is official, I don’t think the Discord is; I didn’t even know of it (for Julia), so that is completely unsurprising…

I highlight in bold the exception in the official Julia style guide (there are also others that are supersets of this one, I think all of them):

Use naming conventions consistent with Julia base/

  • modules and type names use capitalization and camel case: module SparseArrays, struct UnitRange.
  • functions are lowercase (maximum, convert) and, when readable, with multiple words squashed together (isequal, haskey). When necessary, use underscores as word separators. Underscores are also used to indicate a combination of concepts (remotecall_fetch as a more efficient implementation of fetch(remotecall(...))) or as modifiers.

Is that a bit unfair? I think Distributions is not a zombie package, and basic statistics are good (and Julia is now claimed to be the best language for implementing statistical packages, also for Python, R, etc.). I may be ignorant of some of the areas, e.g. aren’t the solvers very good? And there are many more good packages than you list, e.g. a lot in SciML. Maybe you don’t know where to look, and finding the best packages is the problem? CSV.jl?! I thought the I/O landscape was actually great! For Julia you should often look further than the standard library, e.g. DelimitedFiles.

They don’t need to be, only documented to exist (you can find them on JuliaHub); ideally those would be documented in the standard library, yes, but I’m not sure I would find them, rather than just false positives, when looking up JSON[3] and CSV…

Arrow.jl was at one point the best implementation in any language (I thought; better than the one for Python, I recall, but I can’t find the comparison table). I haven’t kept track of the post-1.0.0 format versions.

Julia’s implementation got to be official, not sure what happened since (I’m just curious if you’re looking at something outdated, but I can’t find it under JuliaIO since it moved to Apache; otherwise I think JuliaIO is semi-official, as official as can be?):

FYI: with PythonCall.jl (rather than PyCall) you can use all Python libraries, e.g. for Arrow, I believe, in case they are better.

2 Likes

It’s not a problem, I just described how I implemented it. It’s a 1600m race track cached at a precision of 0.1mm, 760MB is about right, as there should be about 15 million data points. I obtained the number by measuring the object size.
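
As a rough sanity check of those numbers (a sketch, assuming MB means 10^6 bytes): 1600 m at 0.1 mm spacing gives 16 million samples, so 760 MB is about 47.5 bytes per sample, which would be consistent with roughly six Float64 values each. The post does not say what is stored per sample, so the 6-by-1000 matrix below is only a stand-in to show how an object size can be measured with Base.summarysize.

```julia
# Back-of-the-envelope check of the figures quoted above.
track_length_m = 1600.0      # track length from the post
step_m = 0.1e-3              # 0.1 mm expressed in metres
n_points = round(Int, track_length_m / step_m)
bytes_per_point = 760e6 / n_points   # assumes "MB" means 10^6 bytes

println(n_points)            # number of cached samples
println(bytes_per_point)     # average bytes stored per sample

# Base.summarysize reports the memory reachable from an object, in bytes.
# The 6-by-1000 Float64 matrix is a made-up stand-in for a baked track.
sample = zeros(Float64, 6, 1000)
println(Base.summarysize(sample))
```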

I benchmarked the Bezier representation and the accelerated representation. The accelerated one was about 20 times faster. In fact, since the whole acceleration structure is cached, there is no GC involved: I “bake” the track once before the game starts, then free it after the game ends.
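
The general idea of baking can be sketched as follows. This is not my actual code: the names bezier, bake and lookup, the scalar control points, and the sample count are all made up for illustration. The curve is sampled once into a table, and later queries become a sorted search plus an array read.

```julia
# Cubic Bézier evaluated directly (scalar control points, for simplicity).
function bezier(p0, p1, p2, p3, t)
    u = 1 - t
    return u^3 * p0 + 3 * u^2 * t * p1 + 3 * u * t^2 * p2 + t^3 * p3
end

# "Bake": sample the curve once at n + 1 evenly spaced parameter values.
function bake(p0, p1, p2, p3; n = 1000)
    ts = range(0, 1; length = n + 1)
    vals = [bezier(p0, p1, p2, p3, t) for t in ts]
    return ts, vals
end

# Lookup: return the first baked sample at or after parameter t.
function lookup(ts, vals, t)
    i = min(searchsortedfirst(ts, t), length(vals))
    return vals[i]
end

ts, vals = bake(0.0, 1.0, 2.0, 3.0)
println(lookup(ts, vals, 0.0))   # sample at the start of the curve
println(lookup(ts, vals, 1.0))   # sample at the end of the curve
```

The table trades memory for speed: finer spacing means more samples to store, but every lookup avoids evaluating the polynomial.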

Oh, OK… I didn’t see that sentence. I mean, Base has something like searchsortedfirst… which I totally used in my code but forgot.

Distributions and basic stats are not zombies, and neither is Symbolics. When I wrote the post, a large portion was written from memory, as I didn’t really take notes when I wrote the program. Again, Symbolics.jl never solved anything useful for me, so a more accurate way to say it is that “it’s not as mature as SymPy”.

Oh no no no, I strongly disagree. Firstly, Base should have a CSV and JSON parser. I have tried DelimitedFiles, but it is nowhere near powerful enough to deal with real-world CSV files. IO libs in Base don’t have to be blazing fast, they just need to be non-intrusive and flexible. If I want a faster library, I can always grab another package.

And again, Arrow and Parquet, which I use very often, are not well supported. As I said in the other thread split from this topic, I have used both in my other projects, and neither format worked for me. I had problems with reading, memory mapping, streaming, using TableOperations, writing, basically with every aspect. Not to mention that the documentation was pretty sparse, and that Arrow’s own documentation isn’t very good in itself. I would have used something more common like JSON if I weren’t dealing with hundreds of GB of compressed data. Sure, that was last year, so things may have changed, but that was my experience, so I included it here.

Nope, I don’t use Julia just to call Python. My research environment has enough languages. I am not going to have 5 different languages calling each other in the same project anymore.

2 Likes

If you want to slow development of Julia IO packages to a crawl, this is how to do it. Instead of allowing each package to iterate on its own schedule, make them all iterate with the whole language itself.

There may be other ways to solve this such as creating preset Julia environments for people to instantiate.

Seriously though, does anyone actually use the Python csv standard library anymore? I’ve always used pandas and maybe polars. I had to look up if a Python csv standard library actually existed.

9 Likes

Python’s csv and json modules are widely used. I wouldn’t want to install and import pandas just to load a file.

I agree with your conclusion though: format/serialization functionality shouldn’t be in Base.