Release strategy for juliac?

In another thread @mbauman said:

which, of course, triggered my inner excitement and I had to open this discussion.

By the time juliac is ready, the core Julia people together with us, the community, should, in my opinion, consider some points:

  • “Ready” should mean ready NOT just for enthusiasts but especially for people new to Julia, because those are the ones juliac will attract most. Current Julia users mostly come from data science and are already quite happy with Julia as it is. But juliac opens up a whole new world, which will attract many developers who are new to Julia. Their first experience of Julia through juliac should be a satisfying one.

  • Releasing juliac should really be accompanied by some kind of larger marketing campaign: orchestrated press articles, good professional blog posts on high-profile sites, and similar actions.

  • Looking at the statistics at Some Julia growth/usage stats, Julia seems to be doing well (it’s difficult to get good numbers). Setting up something better to measure Julia’s overall usage in the world would be a good thing BEFORE releasing juliac. Knowing the impact of juliac on this measurement would be of quite some value.

These are just my immediate thoughts on reading that juliac is making good progress.

What do you think? What’s the proper release strategy for juliac? What do you think about a beta period, where some of us heavily test the new juliac? Or perhaps I am just overreacting and juliac should just be silently released into the world of Julia, for slower, organic growth? Of course, a big marketing campaign can backfire and do harm instead of being of value.

11 Likes

Why would they do anything other than a standard release cycle like all software developers do?

3 Likes

Can’t wait to see the effects on Debian’s BenchmarkGames! :grinning:

2 Likes

This “something” would have to be some kind of instrumentation in the Julia client. Unless you’ve got something clever in mind that no one else has come up with before. Two issues with that:

  1. Last time we tried to do that it caused a big fuss, and I’m very reluctant to propose anything like that again. However, HyperLogLog hashing might be ok with people since it has pretty nice anonymity properties.
  2. If we do that it wouldn’t be in clients until the next Julia release, which would be 1.12, and that’s the same release juliac stuff is likely to be ready for.

The only option to get a better way of estimating the user base earlier would be to put it into 1.11 or 1.10 before their next release, which is possible but seems a bit iffy; or to delay the release of juliac, which seems counterproductive. We should also have a discussion here on Discourse about adding HyperLogLog hashing on the client in the first place and see if people have objections.

12 Likes

How would that work? Count downloads? Count computers that have Julia installed? What if there are different Julia versions on one computer? Which data would be transferred to the server?

I am aware of that, I was part of the discussion, and I know how difficult this is.

No, I don’t have new ideas in this direction. I just want to see the impact of juliac somehow. I expect it to be huge, but I am often wrong with my predictions. In general, I think we don’t need counts that are perfect in an absolute sense. It would be enough to set up a measurement, perhaps based on your stats thread above, and just keep it constant (difficult enough) for a few years. This would give enough insight into the ups and downs.

1 Like

The code in Pkg that talks to the pkg server would generate a HyperLogLog hash value, which would be saved in a ~/.julia/servers/$server/info.toml file and sent along with each Pkg client request as a header value. Whereas UUIDs, which are 128 bits, can uniquely identify and thus be used to track individual users, HLL hashes would only be a few bytes and thus cannot be used to track individual users. Multiple different Julia versions would use the same hash value, so it would allow estimating installs across different versions, projects, etc.

The anatomy of an HLL hash is that there’s a uniformly randomly chosen bucket part and an exponentially sampled part. The hash value would be generated something like this:

bucket = 0x1fff & rand(UInt16)             # 13 bits
sample = leading_zeros(rand(UInt32) | 0x1) #  5 bits
hllval = bucket | UInt32(sample << 13)     # 18 bits

I would probably Base64 encode this hash as three ASCII bytes, something like this:

base64 = ['A':'Z'; 'a':'z'; '0':'9'; '-'; '_']
# +1 because Julia arrays are 1-based
hllstr = base64[((hllval >> 12) & 0x3f) + 1] *
         base64[((hllval >> 6) & 0x3f) + 1] *
         base64[(hllval & 0x3f) + 1]

Then the header would look something like this:

Julia-Pkg-HyperLogLogHash: Fp5
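To make the flow concrete, here is a minimal sketch of how the pieces could fit together: generate the value once, store it in the info.toml file, and reuse it for the header on later requests. This is only an illustration, not the actual Pkg code. The function names and the TOML key "hll_hash" are assumptions made up for this example.

```julia
using TOML

# URL-safe Base64 alphabet, 64 characters.
const ALPHABET = ['A':'Z'; 'a':'z'; '0':'9'; '-'; '_']

# 18-bit value: 13-bit uniform bucket plus 5-bit exponentially sampled part.
function new_hll_value()
    bucket = UInt32(rand(UInt16) & 0x1fff)
    sample = UInt32(leading_zeros(rand(UInt32) | 0x00000001))
    return bucket | (sample << 13)
end

# Encode the 18 bits as three characters (+1 because Julia arrays are 1-based).
encode_hll(v::Integer) = join(ALPHABET[((v >> s) & 0x3f) + 1] for s in (12, 6, 0))

# Read the stored hash from `path`, or create and store a fresh one.
# The key name "hll_hash" is hypothetical.
function load_or_create_hll(path::AbstractString)
    info = isfile(path) ? TOML.parsefile(path) : Dict{String,Any}()
    if !haskey(info, "hll_hash")
        info["hll_hash"] = encode_hll(new_hll_value())
        mkpath(dirname(path))
        open(io -> TOML.print(io, info), path, "w")
    end
    return info["hll_hash"]
end

# Header pair to attach to each request.
hll_header(path) = "Julia-Pkg-HyperLogLogHash" => load_or_create_hll(path)
```

Deleting the file simply causes a fresh value to be drawn on the next request, which is why a re-install shows up as a new client.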

While 18 bits is enough for 262,144 unique values, these would be randomly generated on clients with no coordination, so the birthday paradox implies that you’d start getting collisions after only on the order of 2^9 = 512 clients—if these were uniformly generated. They aren’t uniformly generated, however, so you’ll actually get collisions way sooner than that. Which means these hashes are pretty useless for tracking people—after a few dozen people, there will be lots of duplicate hashes.
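The birthday-paradox figure can be checked with a few lines. This sketch covers only the idealized uniform case, so it is an optimistic bound for uniqueness; the function name is made up for the example.

```julia
# Probability that at least two of `n` clients share a value when each one
# draws uniformly and independently from `m` possible values.
function collision_probability(n::Integer, m::Integer)
    p_distinct = 1.0
    for k in 0:n-1
        p_distinct *= 1 - k / m   # client k+1 must avoid the k values already taken
    end
    return 1 - p_distinct
end

# Uniform case with 2^18 values and 2^9 clients.
p = collision_probability(512, 2^18)
```

With 512 clients the chance of at least one duplicate is already roughly 0.39 in the uniform case, and the exponentially sampled part makes real values collide sooner than that.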

By the miracle of HyperLogLog estimation, however, these tiny, non-unique hashes would let us estimate how many unique clients have made requests—up to around a billion clients with less than 1% error, which is more than good enough for our purposes.
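For anyone curious how the estimate falls out of such tiny values, here is a small simulation. It is a sketch using the textbook HyperLogLog estimator with the usual small-range correction, and not necessarily what the pkg servers would run. All names are made up for the example. The server only needs to remember, per bucket, the largest sample it has seen.

```julia
using Random

const NBUCKETS = 2^13   # one register per 13-bit bucket value

# One client's (bucket, sample) pair, drawn as described in the post.
function client_hash(rng)
    bucket = Int(rand(rng, UInt16) & 0x1fff)
    sample = leading_zeros(rand(rng, UInt32) | 0x00000001)
    return bucket, sample
end

# Textbook HyperLogLog estimate.
# registers[b] holds the largest (sample + 1) seen for bucket b, 0 if unseen.
function hll_estimate(registers::Vector{Int})
    m = length(registers)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias-correction constant
    raw = alpha * m^2 / sum(r -> 2.0^(-r), registers)
    nempty = count(==(0), registers)
    # Small-range correction (linear counting) while many buckets are empty.
    return (raw <= 2.5 * m && nempty > 0) ? m * log(m / nempty) : raw
end

# Simulate `nclients` independent clients and estimate how many there were.
function simulate(nclients; rng=MersenneTwister(1))
    registers = zeros(Int, NBUCKETS)
    for _ in 1:nclients
        bucket, sample = client_hash(rng)
        registers[bucket + 1] = max(registers[bucket + 1], sample + 1)
    end
    return hll_estimate(registers)
end

estimate = simulate(100_000)
```

The estimate should land within a few percent of the true client count, even though many of those clients share identical hash values.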

22 Likes

But if I delete the .julia folder and re-install this would count as a new user?

@mbauman, why do you say that? And what would it involve? I mean, I see no movement on the PR, plus it’s just a “placeholder”:

Adds two new cli drivers, juliax and juliac. At the moment, juliac is a placeholder and just errors

That’s just supposed to be a CLI “driver”, yes, so I’m not sure where to look for the actual work on the compiler improvements, that is, at JuliaLang (I think I know of all the outside efforts). Plus, I thought that in effect juliac would just be a standard way of invoking what’s more or less possible already with PackageCompiler.jl (PC), i.e. not a huge improvement.

Julia precompiles packages already, so it alone can do that, and I DO want arbitrary scripts to be compilable too. It doesn’t seem like a huge leap to do that (including dependencies), since PC does that, into one file. But it doesn’t seem it would be better or worse than with PC; it might even be done by it, with juliac downloading it for you and invoking PC. For now people need to know about it; I suppose it’s documented in Julia’s official docs, or could be with a link.

There’s also AppBundler.jl in case people only want to package easily distributable apps in one file. I believe PC only gives you precompiled output in one directory, not one file, and you need to distribute that or, in a manual step, make an installer (e.g. for Windows; how to do so is documented in a YouTube video at least). AppBundler does NOT compile (it’s not always needed), and it could be used with PC, at least theoretically.

1 Like

Yes. Or if you just delete the ~/.julia/servers/$server/info.toml file.

So I would count as 10 new users per year… I also have multiple computers…

2 Likes

Yes. If you have some other suggestion for counting, please let’s hear it. Do you want to scan people’s retinas when they start up Julia or something?

15 Likes

Only if that comes with astronomical VC funding and putting Julia in some sort of blockchain.

1 Like

I mean, I guess most of the Julia users do not re-install Julia as often as I do…

But it might be good to have a small sample of users that report how often they installed Julia in the last year… Could be part of the yearly survey…

1 Like

The “driver” is the least interesting part of the work (IMHO). There’s lots of compiler work necessary to actually output an exe that’s reasonably small and fast — and do the compilation reliably and in a reasonably short amount of time, too.

7 Likes

At some point you just have to stop and consider whatever proxy number you have to be good enough. Looking at a HyperLogLog estimate of monthly unique installs seems fine to me. Is this exactly the same as the number of people using Julia? Obviously not, but we’re never going to get that number. What even counts as a Julia user? Do we want to count every person who has ever typed an expression into a Julia REPL? Not only is it unclear how to define a “Julia user” in the first place, but the measures it would take to count that accurately are simply too absurd to go through with. And what does it matter? What are you even going to do with that number?

9 Likes

Some people earn their living with Julia, or want to.

This gave me a laugh… but the counting measure is something important in general and should be discussed separately in detail.

Back to the

Why would they do anything other than a standard release

A juliac opens up a completely new world for Julia as a proper language. Currently it is somewhat constrained to data science (and similar) because of its scripting nature.

For many commercial software products it is still important to be closed source; well, not for the products but for the sellers. Small executables also fit better into the usual release distributions. So do binary libraries. Another world is cross-compilation and embedded systems. Just to name a few examples.

With a juliac, Julia is suddenly a viable choice as a programming language for so many more fields.

In my opinion this is even more important than some Julia 2.0 in the future. But a Julia 2.0 would generate quite some echo in the media.

Of course, this is my view on a juliac. My point is that juliac should, first, be well tested and extraordinarily good, and second, that its release should be a major public event reflecting the impact it could have.

7 Likes

To add to what @StefanKarpinski already said: these stats are, in my opinion at least, most interesting as relative metrics, i.e. for comparison to some earlier point in time.

Under the assumption that the relation between our surrogate metric and the value we’re interested in is fairly static, this allows us to track the effect that various events might have on the rate of Julia adoption, like the introduction of an ahead-of-time compiler.

2 Likes

I hope you can see how you’re being somewhat self-contradictory here. You simultaneously want juliac to be well-tested and extraordinarily good, but also not to be released before it’s ready and to be released with a splash, but you’re also excited to beta-test it, but also (I imagine) want it to be open source… all while not being involved in its development. I’m similarly excited and similarly on the sidelines. :slight_smile:

I do see your point that the name juliac already has a significant cachet attached to it… and perhaps the name itself could be held back a bit?

7 Likes

Definitely not terrible, particularly if the logs of the IP addresses aren’t also being stored.

I also think some kind of opt-in reporting of startups could be useful. For example, a package PopularitySurvey.jl could be created. This contains a single function PopularitySurvey.reportpop(fraction=0.001) or something. This generates a cryptographically secure random number and, if it’s less than fraction (a number between 0.0 and 1.0), spawns a thread that makes an effort to PUT the current Unix time and its fraction value to http://julialang.org/popularity or some such thing, with some timeout. Perhaps make the default 0.001 or something and the max fraction 0.01 so we’re not hammering the server, and of course updated package versions could change the default. With a default of 0.001 and 300M people starting Julia on average 10 times a day, it’d be about 35 hits a second on the server.

Then people opt-in by installing PopularitySurvey.jl and putting

using PopularitySurvey
reportpop()

in their startup.jl
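A rough sketch of what such a function might look like, to show how little is involved. PopularitySurvey.jl does not exist, so everything here is hypothetical. The actual PUT is left as an injected `send` function, which is an assumption of this sketch, so that it stays free of network code; a real version would put the HTTP request and its timeout there.

```julia
using Random

const DEFAULT_FRACTION = 0.001
const MAX_FRACTION = 0.01    # cap so the server is not hammered

# Decide whether this startup should report. RandomDevice draws from the
# operating system's entropy source.
function should_report(fraction::Real=DEFAULT_FRACTION; rng=RandomDevice())
    f = clamp(float(fraction), 0.0, MAX_FRACTION)
    return rand(rng) < f
end

# Hypothetical reportpop: `send` stands in for the HTTP PUT.
# Returns the spawned task, or `nothing` if this startup was not sampled.
function reportpop(fraction::Real=DEFAULT_FRACTION;
                   send=payload -> nothing, rng=RandomDevice())
    should_report(fraction; rng=rng) || return nothing
    f = clamp(float(fraction), 0.0, MAX_FRACTION)
    payload = (unixtime=round(Int, time()), fraction=f)
    # Run in the background so startup is not delayed.
    return Threads.@spawn begin
        try
            send(payload)
        catch
            # Reporting is best-effort; failures are ignored.
        end
    end
end

# Expected server load for a given user base and startup frequency.
hits_per_second(users, starts_per_day, fraction) =
    users * starts_per_day * fraction / 86_400

load = hits_per_second(300e6, 10, 0.001)   # about 34.7 requests per second
```

Because each report carries the fraction it was sampled with, the server can scale every hit by 1/fraction to estimate total startups even if the default changes between package versions.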

While this doesn’t count separate users, it does allow estimating the frequency of startups per second globally, at least among people choosing to participate.

Growth over time would be due to two factors:

  1. Adoption of the voluntary reporting
  2. More frequent usage of the language.