Pkg.jl telemetry should be opt-in

It is one thing to do what lawyers say is acceptable within the law and another thing to do what is right. It may not be illegal, but it doesn't feel right. I understand the desire and the utility, and I believe there is a right way to do this, but this way doesn't seem right. Maybe you could tie telemetry data to having a JuliaHub(.com) account or something?


How about this: every time someone presses the ] key to enter the package manager, the REPL prints a single line above the new prompt saying whether telemetry is enabled or disabled:

Pkg telemetry enabled
(@1.5.1) pkg>

and add simple commands there, telemetry disable and telemetry enable, to switch your status.


Requiring a JuliaHub (or any other) account would

  1. link this data to a person,
  2. reach only a fraction of people, which would pretty much defeat the purpose.

I am not sure I understand what "doesn't seem right" to you about the proposed opt-out approach with a warning, but if you have concerns about privacy, they also apply to your proposal.

I also feel icky about various companies collecting data about the way I use my computer, but in this particular case I feel that the data is as anonymous as it gets, and the benefits (a quantifiable user base ⇒ more funding and support for Julia) outweigh the mostly theoretical concerns.

The change should, of course, be announced via the usual channels (this forum, the official Julia blog, …) and feature prominently in the release notes so that those who want to opt out can do so.


I think you are painting this in an unfair light, as if properly checking legal compliance were something done to find loopholes and to be generally a bit weaselly. GDPR is strict; complying with it is nothing to sneeze at.


I never said or implied that checking GDPR compliance is done to find loopholes (I am the former Head of Risk Management for one of the largest insurers in Asia, so I know a thing or two about compliance). Kudos for checking with lawyers. I'm saying the way this seems to be getting done doesn't feel right, and there is probably a better way with less reputation risk. Humans screw up. I screw up. The telemetry thing is going to screw up somehow. When it does, there could be some real repercussions. I am not sure the benefits outweigh the risks with the information I have so far (which is limited to this forum post, the first I'm hearing about it, which is another sign it isn't being done right, since the v1.5 RC is already in the wild).

If I bother to comment here, it is because I care. I don’t want to see Julia on the front page of a newspaper with some glaring negative headline about data privacy issues.


That is how it read to me. I’m glad to hear you did not intend it, but I think it did sound as if you were implying that they are deliberately balancing right on the edge of breaking the law.

Edit: Anyway, some community pushback is good, it helps the process stay healthy, so I don’t think that’s a bad thing. I simply reacted to the phrasing.


After @StefanKarpinski's explanation of how lawyers see the IP issue, it seems reasonable to me. There could be a remark about the IP address in the …, but all in all, it doesn't look wrong to me, and I am pretty sure it helps a lot to improve Julia.

What exactly do you think is wrong? That it isn't opt-in, just opt-out? As there is no personal data, I would say opt-in is really not necessary. Looking at the data, it is hard to imagine how it could be abused. And it seems quite minimal, as it should be: no unnecessary data, as far as I can tell.


Thinking about this, I can imagine the following “nudge opt-in” mechanism:

  1. users have to opt-in to the telemetry explicitly

  2. until they do this, they get a friendly message at, say, each Pkg.update():

    pkg> update
    [packages get updated]
    Please consider participating in the anonymous
    package telemetry survey with
        pkg> telemetry enable
    To disable this message, use
        pkg> telemetry disable
    For more information, see
        pkg> telemetry info
  3. after disabling it, the message is not shown again until the next major release.
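To make the three steps concrete, here is a minimal Julia sketch of the nudge logic. Everything in it (TelemetryPrefs, should_nudge, after_update, and recording the major version at which the user opted out) is hypothetical and is not Pkg.jl's API; it only models the proposal: nothing is sent until an explicit opt-in, undecided users see the message after each update, and users who disabled it are not nagged again until the next major release.

```julia
# Hypothetical sketch of the "nudge opt-in" proposal; none of these names are Pkg.jl API.
@enum TelemetryChoice Undecided Enabled Disabled

mutable struct TelemetryPrefs
    choice::TelemetryChoice
    disabled_in_major::Int   # major Julia version when the user opted out (0 = never)
end

TelemetryPrefs() = TelemetryPrefs(Undecided, 0)

const NUDGE_MESSAGE = """
    Please consider participating in the anonymous
    package telemetry survey with
        pkg> telemetry enable
    To disable this message, use
        pkg> telemetry disable
    For more information, see
        pkg> telemetry info"""

# Telemetry is only ever sent after an explicit opt-in.
sends_telemetry(p::TelemetryPrefs) = p.choice == Enabled

function enable!(p::TelemetryPrefs)
    p.choice = Enabled
    return p
end

function disable!(p::TelemetryPrefs; current::VersionNumber = VERSION)
    p.choice = Disabled
    p.disabled_in_major = Int(current.major)
    return p
end

# Undecided users are nudged; users who opted out stay un-nagged until a
# newer major release; users who opted in are never nudged.
function should_nudge(p::TelemetryPrefs; current::VersionNumber = VERSION)
    p.choice == Undecided && return true
    p.choice == Disabled && return current.major > p.disabled_in_major
    return false
end

# Hook meant to run after `pkg> update` completes.
function after_update(p::TelemetryPrefs; io::IO = stdout, current::VersionNumber = VERSION)
    should_nudge(p; current = current) && println(io, NUDGE_MESSAGE)
    return nothing
end
```

The !isinteractive() compromise suggested further down the thread would amount to one extra check, such as having should_nudge return false whenever isinteractive() is false.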


I considered that but having a nag screen could be quite annoying and there are potential issues with incorrectly prompting the user in a non-interactive situation, which would effectively hang the Julia process. It does not seem worth making Julia potentially less reliable and annoying people. Furthermore, telemetry data can also be useful for helping to figure out what’s going on with CI and other automated systems (both for abuse prevention and to understand usage); if this required a manual opt-in during an interactive session, we wouldn’t get telemetry from any automated systems.


I understand your point about CI and non-interactive use, but given the reactions above and that the primary goal is to collect information about actual user installations, perhaps a "nudge opt-in" framework could simply disable telemetry (and nagging) altogether when !Base.isinteractive(), since interactive use is bound to happen at some point for users anyway.

This is just a suggestion for a compromise, I am actually fine with telemetry as implemented.


While that's one of the top priorities, it's not the only reason. Serving requests to CI processes is expensive: network bandwidth, not compute, is the primary cost of running a pkg server. Telemetry data from CI systems helps us understand what people are doing in those automated processes and mitigate those expenses, for example by deploying package servers colocated with CI services (so bandwidth is cheaper or even free). That's why we check all those CI indicator variables: to help understand which services are making requests. If we see a huge deluge of new traffic (this is realistic and already happens for services we host) and all we have is IP addresses, it's much harder to figure out what's going on than if we also have CI indicator variables, Julia version numbers, and client UUIDs, which let us tell which requests come from the same instance and which come from different ones. Debugging these situations is hard, and doing it completely blind is much harder, so having more context when this happens really helps.
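As a rough illustration of the context described above, the sketch below collects whichever CI indicator variables are set and attaches a client UUID that persists across sessions, so requests from the same installation can be grouped. This is a hypothetical Julia sketch, not Pkg.jl's implementation: the variable list, the file location, and the X-Example-* header names are made up for illustration. The variables listed are ones those CI services set, but this is not the list Pkg.jl actually checks.

```julia
using UUIDs

# Hypothetical list for illustration; NOT the list Pkg.jl actually checks.
const CI_VARIABLES = [
    "CI",              # generic flag set by many CI services
    "TRAVIS",          # Travis CI
    "APPVEYOR",        # AppVeyor
    "GITHUB_ACTIONS",  # GitHub Actions
    "GITLAB_CI",       # GitLab CI
    "TF_BUILD",        # Azure Pipelines
]

# Return the CI indicator variables that are present in `env`.
ci_indicators(env::AbstractDict = ENV) =
    Dict{String,String}(k => env[k] for k in CI_VARIABLES if haskey(env, k))

# Read a random client UUID from `path`, creating it on first use so that
# later requests from the same installation carry the same ID.
function load_or_create_client_uuid(path::AbstractString)
    isfile(path) && return UUID(String(strip(read(path, String))))
    id = uuid4()
    dir = dirname(path)
    isempty(dir) || mkpath(dir)
    write(path, string(id))
    return id
end

# Build hypothetical request headers: client UUID, Julia version, CI indicators.
function telemetry_headers(client_uuid::UUID; env::AbstractDict = ENV)
    headers = ["X-Example-Client-UUID" => string(client_uuid),
               "X-Example-Julia-Version" => string(VERSION)]
    for (k, v) in ci_indicators(env)
        push!(headers, "X-Example-CI-$k" => v)
    end
    return headers
end
```

With only IP addresses, a burst of traffic from one CI provider looks like many unrelated clients; the UUID and CI fields are what let requests be grouped by instance and by service.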

Knowing which CI services people are using is also helpful for prioritizing quality of support for those CI services. Right now we are collectively good at supporting Travis and AppVeyor because that's what Julia itself uses, but if we find out from CI variables that a ton of people are using Azure Pipelines, for example, then it may be worth the time and effort to make sure it works flawlessly in the Julia ecosystem. Without those telemetry headers, we can't know to spend time and energy on that.


Thanks, this is a very useful explanation and clarifies a lot of the motivation. It would be great to include these additional reasons in the announcement of the new telemetry feature.


The second image in the Telemetry article on Wikipedia shows a crocodile with a GPS and radio on its head. This radio is collecting valuable information for scientists, yet I guess this device was placed there without the crocodile’s consent. The Pkg telemetry will likewise collect undeniably valuable information that will be used in fundraising, at present without the users’ explicit consent. I expect this sort of tracking and monetization from Facebook and Google, yet find it surprising and distasteful that Pkg.jl telemetry is monetizing Julia users.

Whether or not we keep the telemetry opt-out, I'd prefer that the Pkg telemetry page mention fundraising, i.e. not hide the fact that the telemetry is monetizing Julia users.


Saying “Julia has approximately $n users according to telemetry, therefore it is a viable platform” is very different than “we have collected detailed identifiable data on $n users that we will use to target advertisements if you pay us.”


Just so I understand, your argument is:

  • With telemetry it is possible to count the number of Julia users.
  • The count of Julia users is information that could possibly be useful when applying for grants and fundraising.
  • Therefore, Pkg.jl is “monetizing Julia users”.

Is that a roughly accurate implication chain?


Yes; see data monetization on Wikipedia. I'd prefer that we not hide the fact that this information will be used for fundraising.


Don’t you think that this is already clearly explained in the following statement?


Yes, I apologize for not reading more closely.


I don't feel monetized! I feel grateful that Julia exists and that there are people who work hard to make it happen. As a user, complaining about being monetized would feel wrong to me.

I know this is not the best argument: one could also feel grateful for Google's great search engine and conclude that one should happily hand all one's data to the ad machine.

What's real is that there is always a balance that needs to be evaluated. Simply arguing that we are being monetized and that this is bad per se is not balanced; it just follows a zeitgeist in which, because collecting data has been bad before, it must always be bad.


I am hugely conservative in my data footprint. I regularly purge all cookies, use privacy browser plugins, and have even gone so far as to block (and manually whitelist) javascript at times. I use DDG, have bailed on much social media, and such. I’ve gotten into many arguments with family members saying they “have nothing to hide” because they definitely do. This discussion, though, is puzzling to me.

What is the threat model? The telemetry is extremely conservative, really only adding three things above and beyond what is required for any package server in any language. The biggest one is the persistent client ID, which is unique to Julia. Unlike an IP address or a browser fingerprint, you cannot connect it with any other service or action.

Unlike a typical TOS, the data page is extraordinarily transparent, understandable, and legible.