Pkg.jl telemetry should be opt-in

If you’ve already done some package operation in the Julia process when there was a telemetry file and then you delete the file and do another package operation, you will not get the notice (since there was a telemetry file, implying that you’ve gotten the notice already). There’s logic to ensure that the legal notice is printed at most once per process, because otherwise the Pkg.jl CI logs were getting spammed with lots of legal notices. If you want to trigger showing the legal notice, you can:

  1. Delete the telemetry file.
  2. Start a fresh Julia process.
  3. Do a package operation that connects to the server.
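
To make the "at most once per process" behaviour concrete, here is a minimal sketch of how such a guard could look. It is not the actual Pkg.jl code: the names `NOTICE_CHECKED` and `maybe_show_notice`, the placeholder notice text, and the file path passed in are all assumptions made for illustration.

```julia
# Hypothetical sketch, not Pkg.jl's real implementation.
# Per-process flag: set on the first package operation, whatever the outcome.
const NOTICE_CHECKED = Ref(false)

"""
Print the legal notice if this is the first check in this process and no
telemetry file exists yet. Returns `true` only when the notice was printed.
"""
function maybe_show_notice(telemetry_file::AbstractString; io::IO = stderr)
    # Already checked in this process: never print again, even if the
    # telemetry file has been deleted in the meantime.
    NOTICE_CHECKED[] && return false
    NOTICE_CHECKED[] = true

    # An existing telemetry file implies the notice was shown earlier.
    isfile(telemetry_file) && return false

    println(io, "LEGAL NOTICE (placeholder text)")
    touch(telemetry_file)   # record that the notice has been shown
    return true
end
```

With a guard like this, deleting the file mid-session has no effect because the flag is already set, which is why the three steps above include starting a fresh Julia process.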

5 Likes

Yes, I restarted Julia. However, after freshly downloading 1.5.0-rc1.0, I can now trigger it, so I guess all is well. Sorry about the noise!

3 Likes

It is one thing to do what lawyers say is acceptable within the law and another thing to do what is right. It may not be illegal, but it doesn’t feel right. I understand the desire and utility and I believe there is a right way to do this, but this way doesn’t seem right. Maybe you can tie telemetry data to having a JuliaHub(.com) account or something?

9 Likes

How about this: every time someone presses the ] key to get into the package manager, the REPL prints a single line above the new prompt saying whether telemetry is enabled or disabled…

    Pkg telemetry enabled
    (@1.5.1) pkg>

and add a simple command there, telemetry disable or telemetry enable, to switch your status.

8 Likes

Requiring a JuliaHub (or any other) account would

  1. link this data to a person,
  2. reach a fraction of people and thus pretty much defeat the purpose.

I am not sure I understand what “doesn’t seem right” to you about the proposed opt-out approach with a warning, but if you have concerns about privacy, then they also apply for your proposal.

I also feel icky about various companies collecting data about the way I use my computer, but in this particular case I feel that the data is as anonymous as it gets, and the benefits (quantifiable user base ⇒ more funding and support for Julia) overwhelm the mostly theoretical concerns.

The change of course should be announced via the usual channels (this forum, the official Julia blog, …) and feature prominently in the release notes so that those who want to opt out can do it.

18 Likes

I think you are painting this in a light which is unfair, as if properly checking law compliance is something that is done to find loopholes, and generally be a bit weaselly. GDPR is strict, complying with it is nothing to sneeze at.

7 Likes

I never said or implied that checking GDPR compliance is done to find loopholes (I am the former Head of Risk Management for one of the largest insurers in Asia so I know a thing or two about compliance). Kudos for checking with lawyers. I’m saying the way this seems to be getting done doesn’t feel right and there is probably a better way with less reputation risk. Humans screw up. I screw up. The telemetry thing is going to screw up somehow. When it does, there can potentially be some real repercussions. I am not sure the benefits outweigh the risks with the information I have so far (which is limited to this forum post - the first I’m hearing about it - which is another sign it isn’t being done right since v1.5 rc is already in the wild).

If I bother to comment here, it is because I care. I don’t want to see Julia on the front page of a newspaper with some glaring negative headline about data privacy issues.

6 Likes

That is how it read to me. I’m glad to hear you did not intend it, but I think it did sound as if you were implying that they are deliberately balancing right on the edge of breaking the law.

Edit: Anyway, some community pushback is good, it helps the process stay healthy, so I don’t think that’s a bad thing. I simply reacted to the phrasing.

1 Like

After @StefanKarpinski’s explanation about how lawyers see the IP issue, it seems reasonable to me. There could be a remark regarding the IP address on the https://julialang.org/legal/data/ page, but all in all, it doesn’t look wrong to me and I am pretty sure that it helps a lot to improve Julia.

What exactly do you think is wrong? That it isn’t opt-in, just opt-out? As there is no personal data I would say opt-in is really not necessary. Looking at the data it is hard to imagine how this could be abused. And it seems quite minimal, as it should be: no unnecessary data as far as I can tell.

6 Likes

Thinking about this, I can imagine the following “nudge opt-in” mechanism:

  1. users have to opt-in to the telemetry explicitly

  2. until they do this, they get a friendly message at, say, each Pkg.update():

    pkg> update
    [packages get updated]
    Please consider participating in the anonymous
    package telemetry survey with
    
        pkg> telemetry enable
    
    To disable this message, use
    
        pkg> telemetry disable
    
    For more information, see
    
        pkg> telemetry info
    
    
  3. after disabling it, the message is not shown again until the next major release.
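
The three rules above can be sketched as a small piece of state plus one predicate. This is only an illustration of the proposal: `TelemetryChoice`, `TelemetryPrefs` and `should_nudge` are hypothetical names, and nothing like them is claimed to exist in Pkg.jl.

```julia
# Hypothetical sketch of the "nudge opt-in" proposal; not part of Pkg.jl.
@enum TelemetryChoice choice_unset choice_enabled choice_disabled

struct TelemetryPrefs
    choice::TelemetryChoice
    decided_in::VersionNumber   # Julia version in which the choice was made
end

"Should the friendly reminder be printed after a package operation?"
function should_nudge(prefs::TelemetryPrefs, current::VersionNumber)
    prefs.choice == choice_unset && return true      # rule 2: no decision yet
    prefs.choice == choice_enabled && return false   # already opted in
    # rule 3: stay quiet after `telemetry disable` until the next major release
    return current.major > prefs.decided_in.major
end

"Telemetry is only sent after an explicit opt-in (rule 1)."
should_send(prefs::TelemetryPrefs) = prefs.choice == choice_enabled
```

For example, a user who ran telemetry disable on 1.5 would see no reminder on 1.6 under this sketch, but would see it once again on a 2.0 release.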

22 Likes

I considered that but having a nag screen could be quite annoying and there are potential issues with incorrectly prompting the user in a non-interactive situation, which would effectively hang the Julia process. It does not seem worth making Julia potentially less reliable and annoying people. Furthermore, telemetry data can also be useful for helping to figure out what’s going on with CI and other automated systems (both for abuse prevention and to understand usage); if this required a manual opt-in during an interactive session, we wouldn’t get telemetry from any automated systems.

17 Likes

I understand your point about CI and non-interactive use, but given the reactions above and that the primary goal is to collect information about actual user installations, perhaps a “nudge opt-in” framework could just disable telemetry (& nagging) altogether when !Base.isinteractive(), since interactive use is bound to happen at some point for users anyway.
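
A rough sketch of that gate, assuming the user's choice is stored as `true`, `false`, or `nothing` (undecided). The function name `telemetry_actions` is made up for this example; only `Base.isinteractive()` is a real Julia function.

```julia
# Hypothetical gate: in non-interactive sessions, neither send telemetry
# nor print the reminder, so scripts and CI jobs are never prompted.
function telemetry_actions(opted_in::Union{Bool,Nothing};
                           interactive::Bool = Base.isinteractive())
    interactive || return (send = false, nudge = false)
    return (send = opted_in === true, nudge = opted_in === nothing)
end
```

The `interactive` keyword defaults to the real session state and is only exposed so the behaviour can be tried out from a script.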

This is just a suggestion for a compromise, I am actually fine with telemetry as implemented.

12 Likes

While that’s one of the top priorities, it’s not the only reason. Serving requests to CI processes is expensive—network bandwidth is the primary cost of running a pkg server, not compute. Telemetry data from CI systems helps understand what people are doing in those automated processes and mitigate those expenses. For example, by deploying package servers that are colocated with CI services (so bandwidth is cheaper or even free). That’s why we check all those CI indicator variables: to try to help understand what services are making requests. If we see a huge deluge of new traffic (this is realistic and does happen already for services we host) and all we have is IP addresses, it’s much harder to figure out what’s going on than if we also have CI indicator variables, Julia version numbers, and client UUIDs, which allow us to figure out which requests are coming from the same instance and which are coming from different ones. Debugging these kinds of situations is hard and doing it completely blind is much harder, so having more context when this happens really helps.
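
As a rough illustration of both halves of this, the sketch below shows a client noting which CI indicator variables are set, and a server counting distinct client UUIDs behind each IP address. The variable list is an illustrative subset of well-known CI environment variables and is not claimed to be the exact list Pkg.jl checks; `ci_indicators`, `distinct_clients` and the request record layout are assumptions for this example.

```julia
# Illustrative subset of environment variables commonly set by CI services.
# Assumption: this is NOT necessarily the list that Pkg.jl actually checks.
const CI_VARS = ["CI", "TRAVIS", "APPVEYOR", "GITHUB_ACTIONS",
                 "GITLAB_CI", "CIRCLECI", "TF_BUILD"]

"Names of the CI indicator variables present in `env` (client side)."
ci_indicators(env = ENV) = [name for name in CI_VARS if haskey(env, name)]

"""
Count distinct client UUIDs seen per IP address (server side).
`requests` is any iterable of records with `ip` and `client_uuid` fields.
"""
function distinct_clients(requests)
    seen = Dict{String,Set{String}}()
    for r in requests
        push!(get!(() -> Set{String}(), seen, r.ip), r.client_uuid)
    end
    return Dict(ip => length(ids) for (ip, ids) in seen)
end
```

With only IP addresses, a burst of requests from one address looks the same whether it comes from one machine or many; grouping by UUID as above is what tells those cases apart.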

Knowing which CI services people are using is also helpful for prioritizing quality of support for those CI services. Right now we collectively are good at supporting Travis and AppVeyor because that’s what Julia itself uses, but if we find out from CI variables that a ton of people are using Azure Pipelines, for example, then it may be worth the time and effort to make sure that works really flawlessly in the Julia ecosystem. Without those telemetry headers, we can’t know to spend time and energy on that.

24 Likes

Thanks, this is a very useful explanation and clarifies a lot of the motivation. It would be great to include these additional reasons in the announcement of the new telemetry feature.

6 Likes

The second image in the Telemetry article on Wikipedia shows a crocodile with a GPS and radio on its head. This radio is collecting valuable information for scientists, yet I guess this device was placed there without the crocodile’s consent. The Pkg telemetry will likewise collect undeniably valuable information that will be used in fundraising, at present without the users’ explicit consent. I expect this sort of tracking and monetization from Facebook and Google, yet find it surprising and distasteful that Pkg.jl telemetry is monetizing Julia users.

Whether or not we keep the telemetry opt-out, I’d prefer that the Pkg telemetry page on Julialang.org mention fundraising; i.e. not hide the fact that the telemetry is monetizing Julia users.

4 Likes

Saying “Julia has approximately $n users according to telemetry, therefore it is a viable platform” is very different than “we have collected detailed identifiable data on $n users that we will use to target advertisements if you pay us.”

18 Likes

Just so I understand, your argument is:

  • With telemetry it is possible to count the number of Julia users.
  • The count of Julia users is information that could possibly be useful when applying for grants and fundraising.
  • Therefore, Pkg.jl is “monetizing Julia users”.

Is that a roughly accurate implication chain?

11 Likes

Yes; see data monetization on Wikipedia. I’d prefer that we not hide the fact that this information will be used for fundraising.

2 Likes

Don’t you think that this is already clearly explained in the following statement?

5 Likes

Yes, I apologize for not reading more closely.

1 Like